From: "Luke A. Guest" <laguest@archeia.com>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 16 Nov 2021 15:25:10 +0000
Message-ID: <sn0ijs$7v2$1@gioia.aioe.org>
In-Reply-To: f0d17e38-58c7-4914-ab9c-8632cecc8215n@googlegroups.com
On 16/11/2021 11:55, Marius Amado-Alves wrote:
> I'm worried. I need the concept of character, for proper text processing. For example, I want to reference characters in a text file by their position. Any tips/references on how to deal with combining characters, or any other perturbating feature of Unicode, greatly appreciated.
>
> (For me, a combining character is not a character, the combination is. Unicode agrees, right?)
>
You can't. The concept of character is dead; the new concept is the
grapheme cluster.
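
To illustrate the difference, here is a rough sketch in Python (not Ada, and not from any library mentioned in this thread). The helper name `rough_clusters` is made up for this example, and it only attaches combining marks to the preceding base code point. The real grapheme cluster rules in Unicode Standard Annex #29 cover more cases (emoji ZWJ sequences, Hangul jamo, regional indicators, and so on), so treat this as an approximation under that assumption, not a conforming implementation.

```python
import unicodedata


def rough_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper for illustration only: it approximates grapheme
    clusters by canonical combining class and ignores the other UAX #29
    rules (ZWJ sequences, Hangul syllables, regional indicators, ...).
    """
    clusters = []
    for ch in text:
        # A non-zero combining class means "attaches to the previous character".
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'e' followed by COMBINING ACUTE ACCENT (U+0301), then 'a'
sample = "e\u0301a"

print(len(sample.encode("utf-8")))  # number of UTF-8 bytes
print(len(sample))                  # number of code points
print(len(rough_clusters(sample)))  # number of (approximate) clusters
```

The same text gives three different lengths depending on whether you count bytes, code points, or clusters, so "position N in the file" has to say which unit it means.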