Re: How to read in a (long) UTF-8 file, incrementally?

comp.lang.ada

From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 16 Nov 2021 14:23:28 -0600
Message-ID: <sn1401$ubi$1@franka.jacob-sparre.dk>
In-Reply-To: sn08jf$pkq$1@gioia.aioe.org

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:sn08jf$pkq$1@gioia.aioe.org...
> On 2021-11-16 12:55, Marius Amado-Alves wrote:
>> I'm worried. I need the concept of character, for proper text processing.
>
> Simply ignore or reject decomposed characters.

Unicode calls that "requiring Normalization Form C". ("Form D" is all 
decomposed characters.) You'll note that what Ada compilers do with text not 
in Normalization Form C is implementation-defined; in particular, a compiler 
could reject such text.
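
As an illustration of the "ignore or reject" policy, here is a minimal sketch 
in Python, chosen only because its standard unicodedata module provides 
normalization. The function name check_nfc and its reject parameter are 
hypothetical, not part of any Ada or Unicode API.

```python
import unicodedata


def check_nfc(text, reject=True):
    """Return text in Normalization Form C.

    If reject is true and text is not already in NFC, raise ValueError
    instead of silently composing it (the "reject" policy).
    """
    normalized = unicodedata.normalize("NFC", text)
    if normalized != text and reject:
        raise ValueError("text is not in Normalization Form C")
    return normalized


composed = "\u00e9"      # e-acute as a single code point (already NFC)
decomposed = "e\u0301"   # 'e' followed by a combining acute accent (NFD)
```

With these definitions, check_nfc(composed) returns its argument unchanged, 
check_nfc(decomposed, reject=False) returns the composed form, and 
check_nfc(decomposed) raises ValueError.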

My understanding is that various Internet standards also require 
Normalization Form C. For instance, web pages are supposed to always be in 
that format. Whether browsers actually enforce that is unknown (they should 
enforce a lot of stuff about web pages, but generally just try to muddle 
through, which causes all kinds of security issues).

                            Randy.


Thread overview: 15+ messages
2021-11-02 17:42 How to read in a (long) UTF-8 file, incrementally? Marius Amado-Alves
2021-11-02 18:17 ` Dmitry A. Kazakov
2021-11-03  7:43 ` Vadim Godunko
2021-11-03  8:48 ` Luke A. Guest
2021-11-04 11:43   ` Marius Amado-Alves
2021-11-04 12:13     ` Dmitry A. Kazakov
2021-11-04 14:30     ` Luke A. Guest
2021-11-05 10:56       ` Marius Amado-Alves
2021-11-05 19:55         ` Simon Wright
2021-11-16 11:55           ` Marius Amado-Alves
2021-11-16 12:36             ` Dmitry A. Kazakov
2021-11-16 13:52               ` Marius Amado-Alves
2021-11-16 20:23               ` Randy Brukardt [this message]
2021-11-16 15:25             ` Luke A. Guest
2021-11-16 17:38             ` Vadim Godunko
