From: "Niocláisín Cóilín de Ghlostéir" <Spamassassin@irrt.De>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Sun, 24 Aug 2025 22:57:38 +0200
Message-ID: <0e73134a-a5d5-d594-3d5c-cec2d1d100a0@insomnia247.nl>
In-Reply-To: <d1c5ba75-bc0a-4e7b-a2df-394bc710cbcen@googlegroups.com>
Doctor Marius Amado-Alves wrote on 2nd November 2021:
|----------------------------------------------------------------------------|
|"As I understand it, to work with Unicode text inside the program it is |
|better to use the Wide_Wide (UTF-32) variants of everything. |
| |
|Now, Unicode files usually are in UTF-8. |
| |
|One solution is to read the entire file in one gulp to a String, then |
|convert to Wide_Wide. This solution is not memory efficient, and it may not |
|be possible in some tasks e.g. real time processing of lines of text. |
| |
|If the file has lines, I guess we can also work line by line (Text_IO). But |
|the text may not have lines. Can be a long XML object, for example. |
| |
|So it should be possible to read a single UTF-8 character, right? Which |
|might be 1, 2, 3, or 4 bytes long, so it must be read into a String, right? |
|Or directly to Wide_Wide. Are there such functions? |
| |
|Thanks a lot." |
|----------------------------------------------------------------------------|
Timings can be significantly affected when each item (here, each UTF-8
character) occupies an unknown, varying number of octets.
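To make that point concrete, here is a minimal sketch of reading one UTF-8 character at a time from a buffered binary stream and turning it into a single UTF-32 code point. It is written in Python rather than Ada, purely for illustration. The function name `read_code_point` is hypothetical and not part of any Ada or Python library. The sketch assumes a buffered, non-interactive stream (such as `open(path, "rb")` or `io.BytesIO`), where `read(n)` returns fewer than `n` bytes only at end of file. The lead byte alone decides whether zero, one, two or three continuation octets must follow, so the work per character depends on the input.

```python
import io
from typing import BinaryIO, Optional


def read_code_point(stream: BinaryIO) -> Optional[int]:
    """Read one UTF-8 encoded character from a buffered binary stream.

    Returns the code point as an int (a UTF-32 value), or None at end of
    stream. Raises ValueError on malformed or truncated input.
    Assumes a buffered, non-interactive stream, so a short read means EOF.
    """
    first = stream.read(1)
    if not first:
        return None  # clean end of stream
    lead = first[0]
    # The lead byte tells how many continuation bytes follow.
    if lead < 0x80:
        return lead  # 1 octet: U+0000..U+007F
    elif 0xC2 <= lead <= 0xDF:
        extra, cp, minimum = 1, lead & 0x1F, 0x80  # 2 octets
    elif 0xE0 <= lead <= 0xEF:
        extra, cp, minimum = 2, lead & 0x0F, 0x800  # 3 octets
    elif 0xF0 <= lead <= 0xF4:
        extra, cp, minimum = 3, lead & 0x07, 0x10000  # 4 octets
    else:
        raise ValueError(f"invalid lead byte 0x{lead:02X}")

    tail = stream.read(extra)
    if len(tail) != extra:
        raise ValueError("truncated UTF-8 sequence")
    for b in tail:
        # Every continuation byte must have the bit pattern 10xxxxxx.
        if b & 0xC0 != 0x80:
            raise ValueError(f"invalid continuation byte 0x{b:02X}")
        cp = (cp << 6) | (b & 0x3F)

    # Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
    if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"invalid code point U+{cp:04X}")
    return cp


if __name__ == "__main__":
    # 1-, 2-, 3- and 4-octet characters in a row.
    stream = io.BytesIO("a\u00e9\u20ac\U0001F600".encode("utf-8"))
    while (cp := read_code_point(stream)) is not None:
        print(f"U+{cp:04X} after {stream.tell()} octets")
```

Run on the sample, the loop prints the four code points after 1, 3, 6 and 10 octets respectively, which shows the uneven consumption per character. Buffering keeps each individual `read` cheap, but a real-time consumer still cannot assume a fixed number of octets per character. For chunked rather than per-character decoding, Python's standard library also provides `codecs.getincrementaldecoder("utf-8")`.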