From: Niocláisín Cóilín de Ghlostéir
Newsgroups: comp.lang.ada
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Sun, 24 Aug 2025 22:57:38 +0200
Organization: A noiseless patient Spider
Message-ID: <0e73134a-a5d5-d594-3d5c-cec2d1d100a0@insomnia247.nl>

Doctor Marius Amado-Alves wrote on 2nd November 2021:
|----------------------------------------------------------------------------|
|"As I understand it, to work with Unicode text inside the program it is     |
|better to use the Wide_Wide (UTF-32) variants of everything.                |
|                                                                            |
|Now, Unicode files usually are in UTF-8.                                    |
|                                                                            |
|One solution is to read the entire file in one gulp to a String, then       |
|convert to Wide_Wide. This solution is not memory efficient, and it may not |
|be possible in some tasks e.g. real time processing of lines of text.       |
|                                                                            |
|If the files has lines, I guess we can also work line by line (Text_IO). But|
|the text may not have lines. Can be a long XML object, for example.         |
|                                                                            |
|So it should be possible to read a single UTF-8 character, right? Which     |
|might be 1, 2, 3, or 4 bytes long, so it must be read into a String, right? |
|Or directly to Wide_Wide. Are there such functions?                         |
|                                                                            |
|Thanks a lot."                                                              |
|----------------------------------------------------------------------------|

Timings can be significantly affected when a thing has an unknown varying quantity of octets.
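
To make the varying octet count concrete, here is a minimal sketch of reading one code point at a time with Ada.Streams.Stream_IO. It is only an illustration under stated assumptions, not a recommended library routine: the names Read_UTF8_Incrementally, Next_Octet, Next_Code_Point, Invalid_UTF8 and the file name "utf8_sample.tmp" are all invented here, and the code assumes that Stream_Element is an 8-bit modular type (as it is in GNAT). The lead octet alone decides how many further octets must be fetched, which is where the data-dependent timing comes from.

```ada
with Ada.Streams;            use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Read_UTF8_Incrementally is

   package SIO renames Ada.Streams.Stream_IO;

   Invalid_UTF8 : exception;  --  hypothetical name, for this sketch only

   --  Read exactly one octet; raise Invalid_UTF8 if the file ends early.
   function Next_Octet (File : SIO.File_Type) return Stream_Element is
      Buffer : Stream_Element_Array (1 .. 1);
      Last   : Stream_Element_Offset;
   begin
      SIO.Read (File, Buffer, Last);
      if Last < Buffer'First then
         raise Invalid_UTF8 with "file ends inside a sequence";
      end if;
      return Buffer (1);
   end Next_Octet;

   --  Decode the next code point.  The caller checks End_Of_File first.
   --  Assumes 8-bit stream elements.
   function Next_Code_Point (File : SIO.File_Type) return Wide_Wide_Character
   is
      Lead    : constant Stream_Element := Next_Octet (File);
      Extra   : Natural;  --  continuation octets still to be read
      Code    : Natural;  --  code point accumulated so far
      Minimum : Natural;  --  smallest value legal for this sequence length
   begin
      if Lead < 16#80# then
         return Wide_Wide_Character'Val (Lead);
      elsif Lead in 16#C2# .. 16#DF# then
         Extra   := 1;
         Code    := Natural (Lead and 16#1F#);
         Minimum := 16#80#;
      elsif Lead in 16#E0# .. 16#EF# then
         Extra   := 2;
         Code    := Natural (Lead and 16#0F#);
         Minimum := 16#800#;
      elsif Lead in 16#F0# .. 16#F4# then
         Extra   := 3;
         Code    := Natural (Lead and 16#07#);
         Minimum := 16#1_0000#;
      else
         raise Invalid_UTF8 with "bad lead octet";
      end if;

      for I in 1 .. Extra loop
         declare
            Octet : constant Stream_Element := Next_Octet (File);
         begin
            if (Octet and 16#C0#) /= 16#80# then
               raise Invalid_UTF8 with "bad continuation octet";
            end if;
            --  Each continuation octet contributes six payload bits.
            Code := Code * 64 + Natural (Octet and 16#3F#);
         end;
      end loop;

      if Code < Minimum or else Code > 16#10_FFFF#
        or else Code in 16#D800# .. 16#DFFF#
      then
         raise Invalid_UTF8 with "overlong, surrogate or out of range";
      end if;
      return Wide_Wide_Character'Val (Code);
   end Next_Code_Point;

   File_Name : constant String := "utf8_sample.tmp";  --  hypothetical

   --  "a", U+00E9, U+20AC, U+1F600 in UTF-8: 1, 2, 3 and 4 octets.
   Sample : constant Stream_Element_Array :=
     (16#61#, 16#C3#, 16#A9#, 16#E2#, 16#82#, 16#AC#,
      16#F0#, 16#9F#, 16#98#, 16#80#);

   File : SIO.File_Type;

begin
   --  Write a small sample so that the sketch is self-contained.
   SIO.Create (File, SIO.Out_File, File_Name);
   SIO.Write (File, Sample);
   SIO.Close (File);

   --  Read it back one code point at a time.
   SIO.Open (File, SIO.In_File, File_Name);
   while not SIO.End_Of_File (File) loop
      Ada.Text_IO.Put_Line
        (Natural'Image (Wide_Wide_Character'Pos (Next_Code_Point (File))));
   end loop;
   SIO.Delete (File);
end Read_UTF8_Incrementally;
```

With the sample octets above, the loop should print the code points 97, 233, 8364 and 128512, one per line. Fetching a single octet per Read call keeps the sketch short but is unlikely to be fast; a variant that fills a larger Stream_Element_Array and decodes from that buffer would make fewer I/O calls, at the price of handling sequences that straddle two buffer fills. For text that is already wholly in memory, Ada 2012 also provides Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode, which converts a complete UTF-8 String to a Wide_Wide_String.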