From: Nicolas Paul Colin de Glocester
Newsgroups: comp.lang.ada,fr.comp.lang.ada
Subject: Re: Ada 202x; 2022; and 2012 and Unicode package (UTF-nn encodings handling)
Date: Tue, 2 Sep 2025 19:40:07 +0200
Organization: A noiseless patient Spider
Message-ID: <4248d82b-d759-b5d2-aa45-e4c0e26c81e5@insomnia247.nl>
References: <10974d1$jn0e$1@dont-email.me>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="708268602-835474360-1756834812=:2382941"
In-Reply-To: <10974d1$jn0e$1@dont-email.me>

--708268602-835474360-1756834812=:2382941
Content-Type: text/plain; format=flowed; charset=UTF-8

Alex // nytpu wrote during this decade, specifically today:
|---------------------------------------------------------------------------|
|"I can't find any of my old writing on it so I've rewritten it             |
|here lol."                                                                 |
|---------------------------------------------------------------------------|

Dear Alex:

A teammate had once solved a problem but had forgotten how he solved it.
So he queried a search engine.
It showed him a webpage with a perfect solution: a webpage written by him!
I recommend searching for that old writing about Unicode: perhaps it has
more details than this comp.lang.ada thread, or perhaps a perspective has
changed in an interesting way. Even if there is no difference, perhaps it is
in a directory with other missing files which need to be backed up!

|---------------------------------------------------------------------------|
|"If you use Latin-1 or Windows-1252 or some weird                          |
|regional encoding everyone will hate you, and if you restrict inputs to    |
|7-bit ASCII everyone will hate you too lol. And people will get annoyed    |
|if you use UTF-16 or UTF-32 instead of UTF-8 as the interchange/storage    |
|format in a new program."                                                  |
|---------------------------------------------------------------------------|

I quote Usenet articles in a way which does not endear me to persons. Not
everyone reacts in the same way: OC Systems asked me how I draw those boxes.
I advocate Ada, which also does not endear me to persons.

|---------------------------------------------------------------------------|
|"[. . .]                                                                   |
|                                                                           |
|I personally use Wide_Wide_<> for everything just because it's more        |
|convenient to have more useful built-in string functions, and it makes     |
|dealing with input/output encoding much easier later (detailed below).     |
|                                                                           |
|[. . .]                                                                    |
|                                                                           |
|I'm unfortunate enough to know most of the nuances of Unicode but I        |
|won't subject you to it, but a lot of the statements in your collection    |
|are a bit oversimplified (UCS-4 has a number of additional differences     |
|from UTF-32 regarding "valid encodings", [. . .]                           |
|[. . .]"                                                                   |
|---------------------------------------------------------------------------|

Thanks for this feedback; more will be as welcome as can be. I quoted
examples of what I found in this newsgroup.
This newsgroup used not to have many statements with explicit references to
"UTF-32" or "UTF32" or "UCS-4" which differ overwhelmingly from what I
quoted during the previous week.

|---------------------------------------------------------------------------|
|"Also, I just stumbled across Ada.Strings.Text_Buffers which seems to be   |
|new to Ada 2022, makes "string builder" stuff much more convenient         |
|because you can write text using any of Ada's string types and then get    |
|a string in whatever encoding you want [. . .]                             |
|[. . .]"                                                                   |
|---------------------------------------------------------------------------|

Package Ada.Strings.Text_Buffers does not support UCS-4.

|---------------------------------------------------------------------------|
|"Note that there is zero chance in hell that UTF-32 will ever be adopted as|
|an interchange or storage encoding (except in isolated singular corporate  |
|apps *maybe*), so UTF-32 being used should purely be an internal           |
|implementation detail: incoming text in whatever encoding gets converted to|
|it and outgoing text will always get converted from it."                   |
|---------------------------------------------------------------------------|

One can know, but what one too optimistically claims to know can be false.
Character sets or encodings used to be subjects of unfulfilled expectations.
I can say that for now, UTF-8 is enough for a particular application. Deadly
Head did not have the same luck.

|---------------------------------------------------------------------------|
|"The encodings used by                                                     |
|Text_IO are mostly (but not entirely) based off of the `-gnatW` flag, which|
|is configuring the encoding of THE PROGRAM'S SOURCE CODE."                 |
|---------------------------------------------------------------------------|

GNAT has many switches. It could easily gain more switches.

Sincere regards.
Nicolas Paul Colin de Glocester
--708268602-835474360-1756834812=:2382941--