From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 4.0.1 (2024-03-25) on ip-172-31-91-241.ec2.internal
X-Spam-Level:
X-Spam-Status: No, score=0.0 required=3.0 tests=none autolearn=ham autolearn_force=no version=4.0.1
Path: nntp.eternal-september.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: Alex // nytpu
Newsgroups: comp.lang.ada,fr.comp.lang.ada
Subject: Re: Ada 202x; 2022; and 2012 and Unicode package (UTF-nn encodings handling)
Date: Tue, 2 Sep 2025 13:13:12 -0600
Organization: A noiseless patient Spider
Message-ID: <1097fk8$mvl7$1@dont-email.me>
References: <10974d1$jn0e$1@dont-email.me> <1097br7$lh15$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 02 Sep 2025 19:13:13 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="157fc295d848f13cf128b9b50ab8eec6"; logging-data="753319"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+HAN7qjdsnbjKz6ysWaik2"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:DUlzoQ6E060XXB1Z84INyUEQ+Nw=
Content-Language: en-US, en-US-large
In-Reply-To: <1097br7$lh15$1@dont-email.me>
Xref: feeder.eternal-september.org comp.lang.ada:67020 fr.comp.lang.ada:2356
List-Id:

On 9/2/25 12:08 PM, Dmitry A. Kazakov wrote:
> The matter is quite straightforward:

Objectively false, "text" is never actually straightforward despite what it seems like on a surface level :P

> 1. Never ever use Wide and Wide_Wide. There is a marginal case of
> Windows API where you need Wide_String for UTF-16 encoding. Otherwise,
> use cases are absent. No text processing algorithms require code point
> access.
Somewhat inclined to agree about Wide_<>, but I don't see strong justification to *never* use Wide_Wide_<>. There are pretty substantial tradeoffs to both UTF-32 and UTF-8 (in any programming language that supports both, but particularly with Ada's string situation), so unfortunately it ultimately falls on the programmer to understand and choose.

> 2. Use Character as octet. String as UTF-8 encoded.

Perfectly valid, and explicitly mentioned as an option in my post. It may actually be better for most applications because they wouldn't need to transcode; I should've noted that more clearly in my original response. The only two issues: make sure to avoid the Latin-1 String routines unless you know what you're doing is sound; and in older Ada versions I remember reading long debates about whether the String type could safely store UTF-8 on many compilers (of the era), but that issue was clarified by even Ada 95 IIRC.

I just personally prefer Wide_Wide_<> to get its slightly more Unicode-aware string routines, but it's not the only (or even inherently the best) option.

~nytpu

-- 
Alex // nytpu
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
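
To make the octet-vs-code-point tradeoff and the Latin-1 caveat above a bit more concrete, here is a minimal sketch, assuming an Ada 2012 (or later) compiler that provides Ada.Strings.UTF_Encoding. The procedure name and the sample text "café" are just illustrative choices, not anything from the thread. It stores the same text once as UTF-8 octets in a String and once decoded into a Wide_Wide_String, then runs a Latin-1 routine over the octets.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure UTF8_Vs_Wide_Wide is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  U+00E9 (e with acute) written as its two UTF-8 octets, so the
   --  example does not depend on the source file's encoding.
   E_Acute : constant UTF.UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   --  "cafe" with an accented e, held as UTF-8 octets in a String.
   Octets : constant UTF.UTF_8_String := "caf" & E_Acute;

   --  The same text with one array element per code point.
   Points : constant Wide_Wide_String :=
     UTF.Wide_Wide_Strings.Decode (Octets);

   --  A Latin-1 routine applied to UTF-8 octets: it sees 16#C3# as
   --  Latin-1 "A with tilde" and lowercases it to 16#E3#, which
   --  breaks the two-octet sequence.
   Lowered : constant String :=
     Ada.Characters.Handling.To_Lower (Octets);
begin
   Ada.Text_IO.Put_Line
     ("Octets'Length =" & Integer'Image (Octets'Length));
   Ada.Text_IO.Put_Line
     ("Points'Length =" & Integer'Image (Points'Length));
   Ada.Text_IO.Put_Line
     ("Changed by To_Lower: " & Boolean'Image (Lowered /= Octets));
end UTF8_Vs_Wide_Wide;
```

The String holds five octets while the Wide_Wide_String holds four code points, and To_Lower alters the octets even though the text was already lowercase. That is the kind of thing to watch for when keeping UTF-8 in String; whether it matters depends on which routines the program actually calls.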