From mboxrd@z Thu Jan  1 00:00:00 1970
Path: nntp.eternal-september.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: Alex // nytpu <nytpu@example.invalid>
Newsgroups: comp.lang.ada,fr.comp.lang.ada
Subject: Re: Ada 202x; 2022; and 2012 and Unicode package (UTF-nn encodings
 handling)
Date: Tue, 2 Sep 2025 13:13:12 -0600
Organization: A noiseless patient Spider
Message-ID: <1097fk8$mvl7$1@dont-email.me>
References: <op.vhrad6mjule2fv@garhos>
 <a8156fc2-bfbd-8199-b440-0ca9192d6936@insomnia247.nl>
 <10974d1$jn0e$1@dont-email.me> <1097br7$lh15$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 02 Sep 2025 19:13:13 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="157fc295d848f13cf128b9b50ab8eec6";
	logging-data="753319"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+HAN7qjdsnbjKz6ysWaik2"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:DUlzoQ6E060XXB1Z84INyUEQ+Nw=
Content-Language: en-US, en-US-large
In-Reply-To: <1097br7$lh15$1@dont-email.me>
Xref: feeder.eternal-september.org comp.lang.ada:67020 fr.comp.lang.ada:2356
List-Id: <comp.lang.ada>

On 9/2/25 12:08 PM, Dmitry A. Kazakov wrote:
> The matter is quite straightforward:
Objectively false: "text" is never actually straightforward, despite 
how it seems on the surface :P
> 1. Never ever use Wide and Wide_Wide. There is a marginal case of 
> Windows API where you need Wide_String for UTF-16 encoding. Otherwise, 
> use cases are absent. No text processing algorithms require code point 
> access.
Somewhat inclined to agree about Wide_<>, but I don't see a strong 
justification to *never* use Wide_Wide_<>. There are pretty substantial 
tradeoffs to using either UTF-32 or UTF-8 (in any programming language 
that supports both, but particularly with Ada's string situation), so 
unfortunately it ultimately falls on the programmer to understand them 
and choose.
> 2. Use Character as octet. String as UTF-8 encoded.
Perfectly valid, and explicitly mentioned as an option in my post.  It 
may actually be better for most applications because they wouldn't 
need to transcode; I should've noted that more clearly in my original 
response.  There are only two issues: make sure to avoid the Latin-1 
String routines unless you know that what you're doing is sound; and I 
remember reading long debates about whether the String type could 
safely store UTF-8 on many compilers (of the era) in older Ada 
versions, but that issue was clarified as early as Ada 95, IIRC.
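
A minimal sketch of the Latin-1 pitfall, assuming an Ada 2012 compiler 
(for Ada.Strings.UTF_Encoding).  The procedure name and the choice of 
U+20AC as sample text are only illustrative.  It stores UTF-8 in a 
plain String and shows that 'Length counts octets, and that 
Ada.Characters.Handling.To_Upper rewrites a lead octet because it 
reads each octet as a Latin-1 character:

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure UTF8_In_String is
   use Ada.Strings.UTF_Encoding;

   --  U+20AC EURO SIGN, spelled out as its three UTF-8 octets so the
   --  example does not depend on the source file encoding.
   Euro : constant UTF_8_String :=
     Character'Val (16#E2#) & Character'Val (16#82#)
     & Character'Val (16#AC#);

   --  Decoding yields one Wide_Wide_Character per code point.
   Decoded : constant Wide_Wide_String :=
     Wide_Wide_Strings.Decode (Euro);

   --  To_Upper sees octet 16#E2# as LATIN SMALL LETTER A WITH
   --  CIRCUMFLEX and maps it to 16#C2#, so the result is no longer
   --  the UTF-8 encoding of the euro sign.
   Mangled : constant String :=
     Ada.Characters.Handling.To_Upper (Euro);
begin
   --  'Length on the String counts octets, not code points.
   Ada.Text_IO.Put_Line
     ("Octets:" & Natural'Image (Euro'Length));
   Ada.Text_IO.Put_Line
     ("Code points:" & Natural'Image (Decoded'Length));

   if Mangled /= Euro then
      Ada.Text_IO.Put_Line ("To_Upper changed the UTF-8 octets");
   end if;
end UTF8_In_String;
```

Routines that only copy, compare, or concatenate octets are safe on 
such a String; anything that interprets individual characters is where 
the care is needed.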

I just personally prefer Wide_Wide_<> to get its slightly more 
Unicode-aware string routines, but it's not the only (or even inherently 
the best) option.
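
As a rough illustration of what the Wide_Wide_<> routines buy you, here 
is a sketch that keeps UTF-8 in String at the edges and transcodes only 
for the case mapping.  It assumes Ada 2012 (for 
Ada.Wide_Wide_Characters.Handling); the procedure name and the sample 
letter are made up for the example:

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Characters.Handling;

procedure Upper_Via_UTF32 is
   package Conv renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  U+044F CYRILLIC SMALL LETTER YA as its two UTF-8 octets.
   Small_Ya : constant String :=
     Character'Val (16#D1#) & Character'Val (16#8F#);

   --  U+042F CYRILLIC CAPITAL LETTER YA as its two UTF-8 octets.
   Capital_Ya : constant String :=
     Character'Val (16#D0#) & Character'Val (16#AF#);

   --  Transcode to code points, case-map there, transcode back.
   Code_Points : constant Wide_Wide_String := Conv.Decode (Small_Ya);
   Upper_Points : constant Wide_Wide_String :=
     Ada.Wide_Wide_Characters.Handling.To_Upper (Code_Points);
   Upper : constant String := Conv.Encode (Upper_Points);
begin
   if Upper = Capital_Ya then
      Ada.Text_IO.Put_Line ("Upper-cased via Wide_Wide_String");
   else
      Ada.Text_IO.Put_Line ("Unexpected case mapping");
   end if;
end Upper_Via_UTF32;
```

The cost is the two transcoding passes and four octets per code point 
while the text is decoded, which is the tradeoff mentioned above.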

~nytpu

-- 
Alex // nytpu
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/