From: Yannick Duchêne (Hibou57)
Newsgroups: comp.lang.ada
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Sat, 21 Aug 2010 10:12:11 +0200
Organization: Ada At Home

> I still fail to see the benefit of encoding 31 bits values into 32 bits
> values...

UTF-32 is not formally an encoding format; it would be better described as a matter of byte order. But this byte order is not system dependent; it depends on the data, which may come from any platform.

> And even if implementation is not a nightmare, it always has a cost.
> Implementers are reluctant to spend money for features that nobody will
> use. (Wide_Wide_Character was forced on us by ISO).

I suppose ISO forced the introduction of Wide_Wide_Character because it is part of the Unicode standard and, as you know, conformance requires full conformance. There is no partial conformance here, because as soon as something is defined, it may really occur in practice.
Imagine a web crawler: it would have to be designed with this option in mind. Its designers could not say “We do not feel UTF-32 is useful, so our crawler will not be able to handle such documents”.

I just thought this was a bit of a pity: if one wants to rely on the capabilities of the standard packages, one will only be able to do so partially. This would be a bit like having a two-way linked list without the one-way one (or the opposite). A matter of completeness.

> A package provides functionalities. It should not presume how it is
> used. Since this package is clearly in the "string handling" class, it
> makes sense to handle this with strings.

Right, this is defined in *String*_Encoding.

> For files, the usage is to have a BOM on the first line of the file. The
> way the functions are defined makes it easy to not process the first
> line specially; see the use case in the AI.

I just had a look back at
http://www.ada-auth.org/standards/12aarm/html/AA-A-4-11.html

Only Encode has this capability (via Output_BOM : Boolean). Decode/Convert has nothing similar and will always skip any leading 16#FEFF#, which will be interpreted as a BOM instead of as a character (there is nothing like an Interpret_BOM : Boolean).

But maybe I am missing something. I will have a deeper look at it and at the AI which comes with it (I saw UTF-32 was at least “pronounced” during the talk).
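
To make the BOM concern above concrete, here is a small sketch of the same ambiguity outside Ada. It is only an illustration in Python using its standard `utf-8` and `utf-8-sig` codecs, not the Ada package itself. The helper names `encode_utf8` and `decode_utf8` are made up for this example; the `output_bom` parameter mimics Output_BOM, and `interpret_bom` mimics the hypothetical Interpret_BOM mentioned above, which does not exist in the Ada draft.

```python
def encode_utf8(text: str, output_bom: bool = False) -> bytes:
    """Encode to UTF-8, optionally prepending a BOM (hypothetical helper)."""
    # "utf-8-sig" writes EF BB BF before the data; plain "utf-8" does not.
    codec = "utf-8-sig" if output_bom else "utf-8"
    return text.encode(codec)


def decode_utf8(data: bytes, interpret_bom: bool = True) -> str:
    """Decode UTF-8, optionally treating a leading U+FEFF as a BOM (hypothetical helper)."""
    # "utf-8-sig" drops one leading BOM; plain "utf-8" keeps U+FEFF as a character.
    codec = "utf-8-sig" if interpret_bom else "utf-8"
    return data.decode(codec)


# A text whose first character really is U+FEFF (ZERO WIDTH NO-BREAK SPACE).
text = "\ufeffabc"
raw = encode_utf8(text, output_bom=False)

stripped = decode_utf8(raw, interpret_bom=True)   # leading U+FEFF taken as a BOM
kept = decode_utf8(raw, interpret_bom=False)      # leading U+FEFF kept as data

print(len(text), len(stripped), len(kept))  # prints "4 3 4"
```

With the BOM always interpreted on input, the round trip silently loses the first character; a switch on the decoding side is what would let the caller choose.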