From: "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr>
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Sat, 21 Aug 2010 10:12:11 +0200
Message-ID: <op.vhr3qlivule2fv@garhos>
In-Reply-To: i4ntld$njs$1@news.eternal-september.org
> I still fail to see the benefit of encoding 31 bits values into 32 bits
> values...
Formally, UTF-32 is hardly an encoding at all; it is better described as a
matter of byte order. That byte order does not depend on the system,
though: it depends on the data, which may come from any platform.
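As a small illustration of the byte-order point, here is a sketch in Python
(chosen only because it is easy to run; the codec names are Python's and say
nothing about the Ada package). The sample character is arbitrary. The same
code point gives the same four bytes, in opposite order:

```python
# Illustration only: Python codec names, not the Ada 2012 package.
ch = "\u00e9"  # U+00E9, an arbitrary sample character

big = ch.encode("utf-32-be")     # most significant byte first
little = ch.encode("utf-32-le")  # least significant byte first

print(big.hex())     # 000000e9
print(little.hex())  # e9000000

# Either form decodes back to the same character once the order is known.
assert big.decode("utf-32-be") == little.decode("utf-32-le") == ch
```

Which of the two a document uses is a property of the document, not of the
machine reading it.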
> And even if implementation is not a nightmare, it always has a cost.
> Implementers are reluctant to spend money for features that nobody will
> use. (Wide_Wide_Character was forced on us by ISO).
I suppose ISO forced the introduction of Wide_Wide_Character because it is
part of the Unicode standard and, as you know, conformance means full
conformance. There is no partial conformance here: as soon as something is
defined, it may actually occur in practice.
Imagine a web crawler: it would have to be designed with this option in
mind. Its designers could not say “We do not feel UTF-32 is useful, so our
crawler will not be able to handle such documents”.
I just thought this was a bit of a pity: anyone who wants to rely on the
standard packages will only be able to do so partially. It would be a bit
like providing a two-way linked list without the one-way one (or the
opposite). A matter of completeness.
> A package provides functionalities. It should not presume how it is
> used. Since this package is clearly in the "string handling" class, it
> makes sense to handle this with strings.
Right, this is defined in *String*_Encoding.
> For files, the usage is to have a BOM on the first line of the file. The
> way the functions are defined makes it easy to not process the first
> line specially; see the use case in the AI.
I just had a look back at
http://www.ada-auth.org/standards/12aarm/html/AA-A-4-11.html
Only Encode has this capability (via Output_BOM : Boolean). Decode and
Convert have nothing similar and will always skip any 16#FEFF#, which is
interpreted as a BOM rather than as a character (there is nothing like an
Interpret_BOM : Boolean).
But maybe I am missing something. I will have a deeper look at it and at
the AI which comes with it (I saw UTF-32 was at least “mentioned” during
the discussion).
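To make the BOM point above concrete, here is a minimal sketch in Python
(again only as something runnable; these are Python's codecs, not
Ada.Strings.UTF_Encoding, and the sample text is made up). It shows the two
possible readings of a leading 16#FEFF#: as a character, or as a BOM to be
dropped.

```python
import codecs

# Illustration only: Python codecs, not Ada.Strings.UTF_Encoding.
data = codecs.BOM_UTF8 + "text".encode("utf-8")

as_character = data.decode("utf-8")    # leading U+FEFF kept as a character
as_bom = data.decode("utf-8-sig")      # leading U+FEFF treated as a BOM and dropped

print(len(as_character), len(as_bom))  # 5 4
```

Here the caller picks the interpretation by picking the codec, which is the
kind of choice a parameter such as the Interpret_BOM imagined above would
offer.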