From: "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr>
Subject: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Fri, 20 Aug 2010 23:38:20 +0200
Message-ID: <op.vhrad6mjule2fv@garhos>
Extracted from the thread “S-expression I/O in Ada”. I have moved this subtopic to a separate thread for clarity.
On Wed, 18 Aug 2010 15:16:50 +0200, J-P. Rosen <rosen@adalog.fr> wrote:
> Slightly OT, but you (and others) might be interested to know that Ada
> 2012 will include string encoding packages to the various UTF-X
> encodings. These will be (are?) provided very soon by GNAT.
>
> See AI05-137-2
> (http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai05s/ai05-0137-2.txt?rev=1.2)
Time for my stupid question of the day :)
I noticed this addition in the latest amendment because Unicode has
always been an issue for me (I actually use my own implementation).
Two questions came to mind. First, why no UTF-32? (It would not be an
implementation nightmare.) Second, why is the BOM handled for each string,
when a BOM is meant to be used at the stream/file level (see XML or HTML
files, for example)? Or are these strings supposed to hold the whole
content of a file/stream?
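To illustrate why UTF-32 would be cheap to support, here is a minimal sketch in Python (used only for brevity; it is not the Ada 2012 API from AI05-137-2). The function names `utf32be_encode` and `utf32be_decode` are hypothetical. UTF-32 is fixed width, so each code point maps to exactly one 4-byte unit, with no variable-length sequences or surrogate pairs to handle.

```python
import struct

def utf32be_encode(text: str) -> bytes:
    """Encode text as UTF-32BE: one fixed 4-byte big-endian unit per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Surrogate code points are not Unicode scalar values; UTF-32 forbids them.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point U+{cp:04X} cannot be encoded")
        out += struct.pack(">I", cp)
    return bytes(out)

def utf32be_decode(data: bytes) -> str:
    """Decode UTF-32BE: every 4 bytes is exactly one code point."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    chars = []
    for (cp,) in struct.iter_unpack(">I", data):
        # Reject values outside the Unicode range and surrogates.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point 0x{cp:X}")
        chars.append(chr(cp))
    return "".join(chars)
```

For example, `utf32be_encode("é€")` yields `b"\x00\x00\x00\xe9\x00\x00\x20\xac"`. The only validation needed is a range check, which supports the point that UTF-32 adds little implementation burden compared with UTF-8 or UTF-16.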
Quote:
http://www.unicode.org/faq/utf_bom.html
> A: A byte order mark (BOM) consists of the character code U+FEFF at the
> beginning of a data stream
This is from the Unicode.org FAQ, but all the references (the Unicode PDF
files, the XML reference, the HTML reference) say the same.
This matters because the code point U+FEFF can stand for two different
things: ZERO WIDTH NO-BREAK SPACE, or the encoding Byte Order Mark. The only
way to distinguish the two uses is where the code point appears.
If it appears as the first code point of a stream, it is a BOM
(heuristics can be applied to switch the encoding automatically by
analyzing the first bytes of a stream; this is what I do). If it
appears anywhere else in a stream, it is an ordinary character code point.
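A minimal sketch of such a heuristic, in Python for brevity, assuming the whole stream is available as bytes and defaulting to UTF-8 when no BOM is found. The names `sniff_bom` and `decode_stream` are hypothetical; this is not the method I actually use, nor the Ada 2012 API. Only a leading U+FEFF is treated as a BOM; any later U+FEFF survives decoding as ZERO WIDTH NO-BREAK SPACE.

```python
import codecs

# The 4-byte UTF-32 BOMs must be tested before the UTF-16 BOMs, because the
# UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) by inspecting only the first bytes."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    # No BOM: fall back to the assumed default encoding.
    return default, 0

def decode_stream(data: bytes, default: str = "utf-8") -> str:
    """Decode a whole stream. A leading U+FEFF is a BOM and is dropped;
    any later U+FEFF is kept as ZERO WIDTH NO-BREAK SPACE."""
    encoding, skip = sniff_bom(data, default)
    return data[skip:].decode(encoding)
```

This remains a heuristic. For example, UTF-16LE text that begins with a BOM followed by U+0000 is indistinguishable from a UTF-32LE BOM, which is one reason the BOM belongs at the stream level rather than on each string.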