comp.lang.ada
From: "Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr>
Subject: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Fri, 20 Aug 2010 23:38:20 +0200
Message-ID: <op.vhrad6mjule2fv@garhos>

Extracted from the thread “S-expression I/O in Ada”. Subtopic moved to a
separate thread for clarity.

On Wed, 18 Aug 2010 15:16:50 +0200, J-P. Rosen <rosen@adalog.fr> wrote:
> Slightly OT, but you (and others) might be interested to know that Ada
> 2012 will include string encoding packages to the various UTF-X
> encodings. These will be (are?) provided very soon by GNAT.
>
> See AI05-137-2
> (http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai05s/ai05-0137-2.txt?rev=1.2)

Time for my stupid question of the day :)

I had noticed this addition in the latest amendment, because Unicode has
always been a concern of mine (I currently use my own implementation).

Two questions came to mind. Why is there no UTF-32? (It would not be an
implementation nightmare.) And why is the BOM handled for each string, when
a BOM is meant to be used at the stream/file level? (See XML or HTML files,
for example.) Or are these strings supposed to hold the whole content of a
file/stream?

Quote:
http://www.unicode.org/faq/utf_bom.html
> A: A byte order mark (BOM) consists of the character code U+FEFF at the  
> beginning of a data stream

This comes from the FAQ at Unicode.org, but all the references (the Unicode
PDF files, the XML reference, the HTML reference) say the same thing.

This matters, because the code point U+FEFF can stand for two different
things: Zero Width No-Break Space or the encoding Byte Order Mark. The only
way to distinguish the two usages is by where the code point appears.

If it appears as the first code point of a stream, it is a BOM
(heuristics may be applied to select the encoding automatically by
analysing the first bytes of the stream, which is what I do); if it
appears anywhere else in a stream, it is a character code point.
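
To make the stream-level rule concrete, here is a minimal sketch in Python
(chosen only for brevity; it is not the Ada package under discussion, and it
is not necessarily the heuristic I use myself). The name `decode_stream` and
the UTF-8 fallback are assumptions made for the illustration. The BOM is
looked for once, at the very start of the byte stream; any later U+FEFF is
left in the text as an ordinary character.

```python
import codecs

# Longest marks first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_stream(data: bytes, default: str = "utf-8") -> str:
    """Decode a whole stream, honouring a BOM only at its very start.

    `default` is a hypothetical fallback used when no BOM is present.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the mark, then decode with an explicit byte order so
            # that a later U+FEFF is kept as a plain code point.
            return data[len(bom):].decode(encoding)
    return data.decode(default)


# The leading FF FE is consumed as a BOM; the U+FEFF between "a" and "b"
# survives as Zero Width No-Break Space.
payload = codecs.BOM_UTF16_LE + "a\ufeffb".encode("utf-16-le")
text = decode_stream(payload)
```

This remains a heuristic: a UTF-16 LE stream whose first character after the
BOM is U+0000 would be mistaken for UTF-32 LE.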


