comp.lang.ada
* Ada 2012 and Unicode package (UTF-nn encodings handling)
@ 2010-08-20 21:38 Yannick Duchêne (Hibou57)
From: Yannick Duchêne (Hibou57) @ 2010-08-20 21:38 UTC


Extracted from the thread “S-expression I/O in Ada”. This subtopic has been
moved to a separate thread for clarity.

On Wed, 18 Aug 2010 15:16:50 +0200, J-P. Rosen <rosen@adalog.fr> wrote:
> Slightly OT, but you (and others) might be interested to know that Ada
> 2012 will include string encoding packages to the various UTF-X
> encodings. These will be (are?) provided very soon by GNAT.
>
> See AI05-137-2
> (http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai05s/ai05-0137-2.txt?rev=1.2)

Time for my stupid question of the day :)

I noticed this addition in the latest amendment, because Unicode has
always been a concern for me (I currently use my own implementation).

Two questions came to mind. Why is there no UTF-32? (It would not be an
implementation nightmare.) And why is the BOM handled for each string, when
a BOM is meant to be used at the stream/file level (see XML or HTML files,
for example)? Or are these strings supposed to hold the whole content of a
file/stream?

Quote:
http://www.unicode.org/faq/utf_bom.html
> A: A byte order mark (BOM) consists of the character code U+FEFF at the  
> beginning of a data stream

This is from a FAQ at Unicode.org, but all the references (Unicode PDF
files, XML reference, HTML reference) say the same.

This matters because the code point U+FEFF can stand for two different
things: Zero Width No-Break Space or the encoding Byte Order Mark. The only
way to distinguish the two uses is where the code point appears.

If it appears as the first code point of a stream, it is a BOM
(heuristics may be applied to switch encoding automatically by analyzing
the first bytes of a stream; this is what I do). If it appears anywhere
else in a stream, it is a character code point.
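
As an illustration of this positional rule, here is a minimal sketch in
Python. It is neither the detection code mentioned above nor the
AI05-137-2 interface; the names BOMS, split_bom and decode_stream are
hypothetical, and falling back to UTF-8 when no BOM is found is an
assumption made only for the example.

```python
import codecs

# Hypothetical table of BOM signatures. Longest signatures come first,
# because the UTF-32 LE BOM starts with the same bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, payload), removing a BOM only at the very start."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    # No BOM found: the default encoding is an assumption of this sketch.
    return default, data


def decode_stream(data: bytes) -> str:
    """Decode a whole stream; U+FEFF after the start stays in the text."""
    encoding, payload = split_bom(data)
    return payload.decode(encoding)


# The leading U+FEFF selects UTF-16 BE and is dropped; the one in the
# middle is kept as an ordinary Zero Width No-Break Space.
stream = codecs.BOM_UTF16_BE + "a\ufeffb".encode("utf-16-be")
text = decode_stream(stream)
```

Only the stream as a whole is inspected for a BOM here; a string taken
from the middle of the stream would be decoded with the encoding already
selected, without looking for a BOM again.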



