comp.lang.ada
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Mon, 23 Aug 2010 12:32:50 +0200
Message-ID: <4c724e52$0$6775$9b4e6d93@newsspool3.arcor-online.net>
In-Reply-To: <i4s207$mdl$1@news.eternal-september.org>

On 22.08.10 22:40, J-P. Rosen wrote:
> Le 22/08/2010 21:48, Georg Bauhaus a écrit :
>> On 8/22/10 8:51 PM, J-P. Rosen wrote:
>>
>>> I think you missed the "Encoding" function. The intended usage
>>> (extracted from the !discussion section) is:
>>> 1) Read the first line. Call function Encoding on that line with an
>>>     appropriate default to use if the line does not start with a
>>>     BOM. Initialize the encoding scheme to the value returned by the
>>>     function.
>>
>> Since Ada is an ISO language, is the name BOM for the non-UTF-8
>> thing used by Microsoft actually ISO? (I.e., has it become part of ISO
>> 10646)?
>>
> It's from Unicode. ISO 10646 defines only character encodings
> (code-points).

Uhm, minor nitpicking: ISO/IEC 10646:2003 also

"* specifies a multiple byte (one to four) byte transformation
   UTF-8 for use with ISO 646 (ASCII) byte-oriented environments;

"* specifies a two 16-bit form and associated transformation
   UTF-16 for supplementary characters;"

(And LRM A.4.11 seems to mention it too, IINM.)

Markus Kuhn explains why in POSIX environments UTF-8 files---that
never have a byte order issue---should *not* have a BOM "signature".
It is, therefore, a good thing that Convert/Encode turn off outputting a
"BOM used as signature" byte sequence, since that sequence works on recent
Windows(TM) platforms but creates problems on the ISO standards compliant
platforms.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf

"It has also been suggested to use the UTF-8 encoded BOM (0xEF 0xBB 0xBF)
 as a signature to mark the beginning of a UTF-8 file. This practice
 should definitely not be used on POSIX systems for several reasons:

 ..."

Indeed, program source files that carry the "incorrect" Microsoft UTF-8
signature do create problems when the same files are used with both
the Windows and the GNU/Linux editions of Eclipse.
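
To make the first-line check from the quoted steps concrete, here is a
minimal sketch. It is written in Python only because that is easy to run
anywhere; it is not the Ada package under discussion. The names
detect_encoding and strip_bom and the scheme names in BOMS are made up
for illustration. The sketch looks for a BOM at the start of the first
line, falls back to a caller-supplied default when there is none, and
can drop the signature before the text is passed on.

```python
import codecs

# Hypothetical table for this sketch: BOM byte sequences and the
# encoding scheme each one indicates.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def detect_encoding(first_line: bytes, default: str = "UTF-8") -> str:
    """Return the scheme indicated by a leading BOM, else the default."""
    for bom, name in BOMS:
        if first_line.startswith(bom):
            return name
    return default


def strip_bom(first_line: bytes) -> bytes:
    """Return the line without a leading BOM, if one is present."""
    for bom, _ in BOMS:
        if first_line.startswith(bom):
            return first_line[len(bom):]
    return first_line


if __name__ == "__main__":
    line = codecs.BOM_UTF8 + b"with Ada.Text_IO;"
    print(detect_encoding(line))
    print(strip_bom(line))
```

A reader built this way accepts files with or without the signature,
while a writer that never emits the signature keeps POSIX tools happy.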


Georg


