From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Mon, 23 Aug 2010 12:32:50 +0200
Message-ID: <4c724e52$0$6775$9b4e6d93@newsspool3.arcor-online.net>
In-Reply-To: <i4s207$mdl$1@news.eternal-september.org>
On 22.08.10 22:40, J-P. Rosen wrote:
> On 22/08/2010 21:48, Georg Bauhaus wrote:
>> On 8/22/10 8:51 PM, J-P. Rosen wrote:
>>
>>> I think you missed the "Encoding" function. The intended usage
>>> (extracted from the !discussion section) is:
>>> 1) Read the first line. Call function Encoding on that line with an
>>> appropriate default to use if the line does not start with a
>>> BOM. Initialize the encoding scheme to the value returned by the
>>> function.
>>
>> Since Ada is an ISO language, is the name BOM for the non-UTF-8
>> thing used by Microsoft actually ISO? (I.e., has it become part of ISO
>> 10646)?
>>
> It's from Unicode. ISO 10646 defines only character encodings
> (code-points).
Uhm, a minor nitpick: ISO/IEC 10646:2003
"* specifies a multiple byte (one to four) byte transformation
UTF-8 for use with ISO 646 (ASCII) byte-oriented environments;
"* specifies a two 16-bit form and associated transformation
UTF-16 for supplementary characters;"
(and LRM A.4.11 seems to mention it, IINM.)
Markus Kuhn explains why in POSIX environments UTF-8 files---that
never have a byte order issue---should *not* have a BOM "signature".
It is, therefore, a good thing that Convert/Encode turn off outputting a
"BOM used as signature" byte sequence, since that sequence works on recent
Windows(TM) platforms but creates problems on the ISO standards compliant
platforms.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf
"It has also been suggested to use the UTF-8 encoded BOM (0xEF 0xBB 0xBF)
as a signature to mark the beginning of a UTF-8 file. This practice
should definitely not be used on POSIX systems for several reasons:
..."
Indeed, program source files that use "incorrect" Microsoft UTF-8
signatures do create problems with Eclipse when they are used
with both Windows and GNU/Linux editions of Eclipse.
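To make the two points above concrete (pick the encoding from a BOM on the first line, falling back to a caller-supplied default, and write output without a signature unless asked), here is a minimal sketch in Python. It is not the proposed Ada package: the names detect_encoding, decode_first_line, encode_line and the output_bom parameter are made up for illustration, and only the BOM constants come from Python's standard codecs module.

```python
import codecs
from typing import Tuple

# Known BOM byte sequences and the encoding each one signals.
# Names on the right are Python codec names, not Ada identifiers.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def detect_encoding(first_line: bytes, default: str = "utf-8") -> Tuple[str, int]:
    """Return (encoding, bom_length) for the first line of a file.

    If the line does not start with a known BOM, the caller's default
    is returned together with a BOM length of 0.
    """
    for bom, name in _BOMS:
        if first_line.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_first_line(first_line: bytes, default: str = "utf-8") -> str:
    """Decode the first line, dropping a leading BOM if one is present."""
    encoding, skip = detect_encoding(first_line, default)
    return first_line[skip:].decode(encoding)


def encode_line(text: str, encoding: str = "utf-8", output_bom: bool = False) -> bytes:
    """Encode text; a BOM is written only when explicitly requested."""
    data = text.encode(encoding)
    if output_bom:
        for bom, name in _BOMS:
            if name == encoding:
                return bom + data
    return data
```

With output_bom left at its (assumed) default of False, encode_line("A") yields the single byte 0x41 with no signature in front, which is the behaviour POSIX tools expect from a UTF-8 file.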
Georg