From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,e4abd14106db0029
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,UTF8
Path: 
 g2news1.google.com!news3.google.com!feeder3.cambriumusenet.nl!feed.tweaknews.nl!87.79.20.105.MISMATCH!news.netcologne.de!ramfeed1.netcologne.de!newsfeed.arcor.de!newsspool2.arcor-online.net!news.arcor.de.POSTED!not-for-mail
Date: Mon, 23 Aug 2010 12:32:50 +0200
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
 rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
References: <op.vhrad6mjule2fv@garhos>
 <i4ntld$njs$1@news.eternal-september.org> <op.vhr3qlivule2fv@garhos>
 <i4rrje$qrt$1@news.eternal-september.org>
 <4c717f18$0$7652$9b4e6d93@newsspool1.arcor-online.net>
 <i4s207$mdl$1@news.eternal-september.org>
In-Reply-To: <i4s207$mdl$1@news.eternal-september.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Message-ID: <4c724e52$0$6775$9b4e6d93@newsspool3.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 23 Aug 2010 12:32:50 CEST
NNTP-Posting-Host: 7a7af91c.newsspool3.arcor-online.net
X-Trace: 
 DXC=YI<j1i6AdFP@>[RYkFXOIPMcF=Q^Z^V3X4Fo<]lROoRQ8kF<OcfhCO[d875<NSW4B^nc\616M64>ZLh>_cHTX3j]HZmf5EP9;=S
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:13663
Date: 2010-08-23T12:32:50+02:00
List-Id: <comp.lang.ada>

On 22.08.10 22:40, J-P. Rosen wrote:
> On 22/08/2010 21:48, Georg Bauhaus wrote:
>> On 8/22/10 8:51 PM, J-P. Rosen wrote:
>>
>>> I think you missed the "Encoding" function. The intended usage
>>> (extracted from the !discussion section) is:
>>> 1) Read the first line. Call function Encoding on that line with an
>>>     appropriate default to use if the line does not start with a
>>>     BOM. Initialize the encoding scheme to the value returned by the
>>>     function.
>>
>> Since Ada is an ISO language, is the name BOM for the non-UTF-8
>> thing used by Microsoft actually ISO? (I.e., has it become part of ISO
>> 10646)?
>>
> It's from Unicode. ISO 10646 defines only character encodings
> (code-points).

Uhm, a minor nitpick: ISO/IEC 10646:2003

"* specifies a multiple byte (one to four) byte transformation
   UTF-8 for use with ISO 646 (ASCII) byte-oriented environments;

"* specifies a two 16-bit form and associated transformation
   UTF-16 for supplementary characters;"

(And LRM A.4.11 seems to mention this too, IINM.)

Markus Kuhn explains why, in POSIX environments, UTF-8 files---which
never have a byte order issue---should *not* have a BOM "signature".
It is, therefore, a good thing that Convert/Encode turn off output of a
"BOM used as signature" byte sequence, since that sequence works on recent
Windows(TM) platforms but creates problems on ISO-standards-compliant
platforms.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf

"It has also been suggested to use the UTF-8 encoded BOM (0xEF 0xBB 0xBF)
 as a signature to mark the beginning of a UTF-8 file. This practice
 should definitely not be used on POSIX systems for several reasons:

 ..."
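
To make the byte sequences concrete, here is a minimal sketch, in Python
rather than Ada, of the "look at the first line, fall back to a default"
idea quoted above, plus removal of a UTF-8 signature. The names
detect_encoding and strip_signature are hypothetical; only the BOM
constants come from Python's standard codecs module. It is an
illustration under those assumptions, not the Ada 2012 package itself.

```python
import codecs

# Hypothetical table: BOM byte sequences mapped to scheme names.
# UTF-32 is deliberately not handled in this sketch.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def detect_encoding(first_line: bytes, default: str = "utf-8") -> str:
    """Return the scheme named by a leading BOM, else the caller's default."""
    for bom, name in _BOMS:
        if first_line.startswith(bom):
            return name
    return default


def strip_signature(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A file saved with the signature would be passed through strip_signature
before being handed to POSIX tools; encoding text with Python's plain
"utf-8" codec does not add the signature in the first place.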

Indeed, program source files that use "incorrect" Microsoft UTF-8
signatures do create problems with Eclipse when they are used
with both Windows and GNU/Linux editions of Eclipse.


Georg