From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Ada 2012 and Unicode package (UTF-nn encodings handling)
Date: Mon, 23 Aug 2010 12:32:50 +0200

On 22.08.10 22:40, J-P. Rosen wrote:
> On 22/08/2010 21:48, Georg Bauhaus wrote:
>> On 8/22/10 8:51 PM, J-P. Rosen wrote:
>>
>>> I think you missed the "Encoding" function. The intended usage
>>> (extracted from the !discussion section) is:
>>> 1) Read the first line. Call function Encoding on that line with an
>>> appropriate default to use if the line does not start with a
>>> BOM. Initialize the encoding scheme to the value returned by the
>>> function.
>>
>> Since Ada is an ISO language, is the name BOM for the non-UTF-8
>> thing used by Microsoft actually ISO? (I.e., has it become part of ISO
>> 10646)?
>>
> It's from Unicode. ISO 10646 defines only character encodings
> (code-points).

Uhm, minor nitpicking: ISO/IEC 10646:2003

 "* specifies a multiple byte (one to four) byte transformation UTF-8
    for use with ISO 646 (ASCII) byte-oriented environments;
  * specifies a two 16-bit form and associated transformation UTF-16
    for supplementary characters;"

(and LRM A.4.11 seems to mention this, IINM.)

Markus Kuhn explains why, in POSIX environments, UTF-8 files (which
never have a byte order issue) should *not* carry a BOM "signature".
It is, therefore, a good thing that Convert/Encode do not output a
"BOM used as signature" byte sequence by default: that sequence works
on recent Windows(TM) platforms but creates problems on ISO
standards-compliant platforms.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf

 "It has also been suggested to use the UTF-8 encoded BOM (0xEF 0xBB 0xBF)
  as a signature to mark the beginning of a UTF-8 file. This practice
  should definitely not be used on POSIX systems for several reasons: ..."

Indeed, program source files that carry "incorrect" Microsoft UTF-8
signatures do create problems in Eclipse when the same files are used
with both the Windows and the GNU/Linux editions of Eclipse.

Georg
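For readers outside Ada, here is a rough Python analogue of the approach discussed in this thread, offered as a sketch rather than the Ada package's actual behaviour: look for a BOM at the start of the input, fall back to a caller-supplied default when there is none, and never emit a UTF-8 signature unless explicitly asked. The names `split_bom`, `decode_text` and `encode_utf8` are hypothetical and are not part of Ada.Strings.UTF_Encoding or any library. Only the three schemes covered by Ada's Encoding_Scheme type (UTF-8, UTF-16BE, UTF-16LE) are recognised; Python's standard `codecs` BOM constants supply the byte sequences.

```python
import codecs
from typing import Tuple

# Recognised BOM prefixes. Hypothetical table restricted to the three
# schemes of Ada's Encoding_Scheme; UTF-32 is deliberately not handled.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def split_bom(data: bytes, default: str = "utf-8") -> Tuple[str, bytes]:
    """Return (scheme, data without BOM).

    Mirrors the idea of calling Encoding on the first line with a
    default: if no BOM is present, `default` is returned unchanged
    and the bytes are left as they are.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return default, data


def decode_text(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, honouring and discarding a leading BOM if present."""
    scheme, body = split_bom(data, default)
    return body.decode(scheme)


def encode_utf8(text: str, output_bom: bool = False) -> bytes:
    """Encode as UTF-8 with no BOM signature unless explicitly requested,
    the convention Kuhn recommends for POSIX systems."""
    prefix = codecs.BOM_UTF8 if output_bom else b""
    return prefix + text.encode("utf-8")
```

With this split, writing plain UTF-8 without a signature is the default, while reading still tolerates files produced by tools that prepend one.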