From: Georg Bauhaus
Date: Sun, 17 Nov 2013 14:32:55 +0100
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Message-ID: <5288c584$0$6639$9b4e6d93@newsspool2.arcor-online.net>

On 16.11.13 16:55, Dmitry A. Kazakov wrote:

> As I said in order to avoid troubles, don't use anything but ASCII.

ASCII-ism is the soil in which dangerous bugs keep many things from working.(*) With an attitude of denial towards encoding basics, would anyone ever approach *numbers* in the same way? I doubt it.

The best medication against chronic character FUD is to

(a) see how some unambiguous encoding does work everywhere (e.g. the universally supported UTF-16) (**),

(b) understand that single units of text and single octets are not in general isomorphic; this leads to bugs just as harmless or harmful as erroneous execution in the presence of not 'Valid,

(c) understand that maybe wasting 9 bits of 16-bit characters (or a few bits per octet sequence in UTF-8) is not worth mentioning these days, considering source text.

Part (b) will not come to be as long as most programmers are fine thinking that text is always 7-bit characters in real life. If, instead, programmers start learning about further bits (that Character is a type, not an encoding), integrating software will start working better.

__
(*) One big instance of these ASCII bugs leaves Google's infrastructure stuck with Python 2.7.

(**) I understand that even the US Navy has officially started using more characters than ASCII. So, can I maintain hope that GNAT will one day read source files that use UTF-NN, which GNAT does support?