From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 border1.nntp.dca3.giganews.com!backlog3.nntp.dca3.giganews.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.glorb.com!us.feeder.erje.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.fsmpi.rwth-aachen.de!uucp.gnuu.de!newsfeed.arcor.de!newsspool1.arcor-online.net!news.arcor.de.POSTED!not-for-mail
Date: Sun, 17 Nov 2013 14:32:55 +0100
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com>
 <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>
 <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
 <1w23uq33ul2i8$.wzjpp3evot36.dlg@40tude.net>
In-Reply-To: <1w23uq33ul2i8$.wzjpp3evot36.dlg@40tude.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <5288c584$0$6639$9b4e6d93@newsspool2.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 17 Nov 2013 14:32:52 CET
NNTP-Posting-Host: b9fe2b65.newsspool2.arcor-online.net
X-Trace: DXC=Sc6=_NSlcD7^8FBo0_81f>A9EHlD; 3Yc24Fo<]lROoR18kF<OcfhCO; _2XE;
 UOVIG?PCY\c7>ejV8[k3<:EhI9Z0_I<\O1g83L>
X-Complaints-To: usenet-abuse@arcor.de
X-Original-Bytes: 2800
Xref: number.nntp.dca.giganews.com comp.lang.ada:183910
Date: 2013-11-17T14:32:52+01:00
List-Id: <comp.lang.ada>

On 16.11.13 16:55, Dmitry A. Kazakov wrote:
> As I said in order to avoid troubles, don't use anything but ASCII.

ASCII-ism is the soil in which dangerous bugs grow, bugs that keep
many things from working.(*)

Would anyone ever approach *numbers* with the same attitude of
denial towards encoding basics?  I doubt it.

The best medication against chronic character FUD is to

(a) see how some unambiguous encoding does work everywhere
     (e.g. the universally supported UTF-16)  (**),
(b) understand that single units of text and single octets
     are not in general isomorphic; this leads to bugs just
     as harmless or harmful as erroneous execution in the
     presence of not 'Valid,
(c) understand that maybe wasting 9 bits of 16-bit characters
     (or a few bits per octet sequence in UTF-8)
     is not worth mentioning these days, considering source text.
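
To put numbers on (b) and (c), here is a minimal sketch, assuming
Python 3. The sample string and the helper name octet_counts are
made up for illustration; they are not from the thread. The string
is written with escapes so that this message stays 7bit.

```python
# Hypothetical sample: "Gr" + u-umlaut + sharp-s + "e, Ada".
text = "Gr\u00fc\u00dfe, Ada"

def octet_counts(s):
    """Return the character count and the encoded sizes in octets."""
    return {
        "characters": len(s),
        "utf-8": len(s.encode("utf-8")),
        # Big-endian UTF-16 without a byte order mark.
        "utf-16": len(s.encode("utf-16-be")),
    }

counts = octet_counts(text)
print(counts)
```

The character count and the octet counts differ as soon as one
character is outside ASCII, which is point (b). The extra octets
spent by either encoding on a short text like this are the kind of
cost that (c) calls not worth mentioning.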

Part (b) will not come to be as long as most programmers are
fine thinking that text is always 7-bit characters in real life.
If, instead, programmers start learning about further bits---
that Character is a type, not an encoding---integrating software
will start working better.
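
A small sketch of "a type, not an encoding", again assuming Python 3
and using names of my own choosing: one abstract character has
different octet representations, and reading the octets under the
wrong assumption changes the text.

```python
# One abstract character: LATIN SMALL LETTER U WITH DIAERESIS (U+00FC).
ch = "\u00fc"

latin1_octets = ch.encode("iso-8859-1")  # one octet
utf8_octets = ch.encode("utf-8")         # two octets

# Treating each UTF-8 octet as one Latin-1 character yields
# two characters where there was one.
misread = utf8_octets.decode("iso-8859-1")

# A strict 7-bit reader cannot decode the octets at all.
try:
    utf8_octets.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(latin1_octets), len(utf8_octets), len(misread), ascii_ok)
```

The character is the same in both cases; only the octets differ.
Software that integrates well has to know which encoding it is
reading, instead of assuming one octet per character.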

__
(*) One big ASCII bug of this kind has left Google's infrastructure
     stuck with Python 2.7.
(**) I understand that even the US Navy has officially started
     using more characters than ASCII. So, can I maintain hope
     that GNAT will one day read source files that use UTF-NN, which
     GNAT does support?