From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Date: Mon, 18 Nov 2013 10:01:33 +0100
Organization: cbb software GmbH

On Mon, 18 Nov 2013 09:38:06 +0100, Georg Bauhaus wrote:

> On 17.11.13 21:38, Dmitry A. Kazakov wrote:
>> The problem is that the common part (ASCII) is sufficient for Ada
>> programming while the varying part is subtle enough to cause difficult to
>> detect bugs in string literals. Bugs that cannot be detected by the
>> compiler.
>
> UTF-8 can actually be so checked (and is checked by typical implementations)

1. The share of illegal UTF-8 sequences is negligible.
   Among Ada programs it is even smaller.

2. Latin1 sequences are all legal.

Now, carefully observe that the program in question was dealt with as if it
were encoded in Latin1. So much for your theory.

---------------
P.S. In order to make a point you should take a set of legal [and
practical] Ada programs encoded in X and then reinterpret them in Y. Then you
compare how many of them:

1. become illegal
2. remain legal, keeping the semantics
3. remain legal, breaking the semantics

The last case is the worst possible scenario, and it is the one the OP
experienced.

P.P.S. Also important for keeping the source in sane ASCII: Ada provides a
standard package that names the Latin1 characters, Ada.Characters.Latin_1
(RM A.3.3).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
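
The three outcomes listed in the P.S. can be tried out on plain strings. What
follows is only a rough sketch, written in Python rather than Ada for brevity.
The helper name reinterpret and the sample strings are made up for
illustration, and the check models byte-level decoding only, not the Ada rules
about which characters may appear in a string literal.

```python
def reinterpret(text, written_as, read_as):
    """Classify what happens when text stored in one encoding is read in another."""
    raw = text.encode(written_as)
    try:
        seen = raw.decode(read_as)
    except UnicodeDecodeError:
        # Outcome 1: the reader rejects the bytes outright.
        return "illegal"
    if seen == text:
        # Outcome 2: accepted, and the characters are unchanged.
        return "legal, same meaning"
    # Outcome 3: accepted, but the reader sees different characters.
    return "legal, different meaning"


samples = ["Hello", "Gr\u00fc\u00dfe"]  # ASCII only; with two Latin-1 letters
for s in samples:
    print(repr(s), "UTF-8 read as Latin-1:", reinterpret(s, "utf-8", "latin-1"))
    print(repr(s), "Latin-1 read as UTF-8:", reinterpret(s, "latin-1", "utf-8"))
```

The ASCII-only sample survives in both directions. The sample with non-ASCII
letters is rejected when Latin-1 bytes are read as UTF-8, but is silently
accepted with different characters when UTF-8 bytes are read as Latin-1, which
is the third outcome.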