From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Wed, 19 Oct 2011 16:43:08 -0500

"Dmitry A. Kazakov" wrote in message
news:a3j4wzrhrj65$.bkkht9t97w84.dlg@40tude.net...
> On Tue, 18 Oct 2011 08:02:31 -0700 (PDT), Adam Beneschan wrote:
>
>> On the other hand, I was confused by your statement
>> "Ada.Character.Handling.To_Upper breaks UTF-8".
>
> When String X contains UTF-8 encoded text (means: Character'Pos = octet
> value), then To_Upper (X) would yield garbage for some texts.

You should have just said:

When String X contains UTF-8 encoded text (means: Character'Pos = octet
value), then virtually all existing string operations will yield garbage
for some texts.

The only way to safely use a UTF-8 string is opaquely, which means you can
store it whole, but any operation on it has to be performed after decoding
it.

That's of course the best argument for having it be a separate type. The
problem is that Ada doesn't have any reasonable way to define conversions
for that type (and long-winded conversion functions with long-winded names
like "Ada.Strings.Unbounded.To_Unbounded_String" don't count in my view).
And there is just enough need to treat these things as arrays-of-bytes
(slicing is needed for storage of variable length UTF-8 strings in "plain
Ada", for one example) that treating them as "opaque" isn't ideal.

Randy.
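To make the To_Upper point and the "decode first" point concrete, here is a small sketch that is not from the thread. It assumes an Ada 2012 compiler such as GNAT, since it relies on Ada.Strings.UTF_Encoding and Ada.Wide_Wide_Characters.Handling. The procedure name UTF8_Upper_Demo, the Dump helper, and the sample text (e-acute followed by the euro sign) are illustrative choices. It applies Ada.Characters.Handling.To_Upper octet by octet to UTF-8 held in a plain String, then does the same job the opaque way: decode to code points, operate, re-encode.

```ada
with Ada.Text_IO;                        use Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Characters.Handling;

procedure UTF8_Upper_Demo is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  U+00E9 (e-acute) and U+20AC (euro sign) stored as UTF-8, one octet
   --  per Character, so Character'Pos of each element is the octet value.
   Text : constant String :=
     Character'Val (16#C3#) & Character'Val (16#A9#) &
     Character'Val (16#E2#) & Character'Val (16#82#) & Character'Val (16#AC#);

   --  Octet-wise: each octet is treated as a Latin-1 character. The lead
   --  octet 16#E2# (a-circumflex) becomes 16#C2# (A-circumflex), leaving
   --  invalid UTF-8, while the e-acute is not upper-cased at all.
   Naive : constant String := Ada.Characters.Handling.To_Upper (Text);

   --  Opaque use: decode to code points, operate there, then re-encode.
   Safe : constant String :=
     WWS.Encode
       (Ada.Wide_Wide_Characters.Handling.To_Upper (WWS.Decode (Text)));

   --  Print the octet values of S on one line.
   procedure Dump (Label : String; S : String) is
   begin
      Put (Label & ":");
      for C of S loop
         Put (Integer'Image (Character'Pos (C)));
      end loop;
      New_Line;
   end Dump;
begin
   Dump ("original", Text);   --  original: 195 169 226 130 172
   Dump ("naive   ", Naive);  --  naive   : 195 169 194 130 172
   Dump ("decoded ", Safe);   --  decoded : 195 137 226 130 172
end UTF8_Upper_Demo;
```

The octet-wise result leaves the e-acute unchanged and turns the three-octet euro sign into 16#C2# 16#82# followed by a stray continuation octet 16#AC#, which is no longer valid UTF-8. The decoded path yields E-acute (16#C3# 16#89#) and leaves the euro sign intact, at the cost of an explicit decode/encode round trip around every operation.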