From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Interpretation of extensions different from Unix/Linux?
Date: Tue, 18 Aug 2009 15:48:16 -0500

"Adam Beneschan" wrote in message
news:6f80c882-fa03-4ca9-a53e-fae34cea160d@b15g2000yqd.googlegroups.com...
On Aug 17, 3:28 pm, "Randy Brukardt" wrote:
>> The problem here is that String really is not the right type, but since you
>> can't have string literals for private types in Ada, you can't make it a
>> private type. (And if you could have string literals, it still couldn't be
>> used with the existing I/O packages, it would be way too incompatible.)
>
>That wouldn't even be an issue if UTF-8 were strictly a "storage
>format" as you called it above. If that were the case, you wouldn't
>need string literals for it. I think the problem is that UTF-8 is
>something of a hybrid. If all characters in the string are in the
>32..126 range, the "sequence of octets" stored in the UTF-8 string is
>identical to the graphic characters stored in a String. (UTF-8 was
>designed purposefully so that would happen.) In cases like that, it
>makes sense to use a string literal.

Well, the problem here is that it *always* makes sense to use a string
literal. That's how you specify what you want in storage in Ada. I think
Dmitry's point is that he'd rather always see explicit conversions. The
problem is that they don't work well -- exhibit A is unbounded strings.
That's especially true for the use-averse like me. I hate having to write:

    A_Str := Ada.Strings.Unbounded.To_Unbounded_String ("ABC");

and surely UTF-8 would be worse:

    A_Str := Ada.Strings.Unbounded_UTF_8.To_Unbounded_UTF_8_String ("ABC");

..

>Also, I'm afraid that using String can backfire. If I understand it
>correctly, the decision was that the Name parameter of Text_IO.Open
>should be interpreted as a UTF-8 octet sequence even though it's a
>String. But the intent is to allow string literals. At some point,
>though, some poor innocent programmer in Germany or Spain is going to
>try to use a string literal (or a Latin-1 string variable) with an
>umlaut or an accented vowel in it and get totally screwed up since
>those characters don't represent themselves in UTF-8 encoding, and
>they'll end up puzzling over how their program created a file with a
>Chinese character in the middle of the name. (Yeah, I know, that's
>very unlikely; most likely the UTF-8 encoding will simply be invalid.)

I've been presuming that UTF-8 encoding started with a BOM or something
like that, else you couldn't tell it from regular Latin-1 encoding.
It would be hard to insert a BOM into a string literal by accident! But I
do agree that this issue needs some discussion.

(Also note that a major reason for this package is to make ASIS work; there
[as with I/O], we're stuck with existing routines that return Wide_Strings
that are not enough to handle all possible text.)

                                Randy.

                                -- Adam
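
The Latin-1 versus UTF-8 concern raised in the quoted text can be tried
out with a small sketch. This is only an illustration, not the package
under discussion in this thread: it assumes an Ada 2012 compiler and uses
the standard package Ada.Strings.UTF_Encoding, which postdates this post.
The procedure name and the sample name "Muller" with a u-umlaut are made
up for the example.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical demo procedure; the name is an assumption for illustration.
procedure Latin_1_As_UTF_8 is
   package UTF         renames Ada.Strings.UTF_Encoding;
   package UTF_Strings renames Ada.Strings.UTF_Encoding.Strings;

   --  Characters in the 32 .. 126 range only.
   Plain : constant String := "ABC";

   --  A Latin-1 string: u-umlaut is the single octet 16#FC#,
   --  so this String holds 6 octets.
   Latin_1_Name : constant String :=
     "M" & Character'Val (16#FC#) & "ller";

   --  The same text encoded as UTF-8: u-umlaut becomes the two
   --  octets 16#C3# 16#BC#, so this holds 7 octets.
   UTF_8_Name : constant UTF.UTF_8_String :=
     UTF_Strings.Encode (Latin_1_Name);
begin
   --  For pure ASCII the UTF-8 octets equal the String contents,
   --  which is why a plain string literal works in that case.
   if UTF_Strings.Encode (Plain) = Plain then
      Ada.Text_IO.Put_Line ("ASCII text is unchanged by UTF-8 encoding");
   end if;

   Ada.Text_IO.Put_Line
     ("Latin-1 octets:" & Natural'Image (Latin_1_Name'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 octets:  " & Natural'Image (UTF_8_Name'Length));

   --  Now make the mistake described above: hand the Latin-1 octets
   --  to something that expects UTF-8. Decode rejects the input.
   declare
      Wrong : constant String := UTF_Strings.Decode (Latin_1_Name);
   begin
      Ada.Text_IO.Put_Line ("Decoded: " & Wrong);
   end;
exception
   when UTF.Encoding_Error =>
      Ada.Text_IO.Put_Line ("Latin-1 octets are not valid UTF-8");
end Latin_1_As_UTF_8;
```

In this sketch the octet 16#FC# is not a legal start of a UTF-8 sequence,
so the Decode call raises Encoding_Error, which matches the "most likely
the UTF-8 encoding will simply be invalid" case. Note that Encode takes an
Output_BOM parameter that defaults to False, so the encoded value here
carries no BOM.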