From: "Alexandre E. Kopilovitch"
Newsgroups: comp.lang.ada
Subject: Re: U : Unbounded_String := "bla bla bla"; (was: Is the Writing...)
Date: Fri, 10 Oct 2003 01:35:12 +0400 (MSD)
In-Reply-To: <3F849B4A.2090008@comcast.net>; from "Robert I. Eachus" at Wed, 08 Oct 2003 23:18:59 GMT

"Robert I. Eachus" wrote:
> > BTW, when you mentioned Cyrillic_String you made me smile grimly. Do you
> > know that there are 3 Cyrillic encodings still in use?
> > Do you know that, for example, in Windows the final effect of your
> > Cyrillic encoding depends not only on the encoding but also on the
> > Regional Settings? And there are plenty of more subtle issues, which may
> > easily hurt you when you deal with a Cyrillic encoding. So don't fancy
> > that your Cyrillic_String will be of much help, especially if you want
> > to develop a robust product for actual field use.
>
> You are just thinking Russian,

Well, if you add Ukrainian, Bulgarian and Serbian, not to mention Belarusian
and a bunch of pseudo-Cyrillic languages from Abkhaz to Kazakh (and I can't
even figure out what is happening with Tatar now: I heard recently that there
is even a case before the Constitutional Court about it; the Tatars want a
Latin-based alphabet for their language, but the federal authorities insist
on Cyrillic only... if I understood all that properly), the situation will
probably not get any better -;)

> there are even more Cyrillic character bindings for other Cyrillic languages.

I see very little potential use for them, particularly in the Ada world...
other than mining raw intelligence data from newspapers, e-mails and
websites -;). I can't imagine those nations using Ada for their accounting...
or even for desktop publishing and computer games.

> When it comes to multiple
> representations for one language Japanese is by far the worst!

I'm not so sure, though. Yes, Japanese is quite impressive in this regard; I
have seen it first-hand (my daughter, a linguist, corresponded by e-mail with
several Japanese girls, and I was called in to decode and encode those
e-mails, which took some time and effort). But that is only the surface.
When you go deep into real application problems, the picture may change: I
know well that there are subtle and unpleasant problems with Russian
encodings, and I know nothing about Japanese at that level.

> But if
> you don't see it, try this.
> In Ada, I can DEFINE a Cyrillic_String type
> and bind it to one of the variants, and add other string types for other
> variants, then provide for conversions between them. The fact that
> almost all conversions are explicit makes all this possible. Let me add
> three types and show you the problem:
>
>    type Unbounded_Cyrillic is new Ada.Strings.Unbounded.Unbounded_String;
>    -- to make sure you don't get confused. Yeah, I know, in real life
>    -- you should make the derivation private, and provide Cyrillic_String
>    -- versions of some of the operations in Ada.Strings.Unbounded. Take
>    --- all that as given.
>    type Georgian_String is (...);
>    type Unbounded_Georgian is new Ada.Strings.Unbounded.Unbounded_String;
>    -- same as above.
>
> In Ada as it is now, I can say:
>
>    Some_String: Unbounded_Cyrillic := To_Unbounded("Македонии");
>    Other_String: Unbounded_Georgian := To_Unbounded("Македонии");
>
> In each case, there is an implicit conversion from the string_literal
> "Македонии" to the proper string type, then that type is converted to
> the proper unbounded type. But if you add additional implicit
> conversions into the mix, it all falls apart:

Oh, it seems that I see (at last!) what you mean: you assume that conversions
between encodings should be implicit! But this is far from desirable in real
applications!

>    Some_String: Unbounded_Cyrillic := "Македонии";
>
> I hope you don't expect the compiler to guess which set of implicit
> conversions to apply! I am certainly not going to try to list all the
> possibilities, but for example, there is: "Македонии" to String to
> Unbounded_String to Cyrillic_String. And yes, in this case, the first
> conversion would raise Constraint_Error. But I could choose some other
> example where all the characters were in both (Latin1) String and
> Cyrillic_String. But I don't have to: "Македонии" to Georgian_String to
> Unbounded_Georgian to Unbounded_String to Cyrillic_String.
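Here is a minimal Ada sketch of the quoted setup, under stated assumptions. Only the two derived unbounded types are declared. Georgian_String and any real encoding binding are left out, so both types simply hold ordinary String data. The names Derived_Strings_Demo and Show are hypothetical. The sketch shows the existing rules the quote relies on:

- each derived type inherits To_Unbounded_String and To_String;
- values of the two types mix only through an explicit conversion;
- where two overloads could both apply, a qualified expression names the intended type.

It should compile as a single main unit with an Ada 95 (or later) compiler such as GNAT.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Derived_Strings_Demo is
   --  Hypothetical types following the derivations quoted above. Each one
   --  inherits To_Unbounded_String, To_String, "&", etc. from Unbounded_String.
   type Unbounded_Cyrillic is new Ada.Strings.Unbounded.Unbounded_String;
   type Unbounded_Georgian is new Ada.Strings.Unbounded.Unbounded_String;

   --  The declared type of each object selects which inherited
   --  To_Unbounded_String is called.
   C : constant Unbounded_Cyrillic := To_Unbounded_String ("abc");
   G : Unbounded_Georgian := To_Unbounded_String ("xyz");

   --  Hypothetical overloads, one per derived type.
   procedure Show (S : Unbounded_Cyrillic) is
   begin
      Ada.Text_IO.Put_Line ("Cyrillic: " & To_String (S));
   end Show;

   procedure Show (S : Unbounded_Georgian) is
   begin
      Ada.Text_IO.Put_Line ("Georgian: " & To_String (S));
   end Show;
begin
   --  G := C;  would be illegal: the two types are distinct.
   G := Unbounded_Georgian (C);  --  an explicit conversion is required
   Show (G);                     --  prints "Georgian: abc"

   --  Show (To_Unbounded_String ("abc"));  would be ambiguous, because both
   --  inherited To_Unbounded_String functions fit both Show overloads.
   --  A qualified expression names the intended type:
   Show (Unbounded_Cyrillic'(To_Unbounded_String ("abc")));
end Derived_Strings_Demo;
```

The commented-out lines mark exactly where the compiler refuses to guess. Every crossing between the two types is spelled out in the source.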
>
> Once you introduce new implicit conversions, the compiler is going to
> have to assume that they may occur anywhere. If the overloading rules
> result in only one possible match, great. But you will find that right
> now Ada has about as many implicit conversions as it can without
> creating lots of ambiguous situations. And yes, there are situations in
> Ada currently where you have to qualify expressions to avoid ambiguity.
> The most useful balance point is where everything can be done, and you
> don't have to qualify expressions too often.

I think I now understand the difference between our views on the issue. I
understand perfectly that there should not be two competing kinds of implicit
conversion (one between encodings and another between String and
Unbounded_String), so we have to choose between them. You assumed that
implicit conversions between encodings are more natural and more desirable
than implicit conversions between String and Unbounded_String. My firm opinion
is exactly the opposite: conversions between encodings should, as a rule, be
explicit, and they must all be done within the "frontier" layer of the
application. So I'm quite sure that while such implicit conversions between
encodings may be justified in Visual Basic, and sometimes in C++, they are
entirely undesirable for Ada (as a standard feature). At the same time, I see
implicit conversions between String and Unbounded_String as very natural and
desirable for real applications.

I don't know the reasons for your assumption and preference... All I can say
is that my preference is certainly influenced by substantial experience with
strings in real applications, which often involved dealing with various
encodings (although not in Ada: the languages were Fortran IV/77, COBOL 66,
several assemblers, PL/1, C/C++ and Pascal/Delphi).

> Oh, since I am trying to be fair here, there is one additional implicit
> conversion that I would love to figure out how to add to the language.
> (Well, I know how to add it, I just don't think I'll ever get enough
> interest to make it happen.) That would be to add some pragmas that
> allowed character, string, or numeric literals to private types. The
> conversion directly from a character literal to Unbounded_Cyrillic
> wouldn't break anything. It also wouldn't help if you had a
> Cyrillic_String variable to put in an Unbounded_Cyrillic object.

I am not sure I properly understand what you meant here, but in any case I
can repeat that literals are very significant, and making it possible to have
(non-trivial) literals for private types would be a very good thing. For
strings (I mean Unbounded_Strings) this is especially important. It is the
primary need; full-scale implicit conversions between Strings and
Unbounded_Strings are also desirable, but the case of literals is certainly
the most important.



Alexander Kopilovitch                      aek@vib.usr.pu.ru
Saint-Petersburg
Russia
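The subject line's wish, U : Unbounded_String := "bla bla bla";, is not legal Ada as discussed here, because Unbounded_String is a private type. A common workaround is a unary "+" that renames To_Unbounded_String, which shrinks the explicit conversion to a single character. The sketch below assumes Ada 95 or later, and the procedure name is hypothetical. It is still an explicit conversion, not the literal support asked for above. Ada 2022 later added a String_Literal aspect that lets a private type accept string literals directly.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Plus_Literal_Demo is
   --  Unary "+" as a one-character spelling of the explicit conversion.
   --  The renaming resolves to the String overload of To_Unbounded_String.
   function "+" (Source : String) return Unbounded_String
     renames To_Unbounded_String;

   U : Unbounded_String := +"bla bla bla";
begin
   Append (U, +" and more");
   Ada.Text_IO.Put_Line (To_String (U));  --  prints "bla bla bla and more"
end Plus_Literal_Demo;
```

Because "+" is declared only for String -> Unbounded_String, it adds no new implicit path for the compiler to consider. That keeps it clear of the ambiguity problem raised earlier in the thread.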