From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Alexandre E. Kopilovitch" <aek@vib.usr.pu.ru>
Newsgroups: comp.lang.ada
Subject: Re: U : Unbounded_String := "bla bla bla"; (was: Is the Writing...)
Date: Fri, 10 Oct 2003 01:35:12 +0400 (MSD)
Organization: Cuivre, Argent, Or
Message-ID: <mailman.56.1065735568.25614.comp.lang.ada@ada-france.org>
References: <3F849B4A.2090008@comcast.net>
NNTP-Posting-Host: lovelace.ada-france.org
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
X-Trace: melchior.cuivre.fr.eu.org 1065735569 32248 80.67.180.195 (9 Oct 2003
 21:39:29 GMT)
X-Complaints-To: usenet@melchior.cuivre.fr.eu.org
NNTP-Posting-Date: Thu, 9 Oct 2003 21:39:29 +0000 (UTC)
To: comp.lang.ada@ada-france.org
Return-Path: <aek@vib.usr.pu.ru>
In-Reply-To: <3F849B4A.2090008@comcast.net>;
	from "Robert I. Eachus" at Wed, 08 Oct 2003 23:18:59 GMT
X-Mailer: Mail/@ [v2.44 MSDOS]

"Robert I. Eachus" wrote:

> > BTW, when you mentioned Cyrillic_String you made me smiling grimly. Do you
> > know that there are 3 alive Cyrillic encodings? Do you know that, for example,
> > in Windows, the final effect of your Cyrillic encoding depends not only upon
> > encoding, but upon Regional Settings also? And there are plenty of more subtle
> > issues, which may easily hurt you when you deal with a Cyrillic encoding. So,
> > don't fancy that your Cyrillic_String will be of much help, especially if you
> > want to develop a robust product for actual field use.
>
> You are just thinking Russian,

Well, if you add Ukrainian, Bulgarian and Serbian, not to mention Belarusian
and a bunch of pseudo-Cyrillic languages from Abkhaz to Kazakh (and I can't
even tell what is happening with Tatar now: I heard recently that there is even
a case in the Constitutional Court about it - the Tatars want a Latin-based
alphabet for their language, but the federal authorities insist on a Cyrillic
one only... if I understood all that properly), the situation will probably not
get any better -;)

> there are even more Cyrillic character bindings for other Cyrillic languages.

I see very little potential use for them, particularly in the Ada world... other
than mining raw intelligence data from newspapers, emails and websites -;) .
I can't imagine that those nations will use Ada for their accounting purposes...
or even for desktop publishing or computer games.

> When it comes to multiple 
> representations for one language Japanese is by far the worst!

I'm not sure, though. Yes, Japanese is quite impressive in this regard;
I have seen that first-hand (my daughter, being a linguist, had some
correspondence by e-mail with several Japanese girls, and I was called in to
decode and encode those emails - well, it took some time and effort).
But that is on the surface. When you go deep into real application problems,
the situation may change: I know well that there are subtle and unpleasant
problems with Russian encodings, and I know nothing about Japanese at that
level.

>  But if 
> you don't see it, try this.  In Ada, I can DEFINE a Cyrillic_String type 
> and bind it to one of the variants, and add other string types for other 
> variants, then provide for conversions between them.  The fact that 
> almost all conversions are explicit makes all this possible.  Let me add 
> three types and show you the problem:
>
>   type Unbounded_Cyrillic is new Ada.Strings.Unbounded.Unbounded_String;
>   -- to make sure you don't get confused.  Yeah, I know, in real life
>   -- you should make the derivation private, and provide Cyrillic_String
>   -- versions of some of the operations in Ada.Strings.Unbounded.  Take
>   --- all that as given.
>   type Georgian_String is (...);
>   type Unbounded_Georgian is new Ada.Strings.Unbounded.Unbounded_String;
>   -- same as above.
>
>   In Ada as it is now, I can say:
>
>   Some_String: Unbounded_Cyrillic := To_Unbounded("Македонии");
>   Other_String: Unbounded_Georgian := To_Unbounded("Македонии");
>
> In each case, there is an implicit conversion from the string_literal 
> "Македонии" to the proper string type, then that type is converted to 
> the proper unbounded type.  But if you add additional implicit 
> conversions into the mix, it all falls apart:

Oh, it seems that I see (at last!) what you mean: you assume that conversions
between encodings should be implicit! But this is far from desirable in real
applications!

>   Some_String: Unbounded_Cyrillic := "Македонии";
>
> I hope you don't expect the compiler to guess which set of implicit 
> conversions to apply!  I am certainly not going to try to list all the 
> possibilities, but for example, there is: "Македонии" to String to 
> Unbounded_String to Cyrillic_String.  And yes, in this case, the first 
> conversion would raise Constraint_Error.  But I could choose some other 
> example where all the characters were in both (Latin1) String and 
> Cyrillic_String.  But I don't have to: "Македонии" to Georgian_String to 
> Unbounded_Georgian to Unbounded_String to Cyrillic_String.
>
> Once you introduce new implicit conversions, the compiler is going to 
> have to assume that they may occur anywhere.  If the overloading rules 
> result in only one possible match, great.  But you will find that right 
> now Ada has about as many implicit conversions as it can without 
> creating lots of ambiguous situations.  And yes, there are situations in 
> Ada currently where you have to qualify expressions to avoid ambiguity. 
> The most useful balance point is where everything can be done, and you 
> don't have to qualify expressions too often.

I think that now I understand the difference between our views on the issue.

I understand perfectly that there should not be two competing kinds of implicit
conversions (one between encodings and another between String and Unbounded_String).
So we have to choose between them.

You assumed that implicit conversions between encodings are more natural and
more desirable than implicit conversions between String and Unbounded_String.
My firm opinion is exactly the opposite: conversions between encodings should be
explicit as a rule, and they all must be done within the "frontier" layer of
the application; so, I'm quite sure that while such implicit conversions between
encodings may be justified in Visual Basic and sometimes in C++, they are
entirely undesirable for Ada (as a standard feature). At the same time I see
implicit conversions between String and Unbounded_String as very natural and
desirable for real applications.
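
To make that preference concrete, here is a minimal sketch in today's Ada, with
no language change assumed. The type names KOI8_R_String and CP1251_String and
the function To_CP1251 are hypothetical, invented for this illustration only,
and the conversion handles just the 7-bit ASCII range (a real one would use a
full translation table). The unary "+" renaming is merely a common workaround
that approximates the String to Unbounded_String conversion I would like to
have implicitly:

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Frontier_Sketch is

   --  Workaround for the missing implicit conversion: a one-character
   --  operator that turns a String into an Unbounded_String.
   function "+" (S : String) return Unbounded_String
     renames To_Unbounded_String;

   --  Hypothetical encoding-specific types. They are distinct types, so
   --  the compiler rejects any accidental mixing of the two encodings.
   type KOI8_R_String is new String;
   type CP1251_String is new String;

   --  Hypothetical explicit conversion, meant to be called only in the
   --  "frontier" layer. This sketch covers 7-bit ASCII only; a real
   --  version would translate the upper 128 codes through a table.
   function To_CP1251 (S : KOI8_R_String) return CP1251_String is
      Result : CP1251_String (S'Range);
   begin
      for I in S'Range loop
         if Character'Pos (S (I)) > 127 then
            raise Constraint_Error;  --  outside what this sketch handles
         end if;
         Result (I) := S (I);
      end loop;
      return Result;
   end To_CP1251;

   --  Frontier: text arrives in one encoding and is converted explicitly.
   Raw      : constant KOI8_R_String := "bla bla bla";
   Internal : constant CP1251_String := To_CP1251 (Raw);

   --  Interior: only one encoding is left, and "+" keeps the
   --  String-to-Unbounded_String step short.
   U : constant Unbounded_String := +String (Internal);

begin
   Ada.Text_IO.Put_Line (To_String (U));
end Frontier_Sketch;
```

Writing Internal := Raw without the call to To_CP1251 would be rejected at
compile time, which is exactly the behaviour I want for encodings.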

I don't know the reasons for your assumption and preference... all I can
say is that my preference is certainly influenced by substantial experience
with strings in real applications, which often involved dealing with various
encodings (although not in Ada: it was Fortran IV/77, COBOL 66, several
assemblers, PL/1, C/C++, Pascal/Delphi).

> Oh, since I am trying to be fair here, there is one additional implicit 
> conversion that I would love to figure out how to add to the language. 
> (Well, I know how to add it, I just don't think I'll ever get enough 
> interest to make it happen.)  That would be to add some pragmas that 
> allowed  character, string, or numeric literals to private types.  The 
> conversion directly from a character literal to Unbounded_Cyrillic 
> wouldn't break anything.  It also wouldn't help if you had a 
> Cyrillic_String variable to put in an Unbounded_Cyrillic object.

I am not sure that I understand properly what you meant here, but anyway, I
can repeat that literals are very significant, and making it possible to have
(non-trivial) literals for private types would be a very good thing. For strings
(I mean Unbounded_Strings) this is especially important. It is the primary
need; full-scale implicit conversions between Strings and Unbounded_Strings
are also desirable, but the case of literals is certainly the most important. 
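
As a note for readers on a later language revision: Ada 2022 added user-defined
literals through the String_Literal aspect (alongside Integer_Literal and
Real_Literal), which covers the literal case for private types. The sketch
below assumes an Ada 2022 compiler; the package Texts, the type Text and the
functions From_Literal and Image are hypothetical names chosen for the
illustration, not part of any standard library:

```ada
with Ada.Characters.Conversions;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Literal_Sketch is

   package Texts is
      --  A private type that accepts string literals directly.
      type Text is private
        with String_Literal => From_Literal;

      --  The aspect requires a primitive function taking a
      --  Wide_Wide_String and returning the type.
      function From_Literal (Value : Wide_Wide_String) return Text;

      function Image (T : Text) return String;
   private
      type Text is record
         Data : Ada.Strings.Unbounded.Unbounded_String;
      end record;
   end Texts;

   package body Texts is
      use Ada.Strings.Unbounded;

      function From_Literal (Value : Wide_Wide_String) return Text is
      begin
         --  Characters outside Character's range are replaced by the
         --  default substitute (a space) in this sketch.
         return (Data => To_Unbounded_String
                   (Ada.Characters.Conversions.To_String (Value)));
      end From_Literal;

      function Image (T : Text) return String is
      begin
         return To_String (T.Data);
      end Image;
   end Texts;

   --  The literal is converted by From_Literal; no explicit call needed.
   U : constant Texts.Text := "bla bla bla";

begin
   Ada.Text_IO.Put_Line (Texts.Image (U));
end Literal_Sketch;
```

This handles literals only: an existing String variable still needs an explicit
conversion, so the full-scale implicit conversion remains a separate question.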


Alexander Kopilovitch                      aek@vib.usr.pu.ru
Saint-Petersburg
Russia