From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Thread: 103376,5bcc293dc5642650
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,UTF8
Received: by 10.204.136.216 with SMTP id s24mr53981bkt.5.1318906770927;
        Mon, 17 Oct 2011 19:59:30 -0700 (PDT)
Path: 
 l23ni12082bkv.0!nntp.google.com!news1.google.com!goblin2!goblin.stu.neva.ru!aioe.org!.POSTED!not-for-mail
From: Yannick Duchêne (Hibou57) <yannick_duchene@yahoo.fr>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 04:59:28 +0200
Organization: Ada @ Home
Message-ID: <op.v3i09evqule2fv@index.ici>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <j7i6va$nso$1@munin.nbi.dk>
NNTP-Posting-Host: KHj9AOPOidgt0YptnGtG5g.user.speranza.aioe.org
Mime-Version: 1.0
X-Complaints-To: abuse@aioe.org
User-Agent: Opera Mail/11.51 (Linux)
X-Notice: Filtered by postfilter v. 0.8.2
Xref: news1.google.com comp.lang.ada:18533
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: Quoted-Printable
Date: 2011-10-18T04:59:28+02:00
List-Id: <comp.lang.ada>

On Mon, 17 Oct 2011 23:33:28 +0200, Randy Brukardt <randy@rrsoftware.com> wrote:
> Say what?
>
> Ada.Strings.Encoding (new in Ada 2012) uses a subtype of String to store
> UTF-8 encoded strings.

*Please note that the following is just my personal opinion* (I just
want to say what I feel; I don't mean to hurt anyone).

Everyone knows and has noticed this, yet it still confuses “bytes and
characters” the way C did. Eiffel had an implementation of UTF-8 strings
which was distinct from the default ASCII string type, and you could not
access the bytes from it; there was proper encapsulation and type
checking. As it happens, I used a similar abstraction in a tiny Ada
application.
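
To show the kind of encapsulation I mean, here is a rough sketch. It is
written in Python only because that is quick to run; it is neither the
Eiffel class nor my Ada code, and the name Utf8Text is made up for this
post. Python can only check this at run time, whereas a distinct Ada
type would be checked statically.

```python
class Utf8Text:
    """Hypothetical wrapper: stores UTF-8 bytes, exposes only characters."""

    def __init__(self, encoded: bytes):
        # Reject anything that is not valid UTF-8 at construction time.
        encoded.decode("utf-8")  # raises UnicodeDecodeError if invalid
        self._encoded = bytes(encoded)  # private by convention only

    def __iter__(self):
        # Iteration yields whole characters, never raw bytes.
        return iter(self._encoded.decode("utf-8"))

    def __len__(self):
        # Length is counted in characters, not in bytes.
        return len(self._encoded.decode("utf-8"))
```

A client can iterate over such a value or ask for its length, and in
both cases it deals with characters; the byte representation is not
part of the visible interface.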

Unless a BOM is required at the beginning of every UTF-8 string, and
that BOM is required to always be checked (I will have to check the new
RM, but I feel the answer is no), merging both types into a single one
is not that clean. Even if the answer were yes, this would only be a
dynamic check, not a static one. I feel it is more an implementation
trick (one indeed intended by the design of UTF-8, which targeted some
hard-to-solve contexts) than a clean formalization.

Try iterating over an object of type String. What do you get if it is a
proper ISO 8859-1 string? You get Characters. What do you get if it is
UTF-8? You get garbage and “random who-knows-what-it-is”, … _and the type
system does not catch it_ (*), even though that is one of its primary
purposes.
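
To make this concrete, here is a small sketch, in Python rather than Ada
purely for illustration (the sample text and the variable names are made
up). It encodes the same text both ways and then walks each result one
8-bit element at a time, the way one would walk a String of Characters.

```python
# Illustration only: the sample text and all names are hypothetical.
text = "Duchêne"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # "ê" becomes two bytes: C3 AA

# Read every byte as one 8-bit character, as iterating over a string
# of 8-bit elements does.
as_latin1 = [chr(b) for b in latin1_bytes]
as_utf8_misread = [chr(b) for b in utf8_bytes]

print(as_latin1)        # seven elements, the original characters
print(as_utf8_misread)  # eight elements, "ê" split into two wrong ones
```

Neither loop fails. The second one simply yields two wrong characters
where one was expected, and nothing in the types tells the two cases
apart.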

By the way, if ISO/ANSI strings and UTF-8 strings are the same, then
what is Wide_Character? The Unicode Basic Multilingual Plane, UTF-16LE,
UTF-16BE, or guess?

This will not break Ada's values in the eyes of most people (**), but I
believe these and some other people have noticed the same thing.

(*) The two types are not even structurally compatible.

(**) That's a library design flaw, not a language flaw! The difference
between the two is that if a library part is not tightly bound to the
language definition, the way I/O attributes or finalization behaviors
are, one can always work around it with one's own library. But then the
benefit of a standard library is lost.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [Epigrams on
Programming - Alan J. - P. Yale University]
“Structured Programming supports the law of the excluded muddle.” [Idem]
Java: Write once, Never revisit