From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,ece5a18e6179c51a
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-10-23 14:54:59 PST
Path: 
 archiver1.google.com!news2.google.com!news.maxwell.syr.edu!newsfeed.mathworks.com!wn13feed!worldnet.att.net!204.127.198.203!attbi_feed3!attbi_feed4!attbi.com!attbi_s54.POSTED!not-for-mail
Message-ID: <3F984DDD.9040808@comcast.net>
From: "Robert I. Eachus" <rieachus@comcast.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
 rv:1.0.2) Gecko/20021120 Netscape/7.01
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Ada, Gnat and Unicode
References: <5d6fdb61.0310230648.62219442@posting.google.com>
 <3F97F83A.6060103@comcast.net> <MPG.1a0230a0729e3ce8989778@News.CIS.DFN.DE>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: 24.34.139.183
X-Complaints-To: abuse@comcast.net
X-Trace: attbi_s54 1066946097 24.34.139.183 (Thu, 23 Oct 2003 21:54:57 GMT)
NNTP-Posting-Date: Thu, 23 Oct 2003 21:54:57 GMT
Organization: Comcast Online
Date: Thu, 23 Oct 2003 21:54:58 GMT
Xref: archiver1.google.com comp.lang.ada:1551
Date: 2003-10-23T21:54:58+00:00
List-Id: <comp.lang.ada>

Jano wrote:
> Robert I. Eachus says...

> As you may be seeing now, I want to scan a folder and transform the 
> filenames into UTF-8. That's fine for me, since I know that I'm getting 
> Latin1 encoded strings from the Directory_Operations package and in any 
> metadata entered by the user. But I was wondering what would happen for a 
> Chinese user (not that I foresee my program being widely deployed, 
> but when faced with the problem one *must* know ;)

Remember my advice about canonicalization.  If you get Unicode or UTF-8 
file names from the OS, they may or may not be in a canonical form.  If 
not, get the OS to canonicalize them for you.  Of course, the details 
are OS-specific.  You don't really care what the OS's definition of 
canonical form is, just whether the strings you are getting are in that 
form and, if not, which OS call puts them into it.
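
For the Latin1 case you describe, the conversion itself is mechanical. 
Here is a rough sketch, assuming the input really is Latin1 with one 
character per byte.  Latin1_To_UTF8 is a name made up for this example, 
not something GNAT provides.  It yields precomposed code points; whether 
that matches what a given OS calls canonical is the separate question 
above.

```ada
--  Hypothetical helper: converts a Latin1 String to UTF-8 bytes,
--  returned in an ordinary String (one byte per Character).
function Latin1_To_UTF8 (Item : String) return String is
   --  Worst case: every input character needs two bytes.
   Result : String (1 .. 2 * Item'Length);
   Last   : Natural := 0;
   Code   : Natural;
begin
   for I in Item'Range loop
      Code := Character'Pos (Item (I));
      if Code < 16#80# then
         --  7-bit ASCII maps to itself.
         Last := Last + 1;
         Result (Last) := Item (I);
      else
         --  16#80# .. 16#FF# becomes two bytes: 110000xx 10xxxxxx.
         Last := Last + 1;
         Result (Last) := Character'Val (16#C0# + Code / 64);
         Last := Last + 1;
         Result (Last) := Character'Val (16#80# + Code mod 64);
      end if;
   end loop;
   return Result (1 .. Last);
end Latin1_To_UTF8;
```

This only covers Latin1 input.  File names from a Chinese user would 
have to arrive as wide characters or as UTF-8 in the first place, which 
this sketch does not attempt to handle.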

>>Look again, in the GNAT Users Guide for "Foreign Language Representation."
>  
> Correct me, that refers to source representation? (I had missed it 
> anyway ^_^)

Yes, it refers to source representation, but if you think about it for a 
second, the source representation of non-Latin1 characters is an issue 
for Character and String literals.  Otherwise the compiler doesn't care 
what Character type you use in your program.
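
To illustrate, here is a minimal sketch in which a non-Latin1 character 
is built from its code position instead of being written as a literal, 
so the source file stays plain ASCII and the source representation 
never comes into play.  Pi_File_Name is a made-up name for this example.

```ada
--  Hypothetical example: returns a Wide_String ending in GREEK SMALL
--  LETTER PI without any non-Latin1 character appearing in the source.
function Pi_File_Name return Wide_String is
   --  Built from the code position, not from a character literal.
   Pi : constant Wide_Character := Wide_Character'Val (16#03C0#);
begin
   --  The literal here is pure ASCII, so it is safe in any encoding.
   return "file_" & Pi;
end Pi_File_Name;
```

Had the pi been typed directly between the quotes, the compiler would 
need to be told how the source file encodes it, and that is what the 
section of the Users Guide covers.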

> (Of course if my program were to be translated, that applies. I'm not so 
> concerned about this but I should have been clearer).

-- 
                                                     Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the 
goal of Art. It remains to work these concepts into a practical, 
down-to-earth context, and for this there is nothing more practical or 
down-to-earth than what I have been talking about all along...the repair 
of an old motorcycle."  -- from Zen and the Art of Motorcycle 
Maintenance by Robert Pirsig