From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,URI_HEX autolearn=no
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,ece5a18e6179c51a
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-10-24 08:09:31 PST
Path: 
 archiver1.google.com!news2.google.com!fu-berlin.de!uni-berlin.de!77144-cm.able.ES!not-for-mail
From: Jano <nono@celes.unizar.es>
Newsgroups: comp.lang.ada
Subject: Re: Ada, Gnat and Unicode
Date: Fri, 24 Oct 2003 17:09:24 +0200
Message-ID: <MPG.1a035f0a212ec08898977d@News.CIS.DFN.DE>
References: <5d6fdb61.0310230648.62219442@posting.google.com>
 <3F97F83A.6060103@comcast.net> <MPG.1a0230a0729e3ce8989778@News.CIS.DFN.DE>
 <3F984DDD.9040808@comcast.net>
NNTP-Posting-Host: 77144-cm.able.es (212.97.177.144)
X-Trace: news.uni-berlin.de 1067008170 33223495 212.97.177.144 (16 [49872])
X-Newsreader: MicroPlanet Gravity v2.50
Xref: archiver1.google.com comp.lang.ada:1605
Date: 2003-10-24T17:09:24+02:00
List-Id: <comp.lang.ada>

Robert I. Eachus wrote...
> Jano wrote:
> > Robert I. Eachus wrote...
> 
> > As you may be seeing now, I want to scan a folder and transform the 
> > filenames into UTF-8. That's fine for me, since I know that I'm getting 
> > Latin1-encoded strings from the Directory_Operations package and from 
> > any metadata entered by the user. But I was wondering what would happen 
> > for a Chinese user (not that I foresee any usage of my program in wide 
> > deployment, but when faced with the problem one *must* know ;)
> 
> Remember my advice about canonicalization.  If you get Unicode or UTF-8 
> file names from the OS, they may or may not be in a canonical form.  If 
> not, get the OS to do it for you.  And of course, this information is OS 
> specific. You won't really care what the OS's definition of canonical 
> form is, just whether the strings you are getting are in that form, and 
> if not, how to call the OS to do that.

OK, I see. In the end, that's the answer I didn't want to hear, but the 
one I expected.
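
For the record, the two steps under discussion (re-encoding Latin1 
file names as UTF-8, and canonicalizing names that already arrive as 
Unicode) can be sketched roughly as follows. This is only an 
illustration, written in Python rather than Ada for brevity. The 
function names are hypothetical, and NFC is an assumed stand-in for 
whatever canonical form a given OS really uses, so the OS-specific 
call would replace it in practice.

```python
import unicodedata


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode a file name known to be Latin1 as UTF-8."""
    # Every byte value is a valid Latin1 character, so decoding cannot fail.
    return raw.decode("latin-1").encode("utf-8")


def canonicalize(name: str) -> str:
    """Normalize a Unicode file name to NFC.

    Assumption: NFC stands in for the OS's own canonical form,
    which may differ from one system to another.
    """
    return unicodedata.normalize("NFC", name)


if __name__ == "__main__":
    # 0xE9 is "e with acute accent" in Latin1.
    print(latin1_to_utf8(b"caf\xe9.txt"))
    # "e" followed by a combining acute accent becomes one code point.
    print(len(canonicalize("e\u0301")))
```

The first function only helps when the input is known to be Latin1, 
which is exactly the assumption that breaks for the Chinese user.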

> Yes, it refers to source representation, but if you think about it for a 
> second, the source representation of non-Latin1 characters is an issue 
> for Character and String literals.  Otherwise the compiler doesn't care 
> what Character type you use in your program.

I was referring to that too :)

Thanks!

-- 
-------------------------
Jano
402450.at.cepsz.unizar.es
-------------------------