From: Jano
Newsgroups: comp.lang.ada
Subject: Re: Ada, Gnat and Unicode
Date: Fri, 24 Oct 2003 17:09:24 +0200

Robert I. Eachus wrote:
> Jano wrote:
> > Robert I. Eachus wrote:
> >
> > As you may be seeing now, I want to scan a folder and transform the
> > filenames into UTF8. That's fine for me, who knows that I'm getting
> > Latin1 encoded strings from the Directory_Operations package, and any
> > metadata entered by the user. But I was wondering what would happen to a
> > Chinese user (not that I foresee any usage of my program in wide
> > deployment, but when faced with the problem one *must* know ;)
>
> Remember my advice about canonicalization. If you get Unicode or UTF-8
> file names from the OS, they may or may not be in a canonical form. If
> not, get the OS to do it for you. And of course, this information is OS
> specific. You won't really care what the OS's definition of canonical
> form is, just whether the strings you are getting are in that form, and
> if not how to call the OS to do that.

Ok, I see. In the end that's the outcome I didn't want to hear but the one
I expected.

> Yes, it refers to source representation, but if you think about it for a
> second, the source representation of non-Latin1 characters is an issue
> for Character and String literals. Otherwise the compiler doesn't care
> what Character type you use in your program.

I was referring to that too :)

Thanks!

-- 
-------------------------
Jano
402450.at.cepsz.unizar.es
-------------------------