From: "Robert I. Eachus"
Newsgroups: comp.lang.ada
Subject: Re: Ada, Gnat and Unicode
Date: Thu, 23 Oct 2003 21:54:58 GMT
Message-ID: <3F984DDD.9040808@comcast.net>
References: <5d6fdb61.0310230648.62219442@posting.google.com> <3F97F83A.6060103@comcast.net>

Jano wrote:

> Robert I. Eachus says...
>
> As you may be seeing now, I want to scan a folder and transform the
> filenames into UTF8. That's fine for me, as I know that I'm getting
> Latin1 encoded strings from the Directory_Operations package, and any
> metadata entered by the user. But I was wondering what would happen to a
> Chinese user (not that I foresee any usage of my program in wide
> deployment, but when faced with the problem one *must* know ;)

Remember my advice about canonicalization.
If you get Unicode or UTF-8 file names from the OS, they may or may not
be in a canonical form. If not, get the OS to do it for you. And of
course, this information is OS-specific. You won't really care what the
OS's definition of canonical form is, just whether the strings you are
getting are in that form, and if not, how to call the OS to do that.

>>Look again, in the GNAT User's Guide for "Foreign Language Representation."
>
> Correct me, that refers to source representation? (I had missed it
> anyway ^_^)

Yes, it refers to source representation, but if you think about it for a
second, the source representation of non-Latin1 characters is an issue
only for Character and String literals. Otherwise the compiler doesn't
care what Character type you use in your program.

> (Of course if my program were to be translated, that applies. I'm not so
> concerned about this but I should have been clearer).

--
Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the
goal of Art. It remains to work these concepts into a practical,
down-to-earth context, and for this there is nothing more practical or
down-to-earth than what I have been talking about all along...the repair
of an old motorcycle." -- from Zen and the Art of Motorcycle Maintenance
by Robert Pirsig
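
A rough illustration of the two points discussed above: re-encoding a Latin-1 file name as UTF-8, and why it matters whether a Unicode name is in canonical form. This is only a sketch, written in Python rather than Ada because its standard library includes Unicode normalization. The helper names `latin1_bytes_to_utf8` and `is_nfc` are hypothetical, and Normalization Form C (NFC) is an assumption standing in for whatever canonical form a given OS actually defines; the post's advice is to let the OS do the canonicalization where it can.

```python
import unicodedata


def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode a Latin-1 byte string (e.g. a file name) as UTF-8."""
    # Every byte value maps to a Latin-1 character, so decoding cannot fail.
    return raw.decode("latin-1").encode("utf-8")


def is_nfc(name: str) -> bool:
    """Return True if the name is already in Normalization Form C.

    NFC is an assumption here; the OS may define its canonical form
    differently.
    """
    return unicodedata.normalize("NFC", name) == name


# The same visible name, spelled two ways at the code-point level.
composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(latin1_bytes_to_utf8(b"caf\xe9"))                      # b'caf\xc3\xa9'
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(is_nfc(composed), is_nfc(decomposed))                  # True False
```

The Latin-1 to UTF-8 step is purely mechanical, but two names that look identical can still compare unequal until both are brought into the same canonical form, which is why it helps to know whether the strings the OS hands back are already canonical.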