From: Ludovic Brenta
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
Date: 06 May 2004 09:25:53 GMT
Message-ID: <200456-112553-85684@foorum.com>

Bjorn Persson wrote:
> Recompiling is not a workable solution. The encoding isn't known
> until run time. Software is frequently distributed in precompiled
> form you know, and the users may use many different encodings. It
> might even be that different users on the same system use different
> encodings. So I guess a transcoding library will have to be wrapped
> around Ada.Command_Line, and probably around
> Ada.Command_Line.Environment and the standard input, output and
> error files too.

You are correct: the encoding depends not only on the operating system but also on the particular user who runs the software. You can find out which encoding is currently in effect by calling setlocale(3) to adopt the user's locale and then nl_langinfo(3) with CODESET. glibc also has transcoding facilities, which you can import into your Ada program; the most powerful and general one is iconv(3). I am not aware of a thick Ada binding to either the locale functions or iconv (all of which are in glibc).
If you write such a binding, it would be nice to make it GMGPL.

In the general case, though, you need not transcode at all unless you want to manipulate the string data with algorithms that depend on its encoding. Whenever your program interacts with GTK+, it must use UTF-8 as the internal encoding. Even if you don't use GTK+, I'd recommend that you use gettext for all user-visible strings and store them in UTF-8 in .po file(s). FWIW, GtkAda includes a thick binding to gettext.

So I would personally depart from the Ada standard in this respect (it defines Character, and hence String, in terms of Latin-1) and declare that all Strings are in UTF-8, both internally and externally. GtkAda makes this explicit with a dedicated name, UTF8_String.

-- 
Ludovic Brenta.