From: Björn Persson
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
References: <200456-112553-85684@foorum.com>
Date: Thu, 06 May 2004 17:13:49 GMT

Ludovic Brenta wrote:

> You can learn about which encoding is currently in effect using the
> getlocale(3) library call.

My understanding from the manpages is that you must first call
setlocale(LC_ALL, "") to import the locale settings from the environment
into the program, and then you call either nl_langinfo or localeconv to
get information about the locale. I don't seem to have a manpage for
getlocale.

> I am not aware of a thick binding to either getlocale or iconv (both
> are in glibc). If you write such a binding, it would be nice to make
> it GMGPL.

There are lots of things I'd want to write.
And now I can't stop thinking about how such a binding might be
written ... :-/

> In the general case, though, you do not necessarily have to transcode
> unless you want to manipulate the string data with algorithms that
> depend on the internal encoding.

Of course. I just wish the OS interface wouldn't use String when the
encoding is undefined. Better define a type System_String or something,
and state explicitly that this type contains strings in whatever
encoding is used in the environment.

> GtkAda does this explicitly with a separate type, UTF8_String.

That's good. What bothers me is when String is used for anything so you
don't know what you really have in your strings. The C programmers can
keep that kind of confusion to themselves. Separate types is clearly the
way to go.

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu