From: deadlyhead
Newsgroups: comp.lang.ada
Subject: [GNAT-specific] Using the Form parameter/-gnatW switch
Date: Thu, 24 Jun 2010 19:08:21 -0700 (PDT)
Message-ID: <15d17632-7377-4c60-9bb2-35f952300d42@w12g2000yqj.googlegroups.com>

I've been messing around a bit with files of various encodings, and just recently I've become aware of the Form parameter to Open and Create and the -gnatW switch for handling character encoding.

This is a pretty big deal to me. For a long time I've been a bit... frustrated? ... by the fact that the Ada standard specifically gives us Wide_ and Wide_Wide_Characters and their associated strings, but actually _using_ them seemed pretty much worthless.
I mean, if you can't actually _talk_ with them to a modern system (UTF-8 or UTF-16 encoding seems to be pretty much the way it goes), what's the point in using them? So I'm pretty happy with using either the WCEM=8 or -gnatW8 methods of setting the encoding to get UTF-8 input and output.

What I'm wondering now is: can I get other UTF outputs to work? I actually have the peculiar case of dealing with UTF-32 encoded files, which need to be translated to UTF-8 for editing, and back to UTF-32 for machine use again. It seems that it would be pretty straightforward to just pull the file in with a plain Wide_Wide_Text_IO.Open/Get_Line, then output via Wide_Wide_Text_IO.Put on a file where Form => "WCEM=8".

So far, though, I'm having trouble, since the default encoding in GNAT is brackets notation, not raw binary character output. Also, if I want output printed to the terminal in UTF-8, I have to set the -gnatW8 switch, which means that _now_ the default encoding for all unspecified files is UTF-8. Any ideas on how to get around this?

And, just for giggles, is it _possible_ to use the Upper_Half encoding "WCEM=u" to encode UTF-16? Or is this something completely different (which it seems it might be, from the little that's said in the GNAT Reference Manual)?

I'm okay with giving up on this method and using the XML/Ada Unicode libraries for the text translation. It'd be nice if I didn't have to, though.
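
For reference, the per-file approach mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming GNAT: the procedure name and the file name "utf8_out.txt" are made up for the example. It covers only the UTF-8 output side, not the UTF-32 input.

```ada
with Ada.Wide_Wide_Text_IO;

procedure Write_UTF8 is
   package WWIO renames Ada.Wide_Wide_Text_IO;

   Output : WWIO.File_Type;
begin
   --  GNAT-specific: the Form string selects the wide character
   --  encoding method for this one file.  "WCEM=8" requests UTF-8.
   WWIO.Create (File => Output,
                Mode => WWIO.Out_File,
                Name => "utf8_out.txt",   --  hypothetical file name
                Form => "WCEM=8");

   --  U+03A9 (Greek capital omega) is built with 'Val so that the
   --  source text itself stays plain ASCII.
   WWIO.Put_Line (Output, Wide_Wide_Character'Val (16#03A9#) & " omega");

   WWIO.Close (Output);
end Write_UTF8;
```

Since the encoding is requested in the Form string, it applies to that file alone. Files opened without a Form, and the standard output, keep whatever default the -gnatW setting (or its absence) gives them.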