From: mtrenkmann
Newsgroups: comp.lang.ada
Subject: Re: Does OpenToken support Unicode
Date: Mon, 23 Jan 2012 14:48:55 -0800 (PST)

To close this thread, here is what I have done. Starting at the Text_Feeder level, I changed all occurrences of Character/String variables that are involved in storing parsing data (buffers, lexemes, etc.) to the Wide_Wide_Character/Wide_Wide_String type.
Then I provided a derivation of Text_Feeder that reads UTF-8 (multibyte) characters from Ada.Text_IO and decodes them into Wide_Wide_Characters. The decoding is currently based on System.WCh_Con (GNAT).

As Stephe mentioned, I also tried to implement a solution that is generic with respect to the character type, but that was not completely possible. For instance, the top-level OpenToken package has constants for EOL and EOF that are of type Character. Text_Feeder.Text_IO uses Ada.Text_IO.Get_Line, which is not generic. Furthermore, as far as I know, Ada exceptions cannot carry Wide_Wide_Strings to report the lexemes of unexpected tokens ...

To support constants and non-generic Ada procedures, one has to turn them into formal parameters of generic OpenToken packages, right? Maybe this would end in a generic instantiation nightmare.

This brings me to the question of why some packages in Ada are prefixed with Wide_Wide_ rather than being generic. (Sorry for this question, but I come from the C++ universe.)

OK, thanks again for your previous hints. If there is any interest, I will provide the modified OpenToken code with UTF-8 support after finishing my thesis.

--
Martin
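
For reference, the decoding step described above can be sketched in isolation. This is only a minimal illustration and not the modified OpenToken code: it uses the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings instead of the GNAT-specific System.WCh_Con mentioned above, and the procedure name Decode_Demo and its variables are made up for the example.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

--  Hypothetical stand-alone demo, not part of OpenToken.
procedure Decode_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Raw UTF-8 bytes as they would arrive from a byte-oriented read:
   --  'a' followed by the two-byte encoding (C3 A4) of U+00E4.
   Raw : constant String :=
     "a" & Character'Val (16#C3#) & Character'Val (16#A4#);

   --  Decode the byte sequence into one Wide_Wide_Character per code point.
   Decoded : constant Wide_Wide_String := UTF.Decode (Raw);
begin
   Ada.Text_IO.Put_Line
     ("bytes:" & Integer'Image (Raw'Length)
      & ", characters:" & Integer'Image (Decoded'Length));
   Ada.Text_IO.Put_Line
     ("last code point:"
      & Integer'Image (Wide_Wide_Character'Pos (Decoded (Decoded'Last))));
end Decode_Demo;
```

In a Text_Feeder-style setting, Raw would presumably be a line obtained from Ada.Text_IO.Get_Line rather than a constant. A real feeder also has to cope with multibyte sequences that are split across buffer boundaries, which this sketch ignores.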