From: Stephen Leake
Newsgroups: comp.lang.ada
Subject: Re: Does OpenToken support Unicode
References: <2652647e-ef0a-4440-b127-4ddc59620707@4g2000yqu.googlegroups.com> <82vcpgf1zl.fsf@stephe-leake.org>
Date: Tue, 24 Jan 2012 08:47:51 -0500
Message-ID: <824nvlfbzs.fsf@stephe-leake.org>

mtrenkmann writes:

> Just for closing this thread, here is what I have done.

Thanks for the update.

> Beginning at the Text_Feeder level I changed all occurrences of
> Character/String variables that are involved in storing parsing data
> (buffers, lexemes, etc) to the Wide_Wide_Character/Wide_Wide_String
> type.
>
> Then I provided a derivation of Text_Feeder that reads UTF-8
> (multibyte) characters from Ada.Text_IO and decodes them into
> Wide_Wide_Characters. The decoding is currently based on
> System.WCh_Con (GNAT).
>
> As mentioned by Stephe I also tried to implement a generic solution
> regarding the character type, but that wasn't completely possible. For
> instance in the top-level OpenToken package there are constants for
> EOL and EOF that are of type Character.

Yes, that's an annoying hack. You could try moving them lower in the
package hierarchy.

> Text_Feeder.Text_IO uses Ada.Text_IO.Get_Line which is not generic.

You'd have to write a generic wrapper for Ada.Text_IO. That might be
useful in other contexts, but it is a lot of work.

> Furthermore, as far as I know, Ada exceptions cannot carry
> Wide_Wide_Strings to report the lexemes of unexpected tokens ...

True, but they can carry UTF-8-encoded strings.

> To support constants and non-generic Ada procedures one has to turn
> them into formal parameters of generic OpenToken packages, right?

Right.

> Maybe this could end in a generics instantiation nightmare.

Well, complicated anyway :).

> This leads me to the question why in Ada some packages are prefixed
> with Wide_Wide_ and not generic. (Sorry for this question, but I come
> from the C++ universe.)

Good point. For example, Ada.Numerics.Generic_Elementary_Functions is
generic, and instantiations are provided for the predefined float
types. There may be a problem with the functions that convert to other
string types, but those could be moved to child packages.

--
-- Stephe