From: "Dmitry A. Kazakov"
Subject: Re: Is there a lex utility for Ada that handles unicode?
Newsgroups: comp.lang.ada
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <1130433435.410224.186300@g49g2000cwa.googlegroups.com>
Date: Thu, 27 Oct 2005 19:57:37 +0200

On 27 Oct 2005 10:17:15 -0700, brian.b.mcguinness@lmco.com wrote:

> Is there some equivalent of the lex utility that produces
> Ada code rather than C code, and is capable of handling
> any character in the Unicode basic code plane? I am
> thinking of using it on strings read from a GUI created
> with GtkAda, so it would probably be best if it accepted
> UTF-8 strings, but I could convert the input to a wide
> string if necessary.

Why do you wish to convert it to wide? You can parse UTF-8 encoded
text as-is. After all, that was the idea behind UTF-8.

For example, my unit compiler parses UTF-8 directly. The advantage is
that I can use the same parser for units spelt both in pure ASCII and
in full UTF-8.
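
As an illustration only (a Python sketch, not the code of the parser
mentioned above; the token table, the token names and the functions
match_token and tokenize are all made up for the example), a
table-driven longest match can work directly on the UTF-8 octets. This
is safe because an ASCII octet never occurs inside a multi-octet UTF-8
sequence, so ASCII and non-ASCII spellings can share one table:

```python
# Hypothetical token table: keys are UTF-8 octet strings, values are
# token names. ASCII and non-ASCII spellings sit side by side.
TOKENS = {
    "um".encode("utf-8"): "MICROMETRE",
    # U+00B5 is one octet (0xB5) in Latin-1 but two (0xC2 0xB5) in UTF-8,
    # so the table has to hold the UTF-8 form.
    "\u00b5m".encode("utf-8"): "MICROMETRE",
    "ohm".encode("utf-8"): "OHM",
    "\u03a9".encode("utf-8"): "OHM",  # U+03A9, two octets in UTF-8
    "m".encode("utf-8"): "METRE",
}


def match_token(source: bytes, pos: int):
    """Return (name, next_pos) for the longest table entry at pos, or None."""
    best = None
    for spelling, name in TOKENS.items():
        if source.startswith(spelling, pos):
            if best is None or pos + len(spelling) > best[1]:
                best = (name, pos + len(spelling))
    return best


def tokenize(text: str):
    """Split blank-separated text into token names, scanning UTF-8 octets."""
    source = text.encode("utf-8")
    pos, names = 0, []
    while pos < len(source):
        if source[pos:pos + 1] == b" ":
            pos += 1
            continue
        hit = match_token(source, pos)
        if hit is None:
            raise ValueError(f"unrecognized input at octet {pos}")
        names.append(hit[0])
        pos = hit[1]
    return names
```

Here tokenize("um \u00b5m") yields the same token name twice, while the
Latin-1 octet string b"\xb5m" matches nothing in the table.
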
I simply flag the UTF-8 tokens in the table if I don't want to
recognize them. There is one trick: 8-bit tokens need to be replaced
with their 2-character (two-octet) UTF-8 equivalents. But they are
rare.

BTW, the parser is table-driven, so I don't need lex.

For UTF-8 handling in Ada you can take a look at:

http://www.dmitry-kazakov.de/ada/strings_edit.htm

It and table-driven parsers in Ada are included in components:

http://www.dmitry-kazakov.de/ada/components.htm

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de