From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,957580c7ebafc9dd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!news4.google.com!news2.volia.net!newsfeed01.sul.t-online.de!t-online.de!newsfeed01.chello.at!newsfeed.arcor.de!news.arcor.de!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Is there a lex utility for Ada that handles unicode?
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.14.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <1130433435.410224.186300@g49g2000cwa.googlegroups.com>
Date: Thu, 27 Oct 2005 19:57:37 +0200
Message-ID: <ndfasdmq13aa$.gi0uh66a0di5$.dlg@40tude.net>
NNTP-Posting-Date: 27 Oct 2005 19:57:26 MEST
NNTP-Posting-Host: a25de247.newsread2.arcor-online.net
X-Trace: 
 DXC=e@g;8?EaY]\kVRFeeUa4iQQ5U85hF6f;TjW\KbG]kaMXQ>n?D9BSA]\b?7\m=k6>l[[6LHn;2LCV^[<mhadbfdU[o_b^L7Nf>eY
X-Complaints-To: abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:6005
Date: 2005-10-27T19:57:26+02:00
List-Id: <comp.lang.ada>

On 27 Oct 2005 10:17:15 -0700, brian.b.mcguinness@lmco.com wrote:

> Is there some equivalent of the lex utility that produces
> Ada code rather than C code, and is capable of handling
> any character in the Unicode basic code plane?  I am
> thinking of using it on strings read from a GUI created
> with GtkAda, so it would probably be best if it accepted
> UTF-8 strings, but I could convert the input to a wide
> string if necessary.

Why do you wish to convert it to a wide string? You can parse UTF-8 encoded
text as-is. After all, that was the idea behind UTF-8. For example, my unit
compiler parses UTF-8 directly. The advantage is that I can use the same
parser for units spelt both in pure ASCII and in full UTF-8. I simply flag
the UTF-8 tokens in the table if I don't want them to be recognized. The one
catch is that 8-bit tokens need to be replaced with their two-character UTF-8
equivalents, but such tokens are rare. BTW, the parser is table-driven, so I
don't need lex.
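
A minimal sketch of the idea, not the actual code of the unit compiler
or of the components library: the token table, the names Token and
Spelling, and the choice of Latin-1 as the 8-bit source of the micro and
degree signs are all assumptions made for illustration. The UTF-8 text
stays in an ordinary String, one octet per Character, and tokens are
matched octet by octet without any decoding. The 8-bit tokens (16#B5#,
16#B0#) appear in the table as two-octet UTF-8 sequences instead.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure UTF8_Tokens is

   --  Hypothetical token set, for illustration only.
   type Token is (Micro, Degree, Metre);

   --  UTF-8 spelling of each token, written as raw octets.
   --  In Latin-1 the micro sign is the single octet 16#B5# and the
   --  degree sign is 16#B0#; in UTF-8 each becomes two octets.
   function Spelling (T : Token) return String is
   begin
      case T is
         when Micro =>
            return Character'Val (16#C2#) & Character'Val (16#B5#);
         when Degree =>
            return Character'Val (16#C2#) & Character'Val (16#B0#);
         when Metre =>
            return "m";
      end case;
   end Spelling;

   --  UTF-8 input held in a plain String; no Wide_String is involved.
   Source  : constant String :=
               Spelling (Micro) & "m" & Spelling (Degree);
   Pointer : Positive := Source'First;
   Found   : Boolean;
begin
   while Pointer <= Source'Last loop
      Found := False;
      for T in Token loop
         declare
            S : constant String := Spelling (T);
         begin
            --  Plain octet-wise comparison against the table entry.
            if Pointer + S'Length - 1 <= Source'Last
              and then Source (Pointer .. Pointer + S'Length - 1) = S
            then
               Put_Line (Token'Image (T));
               Pointer := Pointer + S'Length;
               Found   := True;
               exit;
            end if;
         end;
      end loop;
      exit when not Found;  --  No table entry matches here: stop.
   end loop;
end UTF8_Tokens;
```

The sketch takes the first table entry that matches. A real table-driven
parser would normally prefer the longest match and report unrecognized
input rather than silently stopping.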

For UTF-8 handling in Ada you can take a look at:
http://www.dmitry-kazakov.de/ada/strings_edit.htm

Both it and the table-driven parsers in Ada are included in the components:
http://www.dmitry-kazakov.de/ada/components.htm

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de