comp.lang.ada
From: Ted Dennison <dennison@telepath.com>
Subject: Re: Announce: OpenToken 2.0 released
Date: 2000/02/01
Message-ID: <876unj$jcs$1@nnrp1.deja.com>
In-Reply-To: t7n1plq56s.fsf@calumny.jyacc.com

In article <t7n1plq56s.fsf@calumny.jyacc.com>,
  Hyman Rosen <hymie@prolifics.com> wrote:
> Ted Dennison <dennison@telepath.com> writes:
> > Release 2.0 of OpenToken has now been placed on the website
>
> From a quick look at opentoken.ads, I see a declaration for an
> EOF_Character, set to Ada.Characters.Latin_1.EOT. Does this mean
> that OpenToken cannot parse binary files that happen to contain
> this character? It's a rather odd choice in any case, given that
> no system that I know of uses EOT as an end-of-file marker.

That's the marker that the OpenToken text feeders agree to put at the
end of the text to indicate that there is no more text to read. If you
have to parse text which contains an EOT, it's a simple matter to change
EOF_Character to something else.

As for parsing binaries: to my knowledge OT has not been used that way
before. However, I see only one real impediment. EOF_Character is used
in OpenToken:
   o  In the line comment recognizer (line comments make no sense in
binaries anyway)
   o  In the Text_IO-based text feeder. Using this feeder also makes no
sense in binaries. You'd want to write one based on Sequential_IO or
something.
   o  In the End_Of_File token recognizer. This also makes no sense for
binaries, as a sentinel character which can be tokenized clearly won't
do the job.
   o  By you the user to make sure you don't attempt to read past the
end of the file after a token analysis or parse returns. In this case,
no problem for binaries exists. You just use a different method to
prevent reading past the end of the file.
   o  In the analyzer to prevent reading past the end of file when
matching a token. This *would* be a problem for you, unless none of your
"binary" tokens span an EOT.

My suggestions for working around this problem are as follows:
Modify EOF_Character to be a variable so that it can be set by your
custom text feeder. Normally, set it to some good terminating value: a
byte value that cannot appear anywhere in a token except at the end. But
when you read the last character from the file, set it to that
character's value instead.
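
A rough sketch of that first idea, as a toy stand-alone program rather
than a patch to OpenToken. Everything here is made up for illustration:
the local EOF_Character variable only stands in for the OpenToken
constant, Get is a hypothetical one-byte feeder over an in-memory
string, and 16#FF# is assumed never to occur in the input.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO;

procedure Sentinel_Demo is

   --  Hypothetical stand-in for OpenToken's EOF_Character, made a variable.
   --  Assumption: byte 16#FF# never occurs in the input.
   EOF_Character : Character := Character'Val (16#FF#);

   --  "Binary" input that happens to contain an EOT in the middle.
   Data : constant String := "AB" & Ada.Characters.Latin_1.EOT & "CD";
   Next : Positive := Data'First;

   --  Toy feeder: hands out one byte at a time.
   procedure Get (Ch : out Character) is
   begin
      Ch := Data (Next);
      if Next = Data'Last then
         --  Last byte of the input: make it the sentinel.
         EOF_Character := Ch;
      else
         Next := Next + 1;
      end if;
   end Get;

   Ch    : Character;
   Count : Natural := 0;

begin
   --  Toy "analyzer" loop: stops only when it sees the sentinel.
   loop
      Get (Ch);
      Count := Count + 1;
      exit when Ch = EOF_Character;
   end loop;

   --  All 5 bytes, including the embedded EOT, are consumed.
   Ada.Text_IO.Put_Line ("bytes read:" & Natural'Image (Count));
end Sentinel_Demo;
```

Note that the embedded EOT is read as ordinary data here, because the
sentinel only takes on the last byte's value once that byte is read.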

A better option with a bit more work would be the following:
Modify the root text_feeder package to have a primitive operation for
returning whether we are at the end of the input. Implement that routine
in your custom text feeder (as well as any others that you may use).
Modify the one line in the Analyzer that checks EOF_Character to instead
call that routine on its text feeder.
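
Something along these lines, again as a stand-alone sketch and not
OpenToken's actual declarations. The package Feeders, the types Instance
and Byte_Feeder, the End_Of_Text operation, and the Scan procedure are
all hypothetical names; the real text feeder Get has a different
profile.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO;

procedure Feeder_Demo is

   package Feeders is
      --  Hypothetical root feeder type with an end-of-input primitive.
      type Instance is abstract tagged null record;
      procedure Get (Feeder : in out Instance; Ch : out Character)
         is abstract;
      function End_Of_Text (Feeder : Instance) return Boolean
         is abstract;

      --  Hypothetical custom feeder over an in-memory byte string.
      type Byte_Feeder (Length : Natural) is new Instance with record
         Data : String (1 .. Length);
         Next : Positive := 1;
      end record;
      procedure Get (Feeder : in out Byte_Feeder; Ch : out Character);
      function End_Of_Text (Feeder : Byte_Feeder) return Boolean;
   end Feeders;

   package body Feeders is
      procedure Get (Feeder : in out Byte_Feeder; Ch : out Character) is
      begin
         Ch := Feeder.Data (Feeder.Next);
         Feeder.Next := Feeder.Next + 1;
      end Get;

      function End_Of_Text (Feeder : Byte_Feeder) return Boolean is
      begin
         return Feeder.Next > Feeder.Length;
      end End_Of_Text;
   end Feeders;

   --  Toy "analyzer" loop: asks the feeder whether input is exhausted
   --  instead of comparing each character against a sentinel.
   procedure Scan
     (Feeder : in out Feeders.Instance'Class;
      Total  : out Natural;
      EOTs   : out Natural)
   is
      Ch : Character;
   begin
      Total := 0;
      EOTs  := 0;
      while not Feeders.End_Of_Text (Feeder) loop
         Feeders.Get (Feeder, Ch);
         Total := Total + 1;
         if Ch = Ada.Characters.Latin_1.EOT then
            EOTs := EOTs + 1;  --  EOT is just another data byte here
         end if;
      end loop;
   end Scan;

   F     : Feeders.Byte_Feeder (Length => 5);
   Total : Natural;
   EOTs  : Natural;

begin
   F.Data := "AB" & Ada.Characters.Latin_1.EOT & "CD";
   Scan (F, Total, EOTs);
   Ada.Text_IO.Put_Line ("bytes read:" & Natural'Image (Total));
   Ada.Text_IO.Put_Line ("EOTs seen:" & Natural'Image (EOTs));
end Feeder_Demo;
```

With this arrangement no byte value has to be reserved, which is why it
suits binaries better than the sentinel approach.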

Proper binary support is not in OT because it has just never come up
before. As you can see, though, it could be modified fairly easily to
support parsing binaries. Using a sentinel character for the end of
file has simply always seemed like a nice simplification. So what are
the uses of parsing binaries? I kinda thought that binaries are, by
their very nature, already parsed.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html





