* Text parsing package
@ 2011-03-22 23:34 Syntax Issues
2011-03-23 3:01 ` Shark8
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Syntax Issues @ 2011-03-22 23:34 UTC (permalink / raw)
I have just finished a simple text parsing package. If anyone is
interested I can post the code (only about 160 lines).
Example of how it's used:
with
  Parsing,
  Ada.Text_Io,
  Ada.Strings.Unbounded;
use
  Ada.Text_Io,
  Ada.Strings.Unbounded;
procedure Test_Parsing
is
begin
  Parsing.Open("Test.txt");
  Put_Line(Float'Image(Parsing.Next_Float));
  Put_Line(Parsing.Next_String);
  Put_Line(Parsing.Next_String);
  Put_Line(Parsing.Next_String);
  Put_Line(To_String(Parsing.Next_Unbounded_String));
  Put_Line(Integer'Image(Parsing.Next_Integer));
  Put_Line(Float'Image(Parsing.Next_Float));
  Parsing.Close;
end Test_Parsing;
-- Test.txt --
152.15 Test!
TesT
c
ca4
4
12.9
-- End of file --
-- Output --
1.52150E+02
Test!
TesT
c
ca4
4
1.29000E+01
-- End of output --
* Re: Text parsing package
2011-03-22 23:34 Text parsing package Syntax Issues
@ 2011-03-23 3:01 ` Shark8
2011-03-23 6:29 ` Alex Mentis
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Shark8 @ 2011-03-23 3:01 UTC (permalink / raw)
Nice.
* Re: Text parsing package
2011-03-22 23:34 Text parsing package Syntax Issues
2011-03-23 3:01 ` Shark8
@ 2011-03-23 6:29 ` Alex Mentis
2011-03-23 6:36 ` J-P. Rosen
2011-03-23 8:32 ` Dmitry A. Kazakov
3 siblings, 0 replies; 10+ messages in thread
From: Alex Mentis @ 2011-03-23 6:29 UTC (permalink / raw)
Syntax Issues wrote:
> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160~ lines).
> [...]
Sure, I'd be interested in seeing the source. Does it only work with
files, or can it be used with standard input, too?
Alex
* Re: Text parsing package
2011-03-22 23:34 Text parsing package Syntax Issues
2011-03-23 3:01 ` Shark8
2011-03-23 6:29 ` Alex Mentis
@ 2011-03-23 6:36 ` J-P. Rosen
2011-03-23 8:32 ` Dmitry A. Kazakov
3 siblings, 0 replies; 10+ messages in thread
From: J-P. Rosen @ 2011-03-23 6:36 UTC (permalink / raw)
On 23/03/2011 00:34, Syntax Issues wrote:
> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160~ lines).
> [...]
Hmmm... out of curiosity, what does your package add to what's offered
by Text_IO?
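For reference, here is a minimal sketch of what plain Text_IO already covers: the numeric Get procedures skip leading blanks and line terminators, so numbers spread over several lines can be read one after another without extra code. Blank-delimited string tokens are the part that needs hand-written code. The procedure name Text_IO_Demo and the scratch file name are made up for the illustration.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Float_Text_IO;
with Ada.Integer_Text_IO;

procedure Text_IO_Demo is
   --  Hypothetical scratch file, created and deleted by this demo.
   Scratch : constant String := "text_io_demo.tmp";
   F       : File_Type;
   X       : Float;
   N       : Integer;
begin
   --  Write two numbers on separate lines with a blank line in between.
   Create (F, Out_File, Scratch);
   Put_Line (F, "  152.15");
   New_Line (F);
   Put_Line (F, "4");
   Close (F);

   --  Each Get skips leading blanks and line terminators before the number.
   Open (F, In_File, Scratch);
   Ada.Float_Text_IO.Get (F, X);
   Ada.Integer_Text_IO.Get (F, N);
   Put_Line (Float'Image (X));
   Put_Line (Integer'Image (N));

   --  Delete closes the file and removes it.
   Delete (F);
end Text_IO_Demo;
```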
--
---------------------------------------------------------
J-P. Rosen (rosen@adalog.fr)
Adalog a déménagé / Adalog has moved:
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
* Re: Text parsing package
2011-03-22 23:34 Text parsing package Syntax Issues
` (2 preceding siblings ...)
2011-03-23 6:36 ` J-P. Rosen
@ 2011-03-23 8:32 ` Dmitry A. Kazakov
2011-03-23 11:19 ` Syntax Issues
2011-03-28 0:15 ` Yannick Duchêne (Hibou57)
3 siblings, 2 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-23 8:32 UTC (permalink / raw)
On Tue, 22 Mar 2011 16:34:48 -0700 (PDT), Syntax Issues wrote:
> I have just finished a simple text parsing package.
Congratulations. What are you parsing? CSV?
> If anyone is interested I can post the code (only about 160~ lines).
Some notes on parsing techniques:
1. Don't use unbounded strings. That is an unnecessary overhead.
2. When parsing something you should have some kind of syntax error handling.
Exceptions carrying error location information are IMO the best choice.
3. As others have mentioned, it is a good idea to abstract the source
formats in order to be able to parse files, strings, streams etc.
4. Encoding is related to the above. If you have that source
abstraction layer, you can treat everything as Unicode, transcoding things
there and keeping the parser agnostic to encoding.
5. The state of the parser should be encapsulated in an object. Otherwise
you won't be able to reenter the parser or to make a recursive descent
one.
6. You should decide what drives the parser. In your case it is the caller.
That is not a good idea in most cases, because the caller rarely knows what
to expect next. A better choice is semantic callbacks from the parser to
the caller. Abstract primitive operations are IMO the best implementation of
such callbacks.
7. Usually a parser is a middleman. It means that you should consider how to
shape the intermediate results of parsing, e.g. the AST. Ada storage pools
are very nice for keeping that stuff in an arena.
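Points 1, 2 and 5 can be sketched together, under some assumptions: the input is already held in a String, tokens are blank-separated, and only unsigned integers are recognized. The names Parser, Make, Get_Natural and Syntax_Error are hypothetical and not part of the posted package.

```ada
with Ada.Text_IO;    use Ada.Text_IO;
with Ada.Exceptions; use Ada.Exceptions;

procedure Cursor_Demo is

   Syntax_Error : exception;

   --  All parser state lives in one object: the text and a cursor into it.
   --  The text is a fixed String sized by a discriminant, so no
   --  Unbounded_String is involved.
   type Parser (Length : Natural) is record
      Text : String (1 .. Length);
      Pos  : Positive := 1;
   end record;

   function Make (Text : String) return Parser is
   begin
      return (Length => Text'Length, Text => Text, Pos => 1);
   end Make;

   procedure Skip_Blanks (P : in out Parser) is
   begin
      while P.Pos <= P.Length and then P.Text (P.Pos) = ' ' loop
         P.Pos := P.Pos + 1;
      end loop;
   end Skip_Blanks;

   --  Reads an unsigned decimal integer. If no digit is found, raises
   --  Syntax_Error with the offending position in the exception message.
   procedure Get_Natural (P : in out Parser; Value : out Natural) is
      Start : Positive;
   begin
      Skip_Blanks (P);
      Start := P.Pos;
      Value := 0;
      while P.Pos <= P.Length and then P.Text (P.Pos) in '0' .. '9' loop
         Value := Value * 10
                  + (Character'Pos (P.Text (P.Pos)) - Character'Pos ('0'));
         P.Pos := P.Pos + 1;
      end loop;
      if P.Pos = Start then
         raise Syntax_Error with
           "number expected at position" & Positive'Image (Start);
      end if;
   end Get_Natural;

   P : Parser := Make ("12 34 x");
   N : Natural;
begin
   Get_Natural (P, N);
   Put_Line (Natural'Image (N));
   Get_Natural (P, N);
   Put_Line (Natural'Image (N));
   Get_Natural (P, N);  --  fails here: 'x' is not a digit
exception
   when E : Syntax_Error =>
      Put_Line ("Syntax error: " & Exception_Message (E));
end Cursor_Demo;
```

Because the cursor is a component of the Parser object rather than a package-level variable, several parsers can be active at once, which is what a recursive descent design needs.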
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Text parsing package
2011-03-23 8:32 ` Dmitry A. Kazakov
@ 2011-03-23 11:19 ` Syntax Issues
2011-03-28 0:15 ` Yannick Duchêne (Hibou57)
1 sibling, 0 replies; 10+ messages in thread
From: Syntax Issues @ 2011-03-23 11:19 UTC (permalink / raw)
Thanks for posting the techniques Dmitry. I am going to try
implementing as many as I can -- I am still a fairly new programmer,
and I still have a lot to learn.
Right now it uses text_io and strings.unbounded.text_io (this is
probably bad :/) and only works with files. I tried to make it as
simple as possible with exception checking -- global booleans are set
and checked by the running program.
-- Spec
with
  Ada.Exceptions,
  Ada.Text_Io,
  Ada.Strings.Unbounded.Text_Io,
  Ada.Strings.Unbounded;
use
  Ada.Exceptions,
  Ada.Text_Io,
  Ada.Strings.Unbounded.Text_Io,
  Ada.Strings.Unbounded;
package Parsing
is
  ---------------
  -- Constants --
  ---------------
  DEBUGGING_ON                  : constant Boolean := true;
  EXCEPTION_PARSE_NOTHING       : constant String :=
    "Attempted to parse past the end-of-file.";
  EXCEPTION_PARSE_UNOPENED_FILE : constant String :=
    "Attempted to parse an unopened file.";
  EXCEPTION_PARSE_NON_NUMBER    : constant String :=
    "Failed to parse a number.";
  NULL_STRING_FIXED             : constant String := "";
  NULL_STRING_UNBOUNDED         : constant Unbounded_String :=
    To_Unbounded_String(NULL_STRING_FIXED);
  --------------
  -- Packages --
  --------------
  package Io_Integer
    is new Ada.Text_Io.Integer_Io(Integer);
  package Io_Float
    is new Ada.Text_Io.Float_Io(Float);
  ---------------
  -- Variables --
  ---------------
  Error_On_Recent_Operation  : Boolean := false;
  Error_Occured_Parsing_File : Boolean := false;
  Line                       : Unbounded_String := NULL_STRING_UNBOUNDED;
  File                       : File_Type;
  -----------------
  -- Subprograms --
  -----------------
  procedure Open
    (Name : in String);
  pragma Inline(Open);
  procedure Close;
  pragma Inline(Close);
  function Next_Unbounded_String
    return Unbounded_String;
  function Next_String
    return String;
  pragma Inline(Next_String);
  function Next_Integer
    return Integer;
  function Next_Float
    return Float;
end Parsing;
-- Body
package body Parsing
is
  --
  -- Open
  --
  procedure Open
    (Name : in String)
  is
  begin
    Ada.Text_Io.Open(File, In_File, Name);
    -- Preload the first line so the Next_* functions have something to scan.
    if not End_Of_File(File) then
      Line := Get_Line(File);
    end if;
  end Open;
  --
  -- Close
  --
  procedure Close
  is
  begin
    if Is_Open(File) then
      Ada.Text_Io.Close(File);
      Error_Occured_Parsing_File := false;
    end if;
  end Close;
  --
  -- Next_Unbounded_String
  --
  function Next_Unbounded_String
    return Unbounded_String
  is
    Result : Unbounded_String := NULL_UNBOUNDED_STRING;
  begin
    Error_On_Recent_Operation := false;
    Trim(Line, Ada.Strings.Both);
    if not Is_Open(File) then
      if DEBUGGING_ON then
        Put_Line(EXCEPTION_PARSE_UNOPENED_FILE);
      end if;
      Error_Occured_Parsing_File := true;
      Error_On_Recent_Operation := true;
      return Result;
    end if;
    loop
      if Length(Line) /= 0 then
        for I in 1..Length(Line) loop
          if Element(Line, I) = ' ' then
            -- The token ends before the blank; do not include the blank.
            Result := To_Unbounded_String(Slice(Line, 1, I - 1));
            Delete(Line, 1, I);
            return Result;
          elsif I = Length(Line) then
            -- The last token on the line runs to the end of the line.
            Result := To_Unbounded_String(Slice(Line, 1, I));
            Delete(Line, 1, I);
            return Result;
          end if;
        end loop;
      else
        if End_Of_File(File) then
          if DEBUGGING_ON then
            Put_Line(EXCEPTION_PARSE_NOTHING);
          end if;
          Error_Occured_Parsing_File := true;
          Error_On_Recent_Operation := true;
          return Result;
        end if;
        Line := Trim(Get_Line(File), Ada.Strings.Both);
      end if;
    end loop;
  end Next_Unbounded_String;
  --
  -- Next_String
  --
  function Next_String
    return String
  is
  begin
    return To_String(Next_Unbounded_String);
  end Next_String;
  --
  -- Next_Integer
  --
  function Next_Integer
    return Integer
  is
    Last   : Positive;
    Result : Integer := 0;
  begin
    Io_Integer.Get(To_String(Next_Unbounded_String), Result, Last);
    return Result;
  exception
    -- End_Error is raised by Get when the token is empty (e.g. at end of file).
    when Data_Error | End_Error | Constraint_Error =>
      if DEBUGGING_ON then
        Put_Line(EXCEPTION_PARSE_NON_NUMBER);
      end if;
      Error_Occured_Parsing_File := true;
      Error_On_Recent_Operation := true;
      return Result;
  end Next_Integer;
  --
  -- Next_Float
  --
  function Next_Float
    return Float
  is
    Last   : Positive;
    Result : Float := 0.0;
  begin
    Io_Float.Get(To_String(Next_Unbounded_String), Result, Last);
    return Result;
  exception
    -- End_Error is raised by Get when the token is empty (e.g. at end of file).
    when Data_Error | End_Error | Constraint_Error =>
      if DEBUGGING_ON then
        Put_Line(EXCEPTION_PARSE_NON_NUMBER);
      end if;
      Error_Occured_Parsing_File := true;
      Error_On_Recent_Operation := true;
      return Result;
  end Next_Float;
end Parsing;
* Re: Text parsing package
2011-03-23 8:32 ` Dmitry A. Kazakov
2011-03-23 11:19 ` Syntax Issues
@ 2011-03-28 0:15 ` Yannick Duchêne (Hibou57)
2011-03-28 8:15 ` Dmitry A. Kazakov
1 sibling, 1 reply; 10+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-03-28 0:15 UTC (permalink / raw)
On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:
> 1. Don't use unbounded strings. That is an unnecessary overhead.
Yes, better limit the maximum size of literals.
> 3. As others have mentioned, it is a good idea to abstract the source
> formats in order to be able to parse files, strings, streams etc.
And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
> 6. You should decide what drives the parser. In your case it is the
> caller.
> That is not a good idea in most cases, because the caller rarely knows
> what
> to expect next. A better choice is semantic call-backs from the parser to
> the caller. Abstract primitive operations is IMO the best implementation
> of
> such callbacks.
What “drives” is not just a question that arises with parsers; it arises
everywhere. Here, if Ada had something like a Yield, that would be nice
too. Otherwise, yes, the callback is a good choice and may allow
optimization, especially if some kind of seeking into the source is an
expected option (the parser can then efficiently skip what is not
relevant; conversely, if the caller drives, no such optimization
is possible).
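A minimal sketch of the callback style discussed here, assuming blank-separated tokens that are either unsigned numbers or words. The names Scanners, Handler, On_Number, On_Word and Summer are hypothetical. The parser owns the loop and dispatches to abstract primitive operations that the caller overrides.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Callback_Demo is

   package Scanners is
      --  The caller supplies the semantics by overriding these operations.
      type Handler is abstract tagged null record;
      procedure On_Number (H : in out Handler; Value : Natural) is abstract;
      procedure On_Word   (H : in out Handler; Word  : String)  is abstract;

      --  The parser drives: it scans Text and calls back into H.
      procedure Parse (Text : String; H : in out Handler'Class);
   end Scanners;

   package body Scanners is
      function All_Digits (S : String) return Boolean is
      begin
         for I in S'Range loop
            if S (I) not in '0' .. '9' then
               return False;
            end if;
         end loop;
         return S'Length > 0;
      end All_Digits;

      procedure Parse (Text : String; H : in out Handler'Class) is
         First : Positive := Text'First;
         Last  : Natural;
      begin
         while First <= Text'Last loop
            if Text (First) = ' ' then
               First := First + 1;
            else
               --  Find the end of the blank-delimited token.
               Last := First;
               while Last < Text'Last and then Text (Last + 1) /= ' ' loop
                  Last := Last + 1;
               end loop;
               --  Dispatching calls: H is class-wide.
               if All_Digits (Text (First .. Last)) then
                  On_Number (H, Natural'Value (Text (First .. Last)));
               else
                  On_Word (H, Text (First .. Last));
               end if;
               First := Last + 1;
            end if;
         end loop;
      end Parse;
   end Scanners;

   package Clients is
      --  One possible caller: sums the numbers and counts the words.
      type Summer is new Scanners.Handler with record
         Total : Natural := 0;
         Words : Natural := 0;
      end record;
      overriding procedure On_Number (H : in out Summer; Value : Natural);
      overriding procedure On_Word   (H : in out Summer; Word  : String);
   end Clients;

   package body Clients is
      overriding procedure On_Number (H : in out Summer; Value : Natural) is
      begin
         H.Total := H.Total + Value;
      end On_Number;

      overriding procedure On_Word (H : in out Summer; Word : String) is
      begin
         H.Words := H.Words + 1;
      end On_Word;
   end Clients;

   S : Clients.Summer;
begin
   Scanners.Parse ("12 apples 30 pears", S);
   Put_Line ("Total:" & Natural'Image (S.Total));
   Put_Line ("Words:" & Natural'Image (S.Words));
end Callback_Demo;
```

After the call, the handler holds a total of 42 and a word count of 2. The caller never has to guess which token kind comes next; it only reacts to what the parser found.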
--
If cats meow and make so many strange vocalizations, it is not for the
dogs.
“ c++; /* this makes c bigger but returns the old value */ ” [Anonymous]
* Re: Text parsing package
2011-03-28 0:15 ` Yannick Duchêne (Hibou57)
@ 2011-03-28 8:15 ` Dmitry A. Kazakov
2011-03-28 10:18 ` Yannick Duchêne (Hibou57)
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-28 8:15 UTC (permalink / raw)
On Mon, 28 Mar 2011 02:15:31 +0200, Yannick Duchêne (Hibou57) wrote:
> On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:
>> 1. Don't use unbounded strings. That is an unnecessary overhead.
> Yes, better limit the maximum size of literals.
The literal size is limited by the maximal line length.
>> 3. As others have mentioned, it is a good idea to abstract the source
>> formats in order to be able to parse files, strings, streams etc.
> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
Unfortunately Root_Stream_Type is not an interface and String cannot be
inherited from. So in effect you would need a special root.
>> 6. You should decide what drives the parser. In your case it is the caller.
>> That is not a good idea in most cases, because the caller rarely knows what
>> to expect next. A better choice is semantic call-backs from the parser to
>> the caller. Abstract primitive operations is IMO the best implementation of
>> such callbacks.
> What “drives”, is not just a question rising with parsers, it is rising
> every where. Here, if Ada had something like a Yield, this would be nice
> too.
Yes, it would be nice to have structured multiple co-routines as an
alternative to FSM. There are some cases where tasks look like an overkill.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Text parsing package
2011-03-28 8:15 ` Dmitry A. Kazakov
@ 2011-03-28 10:18 ` Yannick Duchêne (Hibou57)
2011-03-28 12:08 ` Dmitry A. Kazakov
0 siblings, 1 reply; 10+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-03-28 10:18 UTC (permalink / raw)
On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:
>>> 3. As others have mentioned, it is a good idea to abstract the source
>>> formats in order to be able to parse files, strings, streams etc.
>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>
> Unfortunately Root_Stream_Type is not an interface and string cannot be
> inherited from. So in the effect you would need a special root.
I meant implementing a concrete Root_Stream_Type using a string as source.
You would have an open method, getting a string or any kind of reference
to a string, just not as a filename this time. But I wonder about efficiency
(because of the tagged type), which is important for such low-level and
heavily iterated stuff. Buffering with systematic batch input/output may
still be an option to reach efficiency, even with a tagged stream type
(batching often helps efficiency, I feel).
The advantage is that this would be clean to use, as it could be used
everywhere something expects a standard Ada stream. The disadvantage may
be the above.
It can also be done with generics, but I will not try to advocate that, as
when I used them, I was not happy with the result in the end.
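A minimal sketch of such a string-backed stream, under some assumptions: it is read-only, it copies the string into the stream object, and Stream_Element is taken to be 8 bits wide. The names String_Streams and String_Stream are hypothetical.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure String_Stream_Demo is

   package String_Streams is
      --  Hypothetical type: a read-only stream over a copy of a String.
      type String_Stream (Length : Natural) is
        new Root_Stream_Type with record
         Data : String (1 .. Length);
         Pos  : Positive := 1;
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is
      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset) is
      begin
         --  Fill Item with as many characters as remain, one per element.
         Last := Item'First - 1;
         while Last < Item'Last and then Stream.Pos <= Stream.Length loop
            Last := Last + 1;
            Item (Last) :=
              Stream_Element (Character'Pos (Stream.Data (Stream.Pos)));
            Stream.Pos := Stream.Pos + 1;
         end loop;
      end Read;

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         raise Program_Error with "String_Stream is read-only";
      end Write;
   end String_Streams;

   Source : constant String := "42 is the answer";
   S      : aliased String_Streams.String_Stream (Source'Length);
   C      : Character;
begin
   S.Data := Source;
   --  Any code that expects a standard stream can now read from S.
   for I in 1 .. 2 loop
      Character'Read (S'Access, C);
      Ada.Text_IO.Put (C);
   end loop;
   Ada.Text_IO.New_Line;
end String_Stream_Demo;
```

Every Character'Read goes through a dispatching Read call for a single element, which is the efficiency concern; reading into a larger Stream_Element_Array at once would amortize that cost.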
> Yes, it would be nice to have structured multiple co-routines as an
> alternative to FSM. There are some cases where tasks look like an
> overkill.
I remember I made that wish too (can't remember what was said to advocate
against it). Let's hope for Ada 2017 or Ada 2022.
--
If cats meow and make so many strange vocalizations, it is not for the
dogs.
“ c++; /* this makes c bigger but returns the old value */ ” [Anonymous]
* Re: Text parsing package
2011-03-28 10:18 ` Yannick Duchêne (Hibou57)
@ 2011-03-28 12:08 ` Dmitry A. Kazakov
0 siblings, 0 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-28 12:08 UTC (permalink / raw)
On Mon, 28 Mar 2011 12:18:38 +0200, Yannick Duchêne (Hibou57) wrote:
> On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:
>>>> 3. As others have mentioned, it is a good idea to abstract the source
>>>> formats in order to be able to parse files, strings, streams etc.
>>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>>
>> Unfortunately Root_Stream_Type is not an interface and string cannot be
>> inherited from. So in the effect you would need a special root.
> I meant implementing a concrete Root_Stream_Type using a string as source.
> You would have an open method, getting a string or any kind of reference
> to string, just not as a filename this time. But wonder about efficiency
> (because of the tagged type), which is important for such low-level and
> heavily iterated stuff.
Another problem is that it would be difficult to use. Mix-in does not work
because String cannot be a discriminant. You would have a nasty
access-to-string one or have to copy the whole string into the stream
object.
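To make the access-to-string variant concrete, here is a minimal sketch, again read-only and assuming 8-bit stream elements. The names Views and String_View are hypothetical. The stream refers to the caller's string through an access discriminant, so nothing is copied, but the string must be declared aliased and must outlive the stream.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Access_Stream_Demo is

   package Views is
      --  Hypothetical type: a stream that only refers to the source string.
      type String_View (Source : not null access constant String) is
        new Root_Stream_Type with record
         Offset : Natural := 0;  --  characters consumed so far
      end record;

      overriding procedure Read
        (Stream : in out String_View;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_View;
         Item   : Stream_Element_Array);
   end Views;

   package body Views is
      overriding procedure Read
        (Stream : in out String_View;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset) is
      begin
         Last := Item'First - 1;
         while Last < Item'Last
           and then Stream.Offset < Stream.Source'Length
         loop
            Last := Last + 1;
            Item (Last) := Stream_Element
              (Character'Pos
                 (Stream.Source (Stream.Source'First + Stream.Offset)));
            Stream.Offset := Stream.Offset + 1;
         end loop;
      end Read;

      overriding procedure Write
        (Stream : in out String_View;
         Item   : Stream_Element_Array) is
      begin
         raise Program_Error with "String_View is read-only";
      end Write;
   end Views;

   --  The source must be aliased so that 'Access can be taken.
   Text : aliased constant String := "Ada";
   S    : aliased Views.String_View (Text'Access);
   C    : Character;
begin
   for I in Text'Range loop
      Character'Read (S'Access, C);
      Ada.Text_IO.Put (C);
   end loop;
   Ada.Text_IO.New_Line;
end Access_Stream_Demo;
```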
>> Yes, it would be nice to have structured multiple co-routines as an
>> alternative to FSM. There are some cases where tasks look like an
>> overkill.
> I remember I made that wish too (can't remember what was said to advocate
> against it).
The reason is always the same: if nobody from the ARG is personally
interested in the concept, it will be blindly rejected. Here I mean not the
implementation, but merely a serious consideration of possible ways to
approach the problem. That alone requires much mental work.
> Let's hope for Ada 2017 or Ada 2022.
I doubt it. This is a "serious" issue, not a patch or yet another kludge.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de