comp.lang.ada
* Text parsing package
@ 2011-03-22 23:34 Syntax Issues
  2011-03-23  3:01 ` Shark8
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Syntax Issues @ 2011-03-22 23:34 UTC


I have just finished a simple text parsing package. If anyone is
interested I can post the code (only about 160 lines).


Example of how it's used:

with
	Parsing,
	Ada.Text_Io,
	Ada.Strings.Unbounded;
use
	Ada.Text_Io,
	Ada.Strings.Unbounded;
procedure Test_Parsing
	is
	begin
		Parsing.Open("Test.txt");
		Put_Line(Float'Image(Parsing.Next_Float));
		Put_Line(Parsing.Next_String);
		Put_Line(Parsing.Next_String);
		Put_Line(Parsing.Next_String);
		Put_Line(To_String(Parsing.Next_Unbounded_String));
		Put_Line(Integer'Image(Parsing.Next_Integer));
		Put_Line(Float'Image(Parsing.Next_Float));
		Parsing.Close;
	end Test_Parsing;
-- Test.txt --
152.15 Test!
TesT
c

ca4

4

12.9
-- End of file --

-- Output --
 1.52150E+02
Test!
TesT
c
ca4
 4
 1.29000E+01
-- End of output --




* Re: Text parsing package
  2011-03-22 23:34 Text parsing package Syntax Issues
@ 2011-03-23  3:01 ` Shark8
  2011-03-23  6:29 ` Alex Mentis
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Shark8 @ 2011-03-23  3:01 UTC


Nice.




* Re: Text parsing package
  2011-03-22 23:34 Text parsing package Syntax Issues
  2011-03-23  3:01 ` Shark8
@ 2011-03-23  6:29 ` Alex Mentis
  2011-03-23  6:36 ` J-P. Rosen
  2011-03-23  8:32 ` Dmitry A. Kazakov
  3 siblings, 0 replies; 10+ messages in thread
From: Alex Mentis @ 2011-03-23  6:29 UTC


Syntax Issues wrote:

> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160~ lines).
> [...]

Sure, I'd be interested in seeing the source. Does it only work with
files, or can it be used with standard input, too?

Alex




* Re: Text parsing package
  2011-03-22 23:34 Text parsing package Syntax Issues
  2011-03-23  3:01 ` Shark8
  2011-03-23  6:29 ` Alex Mentis
@ 2011-03-23  6:36 ` J-P. Rosen
  2011-03-23  8:32 ` Dmitry A. Kazakov
  3 siblings, 0 replies; 10+ messages in thread
From: J-P. Rosen @ 2011-03-23  6:36 UTC


On 23/03/2011 00:34, Syntax Issues wrote:
> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160~ lines).
> [...]
Hmmm... out of curiosity, what does your package add to what's offered
by Text_IO?
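
For context on the question above, here is a minimal sketch of what the numeric Get procedures of Text_IO already do on their own: they skip leading blanks and line terminators before reading a number. The file name "demo.txt" and the procedure name are assumptions made for illustration only. Text_IO has no comparable Get for blank-delimited words, which appears to be the part the posted package adds.

```ada
with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Float_Text_IO;
with Ada.Integer_Text_IO;

procedure Text_IO_Demo is
   File : File_Type;
   F    : Float;
   N    : Integer;
begin
   -- Write a small input file so the example is self-contained.
   -- ("demo.txt" is a hypothetical name.)
   Create (File, Out_File, "demo.txt");
   Put_Line (File, "152.15");
   New_Line (File);
   Put_Line (File, "   4");
   Close (File);

   Open (File, In_File, "demo.txt");
   -- With the default Width of 0, Get skips leading blanks and
   -- line terminators before reading the number.
   Ada.Float_Text_IO.Get (File, F);
   Ada.Integer_Text_IO.Get (File, N);
   Close (File);

   Put_Line (Float'Image (F));
   Put_Line (Integer'Image (N));
end Text_IO_Demo;
```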



-- 
---------------------------------------------------------
           J-P. Rosen (rosen@adalog.fr)
Adalog has moved:
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00




* Re: Text parsing package
  2011-03-22 23:34 Text parsing package Syntax Issues
                   ` (2 preceding siblings ...)
  2011-03-23  6:36 ` J-P. Rosen
@ 2011-03-23  8:32 ` Dmitry A. Kazakov
  2011-03-23 11:19   ` Syntax Issues
  2011-03-28  0:15   ` Yannick Duchêne (Hibou57)
  3 siblings, 2 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-23  8:32 UTC


On Tue, 22 Mar 2011 16:34:48 -0700 (PDT), Syntax Issues wrote:

> I have just finished a simple text parsing package.

Congratulations. What are you parsing? CSV?

> If anyone is interested I can post the code (only about 160~ lines).

Some notes on parsing techniques:

1. Don't use unbounded strings. That is an unnecessary overhead.

2. When parsing something you should have some kind of syntax error handling.
Exceptions carrying error location information are IMO the best choice.

3. As others have mentioned, it is a good idea to abstract the source
formats in order to be able to parse files, strings, streams etc.

4. Encoding is an issue related to the above. If you have that source
abstraction layer, you can handle everything as Unicode, transcoding
there and keeping the parser agnostic to encoding.

5. The state of the parser should be encapsulated in an object. Otherwise
you won't be able to reenter the parser or to make a recursive descent
one.

6. You should decide what drives the parser. In your case it is the caller.
That is not a good idea in most cases, because the caller rarely knows what
to expect next. A better choice is semantic call-backs from the parser to
the caller. Abstract primitive operations are IMO the best implementation of
such callbacks.

7. Usually a parser is a middleman. That means you should consider how to
shape the intermediate results of parsing, e.g. the AST. Ada pools are very
nice to keep that stuff in an arena.
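
Points 1, 2 and 5 above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not anyone's actual package: the names Parser, Next_Token and Syntax_Error are hypothetical, the source is a plain String, and tokens are simply blank-delimited.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Exceptions;

procedure Parser_State_Demo is

   Syntax_Error : exception;  -- hypothetical name

   -- All parser state lives in one object (point 5); the source is a
   -- plain String, so no Unbounded_String is needed (point 1).
   type Parser (Length : Natural) is record
      Source : String (1 .. Length);
      Pos    : Positive := 1;  -- index of the next unread character
   end record;

   -- Skips blanks, then returns the bounds of the next blank-delimited
   -- token. Raises Syntax_Error with the position when none is left.
   procedure Next_Token
     (P     : in out Parser;
      First : out Positive;
      Last  : out Natural) is
   begin
      while P.Pos <= P.Length and then P.Source (P.Pos) = ' ' loop
         P.Pos := P.Pos + 1;
      end loop;
      if P.Pos > P.Length then
         raise Syntax_Error with
           "token expected at position" & Integer'Image (P.Pos);
      end if;
      First := P.Pos;
      while P.Pos <= P.Length and then P.Source (P.Pos) /= ' ' loop
         P.Pos := P.Pos + 1;
      end loop;
      Last := P.Pos - 1;
   end Next_Token;

   Text  : constant String := "152.15 Test!";
   P     : Parser := (Length => Text'Length, Source => Text, Pos => 1);
   First : Positive;
   Last  : Natural;
begin
   for I in 1 .. 3 loop  -- the third call finds no token and raises
      Next_Token (P, First, Last);
      Put_Line (P.Source (First .. Last));
   end loop;
exception
   when E : Syntax_Error =>
      Put_Line ("Syntax error: " & Ada.Exceptions.Exception_Message (E));
end Parser_State_Demo;
```

Because the state is an ordinary object, several parsers can be active at once, and a recursive descent parser can pass the same object down through its calls.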

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Text parsing package
  2011-03-23  8:32 ` Dmitry A. Kazakov
@ 2011-03-23 11:19   ` Syntax Issues
  2011-03-28  0:15   ` Yannick Duchêne (Hibou57)
  1 sibling, 0 replies; 10+ messages in thread
From: Syntax Issues @ 2011-03-23 11:19 UTC


Thanks for posting the techniques, Dmitry. I am going to try
implementing as many as I can -- I am still a fairly new programmer,
and I still have a lot to learn.
Right now it uses Text_Io and Strings.Unbounded.Text_Io (this is
probably bad :/) and only works with files. I tried to keep error
handling as simple as possible: exceptions are caught inside the
package, which sets global Booleans that the running program checks.
-- Spec
with
	Ada.Exceptions,
	Ada.Text_Io,
	Ada.Strings.Unbounded.Text_Io,
	Ada.Strings.Unbounded;
use
	Ada.Exceptions,
	Ada.Text_Io,
	Ada.Strings.Unbounded.Text_Io,
	Ada.Strings.Unbounded;
package Parsing
	is
	---------------
	-- Constants --
	---------------
		DEBUGGING_ON                  : constant Boolean          := true;
		EXCEPTION_PARSE_NOTHING       : constant String           :=
			"Attempted to parse past the end-of-file.";
		EXCEPTION_PARSE_UNOPENED_FILE : constant String           :=
			"Attempted to parse an unopened file.";
		EXCEPTION_PARSE_NON_NUMBER    : constant String           :=
			"Failed to parse a number.";
		NULL_STRING_FIXED             : constant String           := "";
		NULL_STRING_UNBOUNDED         : constant Unbounded_String :=
			To_Unbounded_String(NULL_STRING_FIXED);
	--------------
	-- Packages --
	--------------
		package Io_Integer
			is new Ada.Text_Io.Integer_Io(Integer);
		package Io_Float
			is new Ada.Text_Io.Float_Io(Float);
	---------------
	-- Variables --
	---------------
		Error_On_Recent_Operation  : Boolean          := false;
		Error_Occured_Parsing_File : Boolean          := false;
		Line                       : Unbounded_String :=
			NULL_STRING_UNBOUNDED;
		File                       : File_Type;
	-----------------
	-- Subprograms --
	-----------------
		procedure Open
			(Name : in String);
			pragma Inline(Open);
		procedure Close;
			pragma Inline(Close);
		function Next_Unbounded_String
			return Unbounded_String;
		function Next_String
			return String;
			pragma Inline(Next_String);
		function Next_Integer
			return Integer;
		function Next_Float
			return Float;
	end Parsing;
--- Body
package body Parsing
	is
	--
	-- Open_File
	--
	procedure Open
		(Name : in String)
		is
		begin
			Ada.Text_Io.Open(File, In_File, Name);
			if not End_Of_File(File) then
				Line := Get_Line(File);
			end if;
		end Open;
	--
	-- Close_File
	--
	procedure Close
		is
		begin
			if Is_Open(File) then
				Ada.Text_Io.Close(File);
				Error_Occured_Parsing_File := false;
			end if;
		end Close;
	--
	-- Next_Unbounded_String
	--
	function Next_Unbounded_String
		return Unbounded_String
		is
		Result : Unbounded_String := NULL_UNBOUNDED_STRING;
		begin
			Error_On_Recent_Operation := false;
			Trim(Line, Ada.Strings.Both);
			if not Is_Open(File) then
				if DEBUGGING_ON then
					Put_Line(EXCEPTION_PARSE_UNOPENED_FILE);
				end if;
				Error_Occured_Parsing_File := true;
				Error_On_Recent_Operation  := true;
				return Result;
			end if;
			loop
				if Length(Line) /= 0 then
					for I in 1..Length(Line) loop
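						if Element(Line, I) = ' ' then
							-- Line is trimmed, so I > 1 here; leave the blank out of the token.
							Result := To_Unbounded_String(Slice(Line, 1, I - 1));
							Delete(Line, 1, I);
							return Result;
						elsif I = Length(Line) then
							-- Last token on the line: take all of it.
							Result := Line;
							Line   := NULL_STRING_UNBOUNDED;
							return Result;
						end if;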
					end loop;
				else
					if End_Of_File(File) then
						if DEBUGGING_ON then
							Put_Line(EXCEPTION_PARSE_NOTHING);
						end if;
						Error_Occured_Parsing_File := true;
						Error_On_Recent_Operation  := true;
						return Result;
					end if;
					Line := Trim(Get_Line(File), Ada.Strings.Both);
				end if;
			end loop;
		end Next_Unbounded_String;
	--
	-- Next_String
	--
	function Next_String
		return String
		is
		begin
			return To_String(Next_Unbounded_String);
		end Next_String;
	--
	-- Next_Integer
	--
	function Next_Integer
		return Integer
		is
		Last   : Positive;
		Result : Integer := 0;
		begin
			Io_Integer.Get(To_String(Next_Unbounded_String), Result, Last);
			return Result;
			exception
				when Data_Error | Constraint_Error =>
					if DEBUGGING_ON then
						Put_Line(EXCEPTION_PARSE_NON_NUMBER);
					end if;
					Error_Occured_Parsing_File := true;
					Error_On_Recent_Operation  := true;
					return Result;
		end Next_Integer;
	--
	-- Next_Float
	--
	function Next_Float
		return Float
		is
		Last   : Positive;
		Result : Float := 0.0;
		begin
			Io_Float.Get(To_String(Next_Unbounded_String), Result, Last);
			return Result;
			exception
				when Data_Error | Constraint_Error =>
					if DEBUGGING_ON then
						Put_Line(EXCEPTION_PARSE_NON_NUMBER);
					end if;
					Error_Occured_Parsing_File := true;
					Error_On_Recent_Operation  := true;
					return Result;
		end Next_Float;
	end Parsing;
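
A minimal sketch of how a calling program might check the global flag after a call, using only names from the spec above. The procedure name Check_Flags is hypothetical, and the example assumes the Parsing package compiles as posted and that "Test.txt" exists.

```ada
with Parsing;
with Ada.Text_IO; use Ada.Text_IO;

procedure Check_Flags is
   Value : Integer;
begin
   Parsing.Open ("Test.txt");
   Value := Parsing.Next_Integer;
   -- The package reports failure through a global Boolean
   -- instead of propagating an exception to the caller.
   if Parsing.Error_On_Recent_Operation then
      Put_Line ("Next token was not an integer.");
   else
      Put_Line (Integer'Image (Value));
   end if;
   Parsing.Close;
end Check_Flags;
```

Note that Open itself does not handle exceptions, so a missing file would still propagate Name_Error from Ada.Text_Io.Open.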




* Re: Text parsing package
  2011-03-23  8:32 ` Dmitry A. Kazakov
  2011-03-23 11:19   ` Syntax Issues
@ 2011-03-28  0:15   ` Yannick Duchêne (Hibou57)
  2011-03-28  8:15     ` Dmitry A. Kazakov
  1 sibling, 1 reply; 10+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-03-28  0:15 UTC


On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov  
<mailbox@dmitry-kazakov.de> wrote:
> 1. Don't use unbounded strings. That is an unnecessary overhead.
Yes, better limit the maximum size of literals.

> 3. As others have mentioned, it is a good idea to abstract the source
> formats in order to be able to parse files, strings, streams etc.
And Ada already provides a root for that: Ada.Streams.Root_Stream_Type

> 6. You should decide what drives the parser. In your case it is the  
> caller.
> That is not a good idea in most cases, because the caller rarely knows  
> what
> to expect next. A better choice is semantic call-backs from the parser to
> the caller. Abstract primitive operations is IMO the best implementation  
> of
> such callbacks.
What “drives” is not just a question that arises with parsers; it arises  
everywhere. Here, if Ada had something like a Yield, this would be nice  
too. Otherwise, yes, the callback is a good choice and may allow  
optimization, especially if some kind of seeking into the source is an  
expected option (the parser can then efficiently skip what is not  
relevant; conversely, if the caller drives, then no such optimization  
is possible).
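
The callback approach discussed above, with abstract primitive operations as the callbacks, could look roughly like this. It is a minimal sketch under stated assumptions: the names Parsers, Handler, On_Token, Parse and Printer are all hypothetical, and the "grammar" is just blank-delimited tokens.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Callback_Demo is

   package Parsers is
      -- The parser drives; callers derive from Handler and override On_Token.
      type Handler is abstract tagged null record;
      procedure On_Token (H : in out Handler; Token : String) is abstract;

      -- Splits Source at blanks and calls On_Token for each token found.
      procedure Parse (H : in out Handler'Class; Source : String);
   end Parsers;

   package body Parsers is
      procedure Parse (H : in out Handler'Class; Source : String) is
         Pos   : Integer := Source'First;
         First : Integer;
      begin
         while Pos <= Source'Last loop
            if Source (Pos) = ' ' then
               Pos := Pos + 1;
            else
               First := Pos;
               while Pos <= Source'Last and then Source (Pos) /= ' ' loop
                  Pos := Pos + 1;
               end loop;
               On_Token (H, Source (First .. Pos - 1));  -- dispatching call
            end if;
         end loop;
      end Parse;
   end Parsers;

   package Printers is
      -- A caller-side handler that numbers and prints each token.
      type Printer is new Parsers.Handler with record
         Count : Natural := 0;
      end record;
      overriding procedure On_Token (H : in out Printer; Token : String);
   end Printers;

   package body Printers is
      overriding procedure On_Token (H : in out Printer; Token : String) is
      begin
         H.Count := H.Count + 1;
         Put_Line (Natural'Image (H.Count) & ": " & Token);
      end On_Token;
   end Printers;

   P : Printers.Printer;
begin
   Parsers.Parse (P, "152.15 Test! TesT");
end Callback_Demo;
```

Since Parse owns the loop, it is free to skip parts of the source without involving the caller, which is the optimization mentioned above.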


-- 
If cats meow and make so many strange vocalizations, it's not for  
the dogs.
“ c++; /* this makes c bigger but returns the old value */ ” [Anonymous]




* Re: Text parsing package
  2011-03-28  0:15   ` Yannick Duchêne (Hibou57)
@ 2011-03-28  8:15     ` Dmitry A. Kazakov
  2011-03-28 10:18       ` Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-28  8:15 UTC


On Mon, 28 Mar 2011 02:15:31 +0200, Yannick Duchêne (Hibou57) wrote:

> On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov  
> <mailbox@dmitry-kazakov.de> wrote:
>> 1. Don't use unbounded strings. That is an unnecessary overhead.
> Yes, better limit the maximum size of literals.

The literal size is limited by the maximal line length.

>> 3. As others have mentioned, it is a good idea to abstract the source
>> formats in order to be able to parse files, strings, streams etc.
> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type

Unfortunately Root_Stream_Type is not an interface and String cannot be
inherited from. So in effect you would need a special root.

>> 6. You should decide what drives the parser. In your case it is the caller.
>> That is not a good idea in most cases, because the caller rarely knows what
>> to expect next. A better choice is semantic call-backs from the parser to
>> the caller. Abstract primitive operations is IMO the best implementation of
>> such callbacks.

> What “drives”, is not just a question rising with parsers, it is rising  
> every where. Here, if Ada had something like a Yield, this would be nice  
> too.

Yes, it would be nice to have structured multiple co-routines as an
alternative to FSM. There are some cases where tasks look like an overkill.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Text parsing package
  2011-03-28  8:15     ` Dmitry A. Kazakov
@ 2011-03-28 10:18       ` Yannick Duchêne (Hibou57)
  2011-03-28 12:08         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 10+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-03-28 10:18 UTC


On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov  
<mailbox@dmitry-kazakov.de> wrote:
>>> 3. As others have mentioned, it is a good idea to abstract the source
>>> formats in order to be able to parse files, strings, streams etc.
>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>
> Unfortunately Root_Stream_Type is not an interface and string cannot be
> inherited from. So in the effect you would need a special root.
I meant implementing a concrete Root_Stream_Type using a string as source.  
You would have an open method taking a string, or any kind of reference  
to a string, rather than a filename this time. But I wonder about efficiency  
(because of the tagged type), which is important for such low-level and  
heavily iterated stuff. Buffering with systematic batch input/output may  
still be an option to reach efficiency, even with a tagged stream type  
(batching often helps efficiency, I feel).

The advantage is that this would be clean to use, as it could be used  
everywhere something expects a standard Ada stream. The disadvantage may  
be the above.
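
A string-backed stream of the kind described above might be sketched as follows. This is a minimal, read-only illustration under stated assumptions: the names String_Streams and String_Stream are hypothetical, the stream holds its own copy of the text, and Character is assumed to occupy exactly one stream element (true of typical implementations, but not guaranteed by the language).

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure String_Stream_Demo is

   package String_Streams is
      -- The stream owns a copy of the text; Length is its size.
      type String_Stream (Length : Natural) is new Root_Stream_Type with record
         Data     : String (1 .. Length);
         Position : Natural := 0;  -- index of the last character consumed
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is
      -- Copies as many characters as fit into Item, one element each.
      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset) is
      begin
         Last := Item'First - 1;
         while Last < Item'Last and then Stream.Position < Stream.Length loop
            Stream.Position := Stream.Position + 1;
            Last := Last + 1;
            Item (Last) :=
              Stream_Element (Character'Pos (Stream.Data (Stream.Position)));
         end loop;
      end Read;

      -- This sketch is input-only.
      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         raise Program_Error with "String_Stream is read-only";
      end Write;
   end String_Streams;

   Text : constant String := "ca4";
   S    : aliased String_Streams.String_Stream (Text'Length);
   C    : Character;
begin
   S.Data := Text;
   -- Any code expecting a standard stream can now read from the string.
   for I in Text'Range loop
      Character'Read (S'Access, C);
      Ada.Text_IO.Put (C);
   end loop;
   Ada.Text_IO.New_Line;
end String_Stream_Demo;
```

Each Character'Read goes through a dispatching Read call, which is the per-element overhead worried about above; reading a whole Stream_Element_Array at once would amortize it.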

This can also be done with generics, but I will not try to advocate that, as  
when I used it, I was not happy with the result in the end.

> Yes, it would be nice to have structured multiple co-routines as an
> alternative to FSM. There are some cases where tasks look like an  
> overkill.
I remember I made that wish too (can't remember what was said to advocate  
against it). Let's hope for Ada 2017 or Ada 2022.

-- 
If cats meow and make so many strange vocalizations, it's not for  
the dogs.
“ c++; /* this makes c bigger but returns the old value */ ” [Anonymous]




* Re: Text parsing package
  2011-03-28 10:18       ` Yannick Duchêne (Hibou57)
@ 2011-03-28 12:08         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-28 12:08 UTC


On Mon, 28 Mar 2011 12:18:38 +0200, Yannick Duchêne (Hibou57) wrote:

> On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov  
> <mailbox@dmitry-kazakov.de> wrote:
>>>> 3. As others have mentioned, it is a good idea to abstract the source
>>>> formats in order to be able to parse files, strings, streams etc.
>>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>>
>> Unfortunately Root_Stream_Type is not an interface and string cannot be
>> inherited from. So in the effect you would need a special root.
> I meant implementing a concrete Root_Stream_Type using a string as source.  
> You would have an open method, getting a string or any kind of reference  
> to string, just not as a filename this time. But wonder about efficiency  
> (because of the tagged type), which is important for such low-level and  
> heavily iterated stuff.

Another problem is that it would be difficult to use. A mix-in does not work
because String cannot be a discriminant. You would need a nasty
access-to-string discriminant, or have to copy the whole string into the
stream object.
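
The access-to-string variant mentioned above could be sketched like this. It is only an illustration under stated assumptions: the names String_Streams and String_Stream are hypothetical, the stream is read-only, and Character is assumed to fit in one stream element. The stream refers to a string owned elsewhere, so that string must be declared aliased and must outlive the stream.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Access_Stream_Demo is

   package String_Streams is
      -- The stream only refers to the text (no copy is made).
      type String_Stream (Source : not null access constant String) is
        new Root_Stream_Type with record
         Position : Natural := 0;  -- number of characters consumed so far
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is
      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset) is
      begin
         Last := Item'First - 1;
         while Last < Item'Last
           and then Stream.Position < Stream.Source'Length
         loop
            Last := Last + 1;
            Item (Last) := Stream_Element (Character'Pos
              (Stream.Source (Stream.Source'First + Stream.Position)));
            Stream.Position := Stream.Position + 1;
         end loop;
      end Read;

      -- This sketch is input-only.
      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         raise Program_Error with "String_Stream is read-only";
      end Write;
   end String_Streams;

   -- The caller has to declare the string aliased and pass 'Access.
   Text : aliased constant String := "ca4";
   S    : aliased String_Streams.String_Stream (Text'Access);
   C    : Character;
begin
   for I in Text'Range loop
      Character'Read (S'Access, C);
      Ada.Text_IO.Put (C);
   end loop;
   Ada.Text_IO.New_Line;
end Access_Stream_Demo;
```

The burden on the caller (an aliased declaration, 'Access, and lifetime rules) is the usability cost referred to above; the alternative is a stream type that copies the string into itself.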

>> Yes, it would be nice to have structured multiple co-routines as an
>> alternative to FSM. There are some cases where tasks look like an  
>> overkill.
> I remember I made that wish too (can't remember what was said to advocate  
> against it).

The reason is always the same: if nobody on the ARG is personally
interested in the concept, it will be blindly rejected. Here I mean not the
implementation, but merely a serious consideration of possible ways to
approach the problem. That alone requires much mental work.

> Let's hope for Ada 2017 or Ada 2022.

I doubt it. This is a "serious" issue, not a patch or yet another kludge.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



