* Text parsing package
From: Syntax Issues @ 2011-03-22 23:34 UTC

I have just finished a simple text parsing package. If anyone is
interested I can post the code (only about 160 lines).

Example of how it's used:

with
  Parsing,
  Ada.Text_Io,
  Ada.Strings.Unbounded;
use
  Ada.Text_Io,
  Ada.Strings.Unbounded;
procedure Test_Parsing
  is
  begin
    Parsing.Open("Test.txt");
    Put_Line(Float'Image(Parsing.Next_Float));
    Put_Line(Parsing.Next_String);
    Put_Line(Parsing.Next_String);
    Put_Line(Parsing.Next_String);
    Put_Line(To_String(Parsing.Next_Unbounded_String));
    Put_Line(Integer'Image(Parsing.Next_Integer));
    Put_Line(Float'Image(Parsing.Next_Float));
    Parsing.Close;
  end Test_Parsing;

-- Test.txt --
152.15 Test!
TesT
c

ca4

4

12.9
-- End of file --

-- Output --
1.52150E+02
Test!
TesT
c
ca4
4
1.29000E+01
-- End of output --
* Re: Text parsing package
From: Shark8 @ 2011-03-23 3:01 UTC

Nice.
* Re: Text parsing package
From: Alex Mentis @ 2011-03-23 6:29 UTC

Syntax Issues wrote:
> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160 lines).
> [...]

Sure, I'd be interested in seeing the source. Does it only work with
files, or can it be used with standard input, too?

Alex
* Re: Text parsing package
From: J-P. Rosen @ 2011-03-23 6:36 UTC

On 23/03/2011 00:34, Syntax Issues wrote:
> I have just finished a simple text parsing package. If anyone is
> interested I can post the code (only about 160 lines).
> [...]
Hmmm... out of curiosity, what does your package add to what's offered
by Text_IO?
-- 
---------------------------------------------------------
J-P. Rosen (rosen@adalog.fr)
Adalog has moved:
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
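Rosen's point can be made concrete. Ada.Text_IO's Integer_IO.Get and Float_IO.Get already skip leading blanks and line terminators, so for the test file above the only missing piece is reading the next blank-separated word. The sketch below fills that gap using only standard Text_IO calls (Look_Ahead, Get, Skip_Line). The package Word_Reader and its procedure Get_Word are hypothetical names, not part of Text_IO, and words are assumed to be separated by spaces, tabs or line ends. Because it takes a File_Type, the same call also works on Ada.Text_IO.Standard_Input.

```ada
with Ada.Text_IO;

package Word_Reader is
   --  Hypothetical helper: reads the next blank-delimited word from File
   --  into Item (Item'First .. Last), skipping spaces, tabs and line
   --  terminators first, much as Integer_IO.Get and Float_IO.Get do.
   --  Raises Ada.Text_IO.End_Error when no word is left.
   procedure Get_Word
     (File : Ada.Text_IO.File_Type;
      Item : out String;
      Last : out Natural);
end Word_Reader;

package body Word_Reader is
   use Ada.Text_IO;

   function Is_Blank (C : Character) return Boolean is
   begin
      return C = ' ' or else C = ASCII.HT;
   end Is_Blank;

   procedure Get_Word
     (File : Ada.Text_IO.File_Type;
      Item : out String;
      Last : out Natural)
   is
      C   : Character;
      EOL : Boolean;
   begin
      Last := Item'First - 1;
      --  Skip blanks and line terminators; Skip_Line raises End_Error
      --  once the file terminator is reached.
      loop
         Look_Ahead (File, C, EOL);
         if EOL then
            Skip_Line (File);
         elsif Is_Blank (C) then
            Get (File, C);
         else
            exit;
         end if;
      end loop;
      --  Collect characters up to the next blank or end of line.
      while Last < Item'Last loop
         Look_Ahead (File, C, EOL);
         exit when EOL or else Is_Blank (C);
         Get (File, C);
         Last := Last + 1;
         Item (Last) := C;
      end loop;
   end Get_Word;
end Word_Reader;
```

Mixed with the numeric Get procedures, this reads the thread's example file with the same sequence of calls as Test_Parsing, and end of input is reported by End_Error rather than by a global flag.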
* Re: Text parsing package
From: Dmitry A. Kazakov @ 2011-03-23 8:32 UTC

On Tue, 22 Mar 2011 16:34:48 -0700 (PDT), Syntax Issues wrote:

> I have just finished a simple text parsing package.

Congratulations. What are you parsing? CSV?

> If anyone is interested I can post the code (only about 160 lines).

Some notes on parsing techniques:

1. Don't use unbounded strings. That is an unnecessary overhead.

2. When parsing something you should have a kind of syntax error
handling. Exceptions with error location information is IMO the best
choice.

3. As others have mentioned, it is a good idea to abstract the source
formats in order to be able to parse files, strings, streams etc.

4. Encoding is a related issue. If you have that source abstraction
layer, you can deal with everything as Unicode, transcoding things there
and keeping the parser agnostic to encoding.

5. The state of the parser should be encapsulated in an object.
Otherwise you won't be able to reenter the parser or to make a recursive
descent one.

6. You should decide what drives the parser. In your case it is the
caller. That is not a good idea in most cases, because the caller rarely
knows what to expect next. A better choice is semantic call-backs from
the parser to the caller. Abstract primitive operations is IMO the best
implementation of such callbacks.

7. Usually a parser is a middleman. This means that you should consider
how to shape the intermediate results of parsing, e.g. the AST. Ada
pools are very nice for keeping that stuff in an arena.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
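Points 2, 5 and 6 can be combined in one small sketch, under assumptions that are mine rather than Dmitry's: tokens are blank-separated, a token starting with a digit must be an Integer or Float literal, and all names (Word_Parsers, Parser, Parse_Line, On_Integer, On_Float, On_Word, Syntax_Error) are hypothetical. The parser state lives in a tagged object, the parser calls back into abstract primitive operations that the client overrides, and malformed input raises an exception whose message carries the line and column.

```ada
package Word_Parsers is

   --  Raised on malformed input; the message carries line and column.
   Syntax_Error : exception;

   --  All parser state lives in the object, not at package level,
   --  so several parsers can run independently.
   type Parser is abstract tagged limited record
      Line : Positive := 1;  -- current line, used in error messages
   end record;

   --  Semantic callbacks: the parser drives, the client reacts.
   procedure On_Integer (P : in out Parser; Value : Integer) is abstract;
   procedure On_Float   (P : in out Parser; Value : Float)   is abstract;
   procedure On_Word    (P : in out Parser; Text  : String)  is abstract;

   --  Splits one line into blank-separated tokens and dispatches each
   --  to the matching callback, then advances P.Line.
   procedure Parse_Line (P : in out Parser'Class; Source : String);

end Word_Parsers;

with Ada.Strings.Fixed;

package body Word_Parsers is

   function Is_Blank (C : Character) return Boolean is
   begin
      return C = ' ' or else C = ASCII.HT;
   end Is_Blank;

   --  Converts a numeric token and dispatches it; a malformed number
   --  becomes Syntax_Error with its location.
   procedure Report_Number
     (P : in out Parser'Class; Token : String; Column : Positive)
   is
      Is_Real : constant Boolean := Ada.Strings.Fixed.Index (Token, ".") > 0;
      R       : Float   := 0.0;
      N       : Integer := 0;
   begin
      begin
         if Is_Real then
            R := Float'Value (Token);
         else
            N := Integer'Value (Token);
         end if;
      exception
         when Constraint_Error =>
            raise Syntax_Error with
              "line" & Positive'Image (P.Line)
              & ", column" & Positive'Image (Column)
              & ": bad number """ & Token & """";
      end;
      --  Dispatch outside the handler so that errors raised by the
      --  callbacks are not misreported as syntax errors.
      if Is_Real then
         P.On_Float (R);
      else
         P.On_Integer (N);
      end if;
   end Report_Number;

   procedure Parse_Line (P : in out Parser'Class; Source : String) is
      I     : Positive := Source'First;
      First : Positive;
   begin
      while I <= Source'Last loop
         if Is_Blank (Source (I)) then
            I := I + 1;
         else
            First := I;
            while I <= Source'Last and then not Is_Blank (Source (I)) loop
               I := I + 1;
            end loop;
            if Source (First) in '0' .. '9' then
               Report_Number
                 (P, Source (First .. I - 1), First - Source'First + 1);
            else
               P.On_Word (Source (First .. I - 1));
            end if;
         end if;
      end loop;
      P.Line := P.Line + 1;
   end Parse_Line;

end Word_Parsers;
```

A client derives a concrete type from Parser, overrides the three callbacks and feeds lines to Parse_Line. Nothing is stored at package level, so two such objects can parse independent inputs at the same time.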
* Re: Text parsing package
From: Syntax Issues @ 2011-03-23 11:19 UTC

Thanks for posting the techniques, Dmitry. I am going to try
implementing as many as I can; I am still a fairly new programmer and
have a lot to learn.

Right now it uses Text_IO and Strings.Unbounded.Text_IO (this is
probably bad :/) and only works with files. I tried to keep error
handling as simple as possible: global Booleans are set by the package
and checked by the calling program.

-- Spec
with
  Ada.Exceptions,
  Ada.Text_Io,
  Ada.Strings.Unbounded.Text_Io,
  Ada.Strings.Unbounded;
use
  Ada.Exceptions,
  Ada.Text_Io,
  Ada.Strings.Unbounded.Text_Io,
  Ada.Strings.Unbounded;
package Parsing
  is
  ---------------
  -- Constants --
  ---------------
  DEBUGGING_ON                  : constant Boolean := true;
  EXCEPTION_PARSE_NOTHING       : constant String  := "Attempted to parse past the end-of-file.";
  EXCEPTION_PARSE_UNOPENED_FILE : constant String  := "Attempted to parse an unopened file.";
  EXCEPTION_PARSE_NON_NUMBER    : constant String  := "Failed to parse a number.";
  NULL_STRING_FIXED             : constant String  := "";
  NULL_STRING_UNBOUNDED         : constant Unbounded_String := To_Unbounded_String(NULL_STRING_FIXED);
  --------------
  -- Packages --
  --------------
  package Io_Integer is new Ada.Text_Io.Integer_Io(Integer);
  package Io_Float is new Ada.Text_Io.Float_Io(Float);
  ---------------
  -- Variables --
  ---------------
  Error_On_Recent_Operation  : Boolean := false;
  Error_Occured_Parsing_File : Boolean := false;
  Line                       : Unbounded_String := NULL_STRING_UNBOUNDED;
  File                       : File_Type;
  -----------------
  -- Subprograms --
  -----------------
  procedure Open
    (Name : in String);
  pragma Inline(Open);
  procedure Close;
  pragma Inline(Close);
  function Next_Unbounded_String
    return Unbounded_String;
  function Next_String
    return String;
  pragma Inline(Next_String);
  function Next_Integer
    return Integer;
  function Next_Float
    return Float;
end Parsing;

-- Body
package body Parsing
  is
  --
  -- Open
  --
  procedure Open
    (Name : in String)
    is
    begin
      Ada.Text_Io.Open(File, In_File, Name);
      if not End_Of_File(File) then
        Line := Get_Line(File);
      end if;
    end Open;
  --
  -- Close
  --
  procedure Close
    is
    begin
      if Is_Open(File) then
        Ada.Text_Io.Close(File);
        Error_Occured_Parsing_File := false;
      end if;
    end Close;
  --
  -- Next_Unbounded_String
  --
  function Next_Unbounded_String
    return Unbounded_String
    is
    Result : Unbounded_String := NULL_UNBOUNDED_STRING;
    begin
      Error_On_Recent_Operation := false;
      Trim(Line, Ada.Strings.Both);
      if not Is_Open(File) then
        if DEBUGGING_ON then
          Put_Line(EXCEPTION_PARSE_UNOPENED_FILE);
        end if;
        Error_Occured_Parsing_File := true;
        Error_On_Recent_Operation := true;
        return Result;
      end if;
      loop
        if Length(Line) /= 0 then
          for I in 1..Length(Line) loop
            if Element(Line, I) = ' ' then
              -- The word ends before the blank; the blank itself is
              -- consumed but not returned.
              Result := To_Unbounded_String(Slice(Line, 1, I - 1));
              Delete(Line, 1, I);
              return Result;
            elsif I = Length(Line) then
              -- Last word on the line.
              Result := Line;
              Line := NULL_STRING_UNBOUNDED;
              return Result;
            end if;
          end loop;
        else
          if End_Of_File(File) then
            if DEBUGGING_ON then
              Put_Line(EXCEPTION_PARSE_NOTHING);
            end if;
            Error_Occured_Parsing_File := true;
            Error_On_Recent_Operation := true;
            return Result;
          end if;
          Line := Trim(Get_Line(File), Ada.Strings.Both);
        end if;
      end loop;
    end Next_Unbounded_String;
  --
  -- Next_String
  --
  function Next_String
    return String
    is
    begin
      return To_String(Next_Unbounded_String);
    end Next_String;
  --
  -- Next_Integer
  --
  function Next_Integer
    return Integer
    is
    Last   : Positive;
    Result : Integer := 0;
    begin
      Io_Integer.Get(To_String(Next_Unbounded_String), Result, Last);
      return Result;
    exception
      -- End_Error: the word was empty because the end of file was reached.
      when Data_Error | End_Error | Constraint_Error =>
        if DEBUGGING_ON then
          Put_Line(EXCEPTION_PARSE_NON_NUMBER);
        end if;
        Error_Occured_Parsing_File := true;
        Error_On_Recent_Operation := true;
        return Result;
    end Next_Integer;
  --
  -- Next_Float
  --
  function Next_Float
    return Float
    is
    Last   : Positive;
    Result : Float := 0.0;
    begin
      Io_Float.Get(To_String(Next_Unbounded_String), Result, Last);
      return Result;
    exception
      -- End_Error: the word was empty because the end of file was reached.
      when Data_Error | End_Error | Constraint_Error =>
        if DEBUGGING_ON then
          Put_Line(EXCEPTION_PARSE_NON_NUMBER);
        end if;
        Error_Occured_Parsing_File := true;
        Error_On_Recent_Operation := true;
        return Result;
    end Next_Float;
end Parsing;
* Re: Text parsing package
From: Yannick Duchêne (Hibou57) @ 2011-03-28 0:15 UTC

On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:

> 1. Don't use unbounded strings. That is an unnecessary overhead.
Yes, better to limit the maximum size of literals.

> 3. As others have mentioned, it is a good idea to abstract the source
> formats in order to be able to parse files, strings, streams etc.
And Ada already provides a root for that: Ada.Streams.Root_Stream_Type

> 6. You should decide what drives the parser. In your case it is the
> caller. That is not a good idea in most cases, because the caller
> rarely knows what to expect next. A better choice is semantic
> call-backs from the parser to the caller. Abstract primitive operations
> is IMO the best implementation of such callbacks.
The question of what “drives” is not specific to parsers; it arises
everywhere. Here, if Ada had something like a Yield, that would be nice
too. Otherwise, yes, the callback is a good choice and may allow
optimization, especially if some kind of seeking into the source is an
expected option (the parser can then efficiently skip what is not
relevant; if the caller drives, no such optimization is possible).

-- 
If cats meow and make so many odd vocalizations, it's not for the dogs.
“ c++; /* this makes c bigger but returns the old value */ ” [Anonymous]
* Re: Text parsing package
From: Dmitry A. Kazakov @ 2011-03-28 8:15 UTC

On Mon, 28 Mar 2011 02:15:31 +0200, Yannick Duchêne (Hibou57) wrote:

> On Wed, 23 Mar 2011 09:32:10 +0100, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:

>> 1. Don't use unbounded strings. That is an unnecessary overhead.
> Yes, better to limit the maximum size of literals.

The literal size is limited by the maximal line length.

>> 3. As others have mentioned, it is a good idea to abstract the source
>> formats in order to be able to parse files, strings, streams etc.
> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type

Unfortunately Root_Stream_Type is not an interface, and String cannot be
inherited from. So in effect you would need a special root.

>> 6. You should decide what drives the parser. In your case it is the
>> caller. That is not a good idea in most cases, because the caller
>> rarely knows what to expect next. A better choice is semantic
>> call-backs from the parser to the caller. Abstract primitive
>> operations is IMO the best implementation of such callbacks.
> The question of what “drives” is not specific to parsers; it arises
> everywhere. Here, if Ada had something like a Yield, that would be
> nice too.

Yes, it would be nice to have structured multiple co-routines as an
alternative to FSM. There are some cases where tasks look like
overkill.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
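To illustrate the trade-off Dmitry mentions: without a Yield statement, a generator that hands out one word per call can be written today as a task whose entry plays the role of "yield", as in the sketch below. The names Word_Generators, Word_Generator and Next are hypothetical; words are assumed to be separated by runs of spaces, and the caller's buffer is assumed to be long enough for any word.

```ada
package Word_Generators is
   --  Hypothetical generator emulated with a task: each call to Next
   --  resumes the task until it hands over the next word.
   task type Word_Generator (Text : access constant String) is
      --  Copies the next space-separated word into Item (Item'First .. Last).
      --  Done is True (and Last = Item'First - 1) once Text is exhausted.
      entry Next (Item : out String; Last : out Natural; Done : out Boolean);
   end Word_Generator;
end Word_Generators;

package body Word_Generators is
   task body Word_Generator is
      I     : Positive := Text'First;
      First : Positive;
   begin
      loop
         --  The "coroutine" part: locate the next word ...
         while I <= Text'Last and then Text (I) = ' ' loop
            I := I + 1;
         end loop;
         First := I;
         while I <= Text'Last and then Text (I) /= ' ' loop
            I := I + 1;
         end loop;
         --  ... then "yield" it, or terminate when no caller can remain.
         select
            accept Next (Item : out String; Last : out Natural; Done : out Boolean) do
               Done := First > Text'Last;
               Last := Item'First + (I - First) - 1;
               if not Done then
                  Item (Item'First .. Last) := Text (First .. I - 1);
               end if;
            end Next;
         or
            terminate;
         end select;
      end loop;
   end Word_Generator;
end Word_Generators;
```

Each call to Next is a rendezvous with a separate task that has its own stack, which is the overhead a lightweight co-routine would avoid.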
* Re: Text parsing package
From: Yannick Duchêne (Hibou57) @ 2011-03-28 10:18 UTC

On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov
<mailbox@dmitry-kazakov.de> wrote:

>>> 3. As others have mentioned, it is a good idea to abstract the source
>>> formats in order to be able to parse files, strings, streams etc.
>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>
> Unfortunately Root_Stream_Type is not an interface, and String cannot
> be inherited from. So in effect you would need a special root.

I meant implementing a concrete Root_Stream_Type using a string as the
source. You would have an Open method taking a string, or any kind of
reference to a string, instead of a file name. But I wonder about
efficiency (because of the tagged type), which is important for such
low-level and heavily iterated stuff. Buffering with systematic batch
input/output may still be an option to reach efficiency, even with a
tagged stream type (batching often helps efficiency, I feel).

The advantage is that this would be clean to use, as it could be used
everywhere something expects a standard Ada stream. The disadvantage may
be the above.

It can also be done with generics, but I will not try to advocate that:
when I used it, I was not happy with it in the end.

> Yes, it would be nice to have structured multiple co-routines as an
> alternative to FSM. There are some cases where tasks look like
> overkill.

I remember I made that wish too (I can't remember what was said to argue
against it). Let's hope for Ada 2017 or Ada 2022.
* Re: Text parsing package
From: Dmitry A. Kazakov @ 2011-03-28 12:08 UTC

On Mon, 28 Mar 2011 12:18:38 +0200, Yannick Duchêne (Hibou57) wrote:

> On Mon, 28 Mar 2011 10:15:52 +0200, Dmitry A. Kazakov
> <mailbox@dmitry-kazakov.de> wrote:
>>>> 3. As others have mentioned, it is a good idea to abstract the source
>>>> formats in order to be able to parse files, strings, streams etc.
>>> And Ada already provides a root for that: Ada.Streams.Root_Stream_Type
>>
>> Unfortunately Root_Stream_Type is not an interface, and String cannot
>> be inherited from. So in effect you would need a special root.
> I meant implementing a concrete Root_Stream_Type using a string as the
> source. You would have an Open method taking a string, or any kind of
> reference to a string, instead of a file name. But I wonder about
> efficiency (because of the tagged type), which is important for such
> low-level and heavily iterated stuff.

Another problem is that it would be difficult to use. A mix-in does not
work because a String cannot be a discriminant. You would need a nasty
access-to-String discriminant, or have to copy the whole string into the
stream object.

>> Yes, it would be nice to have structured multiple co-routines as an
>> alternative to FSM. There are some cases where tasks look like
>> overkill.
> I remember I made that wish too (I can't remember what was said to
> argue against it).

The reason is always the same: if nobody on the ARG is personally
interested in the concept, it will be blindly rejected. Here I mean not
the implementation, but merely a serious consideration of possible ways
to approach the problem. That alone requires much mental work.

> Let's hope for Ada 2017 or Ada 2022.

I doubt it. This is a "serious" issue, not a patch or yet another kludge.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
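A minimal sketch of the copying approach Dmitry describes, assuming 8-bit stream elements (as in GNAT) and read-only use: the string's length becomes the discriminant and the characters are copied into the object. String_Streams, String_Stream and To_Stream are hypothetical names; only Ada.Streams.Root_Stream_Type and its abstract Read and Write operations come from the standard library.

```ada
with Ada.Streams; use Ada.Streams;

package String_Streams is

   --  Hypothetical read-only stream over a private copy of a String.
   --  Since a String cannot be a discriminant, its length is used instead.
   type String_Stream (Length : Stream_Element_Offset) is
     new Root_Stream_Type with private;

   --  Builds a stream holding a copy of Text.
   function To_Stream (Text : String) return String_Stream;

   overriding procedure Read
     (Stream : in out String_Stream;
      Item   : out Stream_Element_Array;
      Last   : out Stream_Element_Offset);

   --  Writing is not supported; raises Program_Error.
   overriding procedure Write
     (Stream : in out String_Stream;
      Item   : Stream_Element_Array);

private
   type String_Stream (Length : Stream_Element_Offset) is
     new Root_Stream_Type with record
      Data     : Stream_Element_Array (1 .. Length);
      Position : Stream_Element_Offset := 1;  -- next element to read
   end record;
end String_Streams;

package body String_Streams is

   function To_Stream (Text : String) return String_Stream is
   begin
      return S : String_Stream (Text'Length) do
         --  Copy character codes; assumes 8-bit stream elements.
         for I in Text'Range loop
            S.Data (Stream_Element_Offset (I - Text'First + 1)) :=
              Stream_Element (Character'Pos (Text (I)));
         end loop;
      end return;
   end To_Stream;

   overriding procedure Read
     (Stream : in out String_Stream;
      Item   : out Stream_Element_Array;
      Last   : out Stream_Element_Offset)
   is
      Count : constant Stream_Element_Offset :=
        Stream_Element_Offset'Min
          (Item'Length, Stream.Length - Stream.Position + 1);
   begin
      Item (Item'First .. Item'First + Count - 1) :=
        Stream.Data (Stream.Position .. Stream.Position + Count - 1);
      Stream.Position := Stream.Position + Count;
      --  Last < Item'Last tells the caller the data ran out.
      Last := Item'First + Count - 1;
   end Read;

   overriding procedure Write
     (Stream : in out String_Stream;
      Item   : Stream_Element_Array)
   is
      pragma Unreferenced (Stream, Item);
   begin
      raise Program_Error with "String_Stream is read-only";
   end Write;

end String_Streams;
```

Any code that reads through Root_Stream_Type'Class, such as String'Read or Character'Read, then works on the copied text. The price is one copy of the string plus a dispatching Read call per stream attribute call, which is the efficiency concern Yannick raised.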