From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,a3b1adae5552af6d
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news4.google.com!feeder.news-service.com!newsfeed.straub-nv.de!noris.net!newsfeed.arcor.de!newsspool4.arcor-online.net!news.arcor.de.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Text parsing package
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.15.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: 
 <4fa6c081-910a-4e62-928a-0f6cdd1da951@l18g2000yqm.googlegroups.com>
Date: Wed, 23 Mar 2011 09:32:10 +0100
Message-ID: <v6w93r8pxrcx$.b590s9zimz0o$.dlg@40tude.net>
NNTP-Posting-Date: 23 Mar 2011 09:32:10 CET
NNTP-Posting-Host: d575ba8e.newsspool1.arcor-online.net
X-Trace: 
 DXC=ZOQZ9]nP7L_V;Ef1`Jk54\ic==]BZ:af^4Fo<]lROoRQ<`=YMgDjhgRUVH`WNGKX]V[6LHn;2LCV^7enW;^6ZC`T\`mfM[68DCSaN_jcKoKC\W
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news2.google.com comp.lang.ada:19366
Date: 2011-03-23T09:32:10+01:00
List-Id: <comp.lang.ada>

On Tue, 22 Mar 2011 16:34:48 -0700 (PDT), Syntax Issues wrote:

> I have just finished a simple text parsing package.

Congratulations. What are you parsing? CSV?

> If anyone is interested I can post the code (only about 160~ lines).

Some notes on parsing techniques:

1. Don't use unbounded strings. That is unnecessary overhead.

2. When parsing something you should have some kind of syntax error
handling. Exceptions carrying error location information are IMO the best
choice.

3. As others have mentioned, it is a good idea to abstract the source
formats in order to be able to parse files, strings, streams etc.

4. Encoding is related to the above. If you have that source abstraction
layer, you can treat everything as Unicode, transcoding there and keeping
the parser agnostic to the encoding.

5. The state of the parser should be encapsulated in an object. Otherwise
you won't be able to reenter the parser or to make a recursive descent
one.

6. You should decide what drives the parser. In your case it is the caller.
That is not a good idea in most cases, because the caller rarely knows what
to expect next. A better choice is semantic call-backs from the parser to
the caller. Abstract primitive operations are IMO the best implementation of
such callbacks.

7. Usually the parser is a middleman. It means that you should consider how
to shape the intermediate results of parsing, e.g. the AST. Ada storage
pools are very nice for keeping that stuff in an arena.
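
To make notes 2 to 6 a bit more concrete, here is a minimal sketch. It is
written in Python rather than Ada only to keep it short, and every name in
it (ParseError, Source, Handler, NumberListParser, Collect, parse_numbers)
is made up for illustration. The toy grammar, a comma-separated list of
non-negative integers, is an assumption too. The source layer hands out
already decoded characters, so any transcoding would happen before the
parser sees the text.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Optional


class ParseError(Exception):
    """Syntax error carrying the location where it was detected (note 2)."""

    def __init__(self, message: str, line: int, column: int) -> None:
        super().__init__(f"{line}:{column}: {message}")
        self.line = line
        self.column = column


class Source:
    """Character source over any iterable of text chunks (notes 3 and 4).

    A string, an open text file or a generator all work, so the parser
    does not know where the (already decoded) characters come from.
    """

    def __init__(self, chunks: Iterable[str]) -> None:
        self._chars: Iterator[str] = (c for chunk in chunks for c in chunk)
        self._current: Optional[str] = next(self._chars, None)
        self.line = 1
        self.column = 1

    def peek(self) -> Optional[str]:
        return self._current

    def advance(self) -> None:
        if self._current == "\n":
            self.line += 1
            self.column = 1
        elif self._current is not None:
            self.column += 1
        self._current = next(self._chars, None)


class Handler(ABC):
    """Semantic callbacks that the caller overrides (note 6)."""

    @abstractmethod
    def on_number(self, value: int) -> None: ...


class NumberListParser:
    """Toy parser; all of its state lives in the object (note 5)."""

    def __init__(self, source: Source, handler: Handler) -> None:
        self._source = source
        self._handler = handler

    def parse(self) -> None:
        self._skip_blanks()
        if self._source.peek() is None:
            return  # empty input is treated as an empty list
        self._number()
        self._skip_blanks()
        while self._source.peek() == ",":
            self._source.advance()
            self._skip_blanks()
            self._number()
            self._skip_blanks()
        if self._source.peek() is not None:
            self._fail("',' or end of input expected")

    def _number(self) -> None:
        digits = ""
        while (c := self._source.peek()) is not None and c in "0123456789":
            digits += c
            self._source.advance()
        if not digits:
            self._fail("number expected")
        self._handler.on_number(int(digits))  # the parser drives the caller

    def _skip_blanks(self) -> None:
        while (c := self._source.peek()) is not None and c in " \t\r\n":
            self._source.advance()

    def _fail(self, message: str) -> None:
        raise ParseError(message, self._source.line, self._source.column)


class Collect(Handler):
    """Example handler that just gathers the numbers into a list."""

    def __init__(self) -> None:
        self.values: List[int] = []

    def on_number(self, value: int) -> None:
        self.values.append(value)


def parse_numbers(chunks: Iterable[str]) -> List[int]:
    handler = Collect()
    NumberListParser(Source(chunks), handler).parse()
    return handler.values
```

With this, parse_numbers("1, 22,3") returns [1, 22, 3], and the same call
accepts an open text file or a list of lines. Parsing "1,\n x" raises
ParseError with line 2, column 2. In Ada the Handler would presumably be an
abstract tagged type with abstract primitive operations, as in note 6.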

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de