From: "Dmitry A. Kazakov"
Subject: Re: Text parsing package
Newsgroups: comp.lang.ada
Date: Wed, 23 Mar 2011 09:32:10 +0100
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <4fa6c081-910a-4e62-928a-0f6cdd1da951@l18g2000yqm.googlegroups.com>

On Tue, 22 Mar 2011 16:34:48 -0700 (PDT), Syntax Issues wrote:

> I have just finished a simple text parsing package.

Congratulations. What are you parsing? CSV?

> If anyone is interested I can post the code (only about 160~ lines).

Some notes on parsing techniques:

1. Don't use unbounded strings. That is unnecessary overhead.

2. When parsing something you should have some kind of syntax error handling. Exceptions carrying error location information are IMO the best choice.

3. As others have mentioned, it is a good idea to abstract the source format in order to be able to parse files, strings, streams etc.

4. Encoding issues are related to the above. If you have that source abstraction layer, you can handle everything as Unicode, doing the transcoding there and keeping the parser agnostic to encoding.

5. The state of the parser should be encapsulated in an object. Otherwise you won't be able to re-enter the parser or to write a recursive descent one.

6. You should decide what drives the parser. In your case it is the caller. That is not a good idea in most cases, because the caller rarely knows what to expect next. A better choice is semantic callbacks from the parser to the caller. Abstract primitive operations are IMO the best implementation of such callbacks.

7. Usually a parser is a middleman. That means you should consider how to shape the intermediate results of parsing, e.g. the AST. Ada storage pools are very nice for keeping that stuff in an arena.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
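
A minimal sketch of points 2, 3, 5 and 6 above, written in Python rather than Ada purely for brevity. It is an illustration under stated assumptions, not code from the post: the grammar (comma-separated unsigned integers) and all names (ParseError, NumberListParser, on_number, Collector) are hypothetical, the source is assumed to yield one character at a time, and a Python abstract method stands in for an Ada abstract primitive operation.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class ParseError(Exception):
    """Syntax error carrying the location where it was detected (point 2)."""

    def __init__(self, message: str, line: int, column: int) -> None:
        super().__init__(f"{line}:{column}: {message}")
        self.line = line
        self.column = column


class NumberListParser(ABC):
    """Parses comma-separated unsigned integers, e.g. "1, 22,3".

    All parser state lives in the object (point 5), so several parsers
    can be active at the same time.
    """

    def __init__(self, source: Iterable[str]) -> None:
        # Any iterable of single characters works as a source (point 3):
        # a str, or a generator that reads and decodes a file or stream.
        self._chars: Iterator[str] = iter(source)
        self._line = 1
        self._column = 0
        self._current = ""
        self._advance()

    @abstractmethod
    def on_number(self, value: int, line: int, column: int) -> None:
        """Semantic callback, overridden by the caller (point 6)."""

    def _advance(self) -> None:
        if self._current == "\n":
            self._line += 1
            self._column = 0
        self._current = next(self._chars, "")  # "" marks end of input
        self._column += 1

    def _skip_blanks(self) -> None:
        while self._current.isspace():
            self._advance()

    def _number(self) -> None:
        line, column = self._line, self._column
        if not "0" <= self._current <= "9":
            raise ParseError("number expected", line, column)
        digits = ""
        while "0" <= self._current <= "9":
            digits += self._current
            self._advance()
        self.on_number(int(digits), line, column)

    def parse(self) -> None:
        # The parser drives; the caller only reacts through on_number.
        self._skip_blanks()
        self._number()
        self._skip_blanks()
        while self._current == ",":
            self._advance()
            self._skip_blanks()
            self._number()
            self._skip_blanks()
        if self._current != "":
            raise ParseError("',' or end of input expected",
                             self._line, self._column)


class Collector(NumberListParser):
    """Example caller: gathers the numbers reported by the parser."""

    def __init__(self, source: Iterable[str]) -> None:
        super().__init__(source)
        self.values: List[int] = []

    def on_number(self, value: int, line: int, column: int) -> None:
        self.values.append(value)


parser = Collector("1, 22,\n3")
parser.parse()
print(parser.values)  # [1, 22, 3]
```

With malformed input such as "1,,2", parse raises ParseError with line 1, column 3, so the caller can report where the input went wrong without tracking positions itself.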