From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,4f316de357ae35e9
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2002-08-02 04:23:25 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!skynet.be!skynet.be!fu-berlin.de!uni-berlin.de!dialin-145-254-047-227.arcor-ip.NET!not-for-mail
From: Dmitry A.Kazakov <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: FAQ and string functions
Date: Sat, 3 Aug 2002 01:29:23 +0200
Message-ID: <aidq39$13rmja$1@ID-77047.news.dfncis.de>
References: <20020730093206.A8550@videoproject.kiev.ua>
 <20020731182308.K1083@videoproject.kiev.ua>
 <aib0a6$139lkn$1@ID-77047.news.dfncis.de>
 <20020801161052.M1080@videoproject.kiev.ua>
Reply-To: mailbox@dmitry-kazakov.de
NNTP-Posting-Host: dialin-145-254-047-227.arcor-ip.net (145.254.47.227)
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7Bit
X-Trace: fu-berlin.de 1028287402 37608042 145.254.47.227 (16 [77047])
User-Agent: KNode/0.4
Xref: archiver1.google.com comp.lang.ada:27607
Date: 2002-08-03T01:29:23+02:00
List-Id: <comp.lang.ada>

Oleg Goodyckov wrote:

> On Thu, Aug 01, 2002 at 11:57:04PM +0200, Dmitry A.Kazakov wrote:
>> > Ok! How about write-once-use-always? For text data analyze
>> > applications.
>> 
>> Then, maybe it is worth to consider more advanced parsing techniques than
>> split? There are numerous Ada implementations of pattern matching. There
>> are also Ada subprograms to recognize data types in a string stream. It
>> is relatively easy to parse and evaluate expressions with brackets and
>> prioritized operations in Ada (an implementation of the twin-stack
>> algorithm is quite short) and note no things like split involved.
> 
> May be. In dreams it is possible almost all things.

My implementation (for parsing unit expressions) is about 500 lines long. 
Is that a lot?
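
For illustration only, here is a minimal sketch of the two-stack idea, in 
Python rather than Ada. It is not the implementation mentioned above: the 
names OPS, evaluate and reduce_top are hypothetical, and it assumes 
unsigned integer operands, the four binary operators, and round brackets.

```python
import operator

# Hypothetical operator table: symbol -> (priority, function).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def evaluate(text):
    """Evaluate text with + - * / and brackets using two stacks."""
    values = []     # stack of operands
    pending = []    # stack of operators and open brackets

    def reduce_top():
        # Apply the topmost operator to the two topmost operands.
        symbol = pending.pop()
        if len(values) < 2:
            raise ValueError("missing operand for " + repr(symbol))
        right = values.pop()
        left = values.pop()
        values.append(OPS[symbol][1](left, right))

    def must_reduce(symbol):
        # True while an operator of the same or higher priority is on top.
        return (bool(pending) and pending[-1] != "("
                and OPS[pending[-1]][0] >= OPS[symbol][0])

    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch.isdigit():
            start = pos
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            values.append(int(text[start:pos]))
        elif ch in OPS:
            # Reducing before pushing gives left associativity.
            while must_reduce(ch):
                reduce_top()
            pending.append(ch)
            pos += 1
        elif ch == "(":
            pending.append(ch)
            pos += 1
        elif ch == ")":
            while pending and pending[-1] != "(":
                reduce_top()
            if not pending:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pending.pop()  # discard the matching "("
            pos += 1
        else:
            raise ValueError("unexpected %r at position %d" % (ch, pos))
    while pending:
        if pending[-1] == "(":
            raise ValueError("unbalanced '('")
        reduce_top()
    if len(values) != 1:
        raise ValueError("malformed expression")
    return values[0]
```

With these assumptions, evaluate("2+3*4") gives 14 and evaluate("(2+3)*4") 
gives 20, and nothing like split is involved.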

>> >> > While for splitting string like
>> >> > "x=2*3" people will must be to write program enstead
>> >> > split("=","x=2*3"), people will write in Perl, not Ada.
>> >> 
>> >> And what would you do in the case "x=/* An error, should be := */ 2*"
>> >> and "3" continues on the next line?
>> > 
>> > Nothing. I know: I have data as described. If no - data is corrupted
>> > and must be throwed out. It's simple.
>> 
>> To do so you should have an ability to recognize errors.
> 
> In couple of next steps of program error will be recognized and exception
> rised. No problem.

Usually by that point there is nothing useful left to say about the error 
or its location. It is like an old Borland Pascal compiler, which promptly 
reported "Line X, error in expression" for almost any error.
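
To make this concrete, here is a hedged sketch (Python, hypothetical names 
ParseError and parse_assignment) of a scanner that skips /* ... */ 
comments and reports the line and column where it gave up. It assumes a 
single "name = value" per line and does not evaluate the value.

```python
class ParseError(Exception):
    """Parsing failure that remembers where it happened."""

    def __init__(self, message, line, column):
        super().__init__("line %d, column %d: %s" % (line, column, message))
        self.line = line
        self.column = column


def parse_assignment(text, line=1):
    """Parse 'name = value', skipping /* ... */ comments.

    Returns (name, value_text). The value is not evaluated here.
    """
    pos = 0

    def skip_blanks_and_comments():
        nonlocal pos
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
            elif text.startswith("/*", pos):
                end = text.find("*/", pos + 2)
                if end < 0:
                    raise ParseError("unterminated comment", line, pos + 1)
                pos = end + 2
            else:
                break

    skip_blanks_and_comments()
    start = pos
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    if pos == start:
        raise ParseError("identifier expected", line, pos + 1)
    name = text[start:pos]
    skip_blanks_and_comments()
    if pos >= len(text) or text[pos] != "=":
        raise ParseError("'=' expected", line, pos + 1)
    pos += 1  # consume the delimiter
    skip_blanks_and_comments()
    if pos >= len(text):
        raise ParseError("value expected", line, pos + 1)
    return name, text[pos:].rstrip()
```

For the commented line quoted above, completed with its "3", this sketch 
returns the pair ("x", "2*3"), while a plain split on "=" cuts the same 
line into three pieces because of the "=" inside the comment.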

>> > But what would you do in the case, when data is correct yet?
>> 
>> In the given example data are correct. /*...*/ was a comment containing a
>> symbol supposed to be a delimiter. My point was that for almost any
>> real-life text parsing application, split is useless.
> 
> Why then most of my tasks are much easier solvable by using split, not
> substring and similar? May be they are not from real life?

Well, one customer does not count. I also wish some (other) things were 
changed or added in Ada, but as Robert Dewar usually and correctly points 
out, my wish is no more than mine. You have to convince a lot more people 
than just yourself before your wish becomes part of the standard. 
Distressing, but how could it be otherwise?

>> > You'll build PROGRAMMMM, instead write "split(/=/,"x=2*3")".
>> 
>> You still need a program to process the output of split. Is its output
>> the final outcome? I suppose it is not. So there should a loop to iterate
>> through the list returned by split. Where is then a difference between
>> split + loop, and loop with Get_Next_Word inside?
> 
> Difference is like difference between RANDOM and SEQUENTIAL acceses to
> data.

This is a good point. There is also a technical term for that: there are 
global and local methods of processing texts, images, etc. Global methods 
(split is one) work well only for small amounts of data.
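
As a rough sketch of the local approach (Python, with get_next_word 
standing in for the Get_Next_Word mentioned above and first_words being a 
hypothetical helper), the caller advances a position through the text and 
stops as soon as it has what it needs, so the rest of the text is never 
scanned and no complete list is built.

```python
def get_next_word(text, pos, delimiter="="):
    """Return (word, next_pos) for the word starting at pos.

    next_pos is None when the end of the text is reached.
    """
    end = text.find(delimiter, pos)
    if end < 0:
        return text[pos:], None
    return text[pos:end], end + len(delimiter)


def first_words(text, count, delimiter="="):
    """Collect at most count words without scanning the rest."""
    words = []
    pos = 0
    while pos is not None and len(words) < count:
        word, pos = get_next_word(text, pos, delimiter)
        words.append(word)
    return words
```

For a short string such as "x=2*3" the result is the same as with split. 
The difference shows only when the text is long and just the first few 
words are of interest.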

> In many cases it is not necessary to analyze all of string - enought to
> know count of tokens or several from them on several well known positions.

Well, pattern matching does the job. Others have pointed that out. Note 
also that, as the complexity of the syntax increases, at some point it 
becomes almost impossible to write a correct pattern and prove that it is 
correct.

>> IMO, the difference is
>> that the second is faster and easier to understand.
> 
> Really? Have you seen that program (bcwords.ada)? And you'll assert that
> that Ada's program is easier to understand? :-))))))
> I have nothing to say...
> 
> Sorry me for long quotations. But look everybody and who will risk to say,
> below Ada's program is simpler and easier to understand than equal Perl's
> program (which is more than in 10 times smaller)? Don't think about how
> much time needs that program to be written and debugged. Think, how much
> time it needs to simply type it in text editor correctly. :-)))))

[ example snipped ]

First, the example is illustrative rather than realistic. A real-life 
example would take into account different spellings, typos, proper nouns, 
multi-word tokens, etc. It would probably work with a database, it would 
surely avoid unbounded strings (heap allocation), and so on. I doubt that 
a Perl implementation of all that would be simpler or shorter than one in 
Ada.

Second, 80% of the example code deals with software components such as 
containers. This has nothing to do with text processing. What is really 
dedicated to parsing is quite short and transparent.

You might argue that Ada should have standard components as standard 
(:-)). That is questionable, but as you can see (Ada Standard Component 
Library), work is going on in the direction of having those components, 
though maybe not as part of the standard.

-- 
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de