comp.lang.ada
From: Dmitry A. Kazakov <mailbox@dmitry-kazakov.de>
Subject: Re: FAQ and string functions
Date: Mon, 05 Aug 2002 13:50:38 +0200
Message-ID: <b0osku0tktsihgp0hoih183250hq3pjhq5@4ax.com>
In-Reply-To: 20020802193535.N1101@videoproject.kiev.ua

On Fri, 2 Aug 2002 19:35:35 +0300, Oleg Goodyckov
<og@videoproject.kiev.ua> wrote:

>On Sat, Aug 03, 2002 at 01:29:23AM +0200, Dmitry A.Kazakov wrote:
>> 
>> My implementation (for parsing unit expressions) is about 0.5K lines long. 
>> Is that much?
>
>500 bytes?

How big is the run-time library then?

>In my opinion it is not right to process EVERY error in the input data.
>It is more effective to process only correct data (which is reliably
>recognized) and simply drop everything else.

Ah, that practice, which makes HTML a disaster because browsers
silently ignore what they do not understand. The results are known.

>> > The difference is like the difference between RANDOM and SEQUENTIAL
>> > access to data.
>> 
>> This is a good point. There is also a technical term for that. There are 
>> global and local methods of processing texts, images etc. Global methods 
>> (split is one) work well only for small amounts of data.
>
>Why bring in global and local methods here? To conclude that "global
>methods almost never work well", so they are not needed at all?

The problem with all global methods is that the parameters they need
cannot be optimal over a large context. Split is an example: it
requires a separator and a notion of a token, both of which may vary
from point to point in the input, which makes the approach useless.
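
A small illustration of this, as a sketch only: the sample line and the
helper scan_fields are made up for the example, and Python is used merely
for brevity. A character that is a separator at one point of the line is
ordinary text at another (inside quotes), so a global split cuts the record
in the wrong places, while a scanner that decides locally does not.

```python
def scan_fields(line, sep=",", quote='"'):
    """Sequential scan: a separator counts only outside quotes.

    Hypothetical helper for illustration; it does not handle escaped quotes.
    """
    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes      # local context changes here
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


line = 'name,"Kiev, Ukraine",42'           # made-up sample record

naive = line.split(",")                    # global method: one fixed separator
scanned = scan_fields(line)                # local method: context-dependent

print(naive)    # ['name', '"Kiev', ' Ukraine"', '42']
print(scanned)  # ['name', 'Kiev, Ukraine', '42']
```

The split version returns four pieces for a record that has three fields.
Even this scanner is only as good as its assumptions about quoting, which
is exactly the point about parameters that vary with context.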

>Application config files: are they a small amount of data? Yes. But they
>exist in every application. And to parse them, splitting a string into
>several independent fields is a much more effective and convenient way
>than doing some sequential syntactic analysis.

I remember a project with a config file about 2 MB in size (it was a
Windows registry folder). I wonder how long it would take to parse it
using the split technique.

>> that as the complexity of syntax increases it becomes almost impossible at 
>> some point to write a correct pattern and prove that it is correct.
>
>What "complexity of syntax"? The syntax could not be simpler: fields
>with separators (of one type) between them.

It is not a real syntax.

>Take record, split it by separators and enjoy.

Well, how long is a record allowed to be?

>No! Give me a syntax...

An argument in a call of a subroutine in C++.
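
To make that concrete, here is a rough sketch (Python again, with a
made-up argument list and a hypothetical helper top_level_arguments).
Splitting on commas breaks nested calls apart. Tracking bracket depth
fixes this particular case, but it still knows nothing about string
literals, character literals, comments or template angle brackets, so it
is far from a parser for real C++ arguments.

```python
def top_level_arguments(arg_text):
    """Split on commas that sit at bracket nesting depth zero.

    Illustrative only: ignores string literals, comments and templates.
    """
    args, current, depth = [], [], 0
    for ch in arg_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


call_args = "x, f(a, b), y + g(c, d)"      # made-up argument list

pieces = call_args.split(",")              # 5 pieces, none of them meaningful
arguments = top_level_arguments(call_args)

print(len(pieces))  # 5
print(arguments)    # ['x', 'f(a, b)', 'y + g(c, d)']
```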

>> First, the example is not realistic but illustrative. A real-life example 
>> would take into account different spellings, typos, proper nouns, 
>> multi-word tokens etc. It would probably work with a database, it would 
>> surely avoid unbounded strings (heap allocation) and so on and so forth. I 
>> doubt that a Perl implementation of all that would be simpler or shorter 
>> than in Ada.
>
>Really? Empty words. Try it and show me. In the skipped example I have
>seen one attempt. Show me another, better one.
>The task solved in the skipped example has a name: building a histogram
>of words. Why do you call this task unrealistic?

Because a histogram is also a global method (used, I suppose, for some
sort of clustering), which also has great limitations and is by no means
the end product of the program.

>> Second, 80% of the example code deals with software components like 
>> containers etc. This has nothing to do with text processing. What is really 
>> dedicated to parsing is quite short and transparent.
>
>So if that 80% of the code is thrown out, will the program still work? Or
>is it necessary after all?

Not for text processing. I assumed that the program does something more
than just that.

Generally, if you have a problem to solve you must first decompose it
into subproblems. You should do it properly. Surely one could use
eigenvalues and eigenvectors to invert a matrix, but this would be a
*bad* idea. To decompose a text analysis problem into a bunch of split
operations is also a *bad* idea. This is my point.

>> You might argue that Ada should make standard components standard (:-)). It 
>> is questionable, but as you see (Ada Standard Component Library) there is 
>> work going on in the direction of having those components, though maybe not 
>> as a part of the standard.
>
>So my words make sense? Why do you argue, then?

Because I doubt that split should be a part of any standard library.
As I said, I consider it useless.

---
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de


