From: Dmitry A. Kazakov <mailbox@dmitry-kazakov.de>
Subject: Re: FAQ and string functions
Date: Mon, 05 Aug 2002 13:50:38 +0200
Message-ID: <b0osku0tktsihgp0hoih183250hq3pjhq5@4ax.com>
In-Reply-To: 20020802193535.N1101@videoproject.kiev.ua
On Fri, 2 Aug 2002 19:35:35 +0300, Oleg Goodyckov
<og@videoproject.kiev.ua> wrote:
>On Sat, Aug 03, 2002 at 01:29:23AM +0200, Dmitry A.Kazakov wrote:
>>
>> My implementation (for parsing unit expressions) is about 0.5K lines long.
>> Is that much?
>
>500 bytes?
How big is the run-time library then?
>It is not right (as for me) to process EVERY error in input data. As for
>me it is more effectively to process only correct data (which are reliably
>recognized) and any other simply to drop nuffig.
Ah, that is the practice that made HTML a disaster: browsers
silently ignore what they do not understand. The results are known.
>> > Difference is like difference between RANDOM and SEQUENTIAL acceses to
>> > data.
>>
>> This is a good point. There is also a technical term for that. There are
>> global and local methods of processing texts, images etc. Global methods
>> (split is one) are working good for only small anount of data.
>
>What here global and local methodes are for? For making conclusion "global
>methods are working good almost never", so they are nuffig need not?
The problem with all global methods is that the parameters they need
cannot be optimal in a large context. Split is an example. It
requires a separator and a notion of a token which may vary from point
to point, making the approach useless.
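
A small illustration of a separator whose meaning changes from point to point. This is only a sketch: the sample record is made up, and Python is used purely for brevity. A comma inside a quoted field is not a separator, but a global split cannot know that, while a reader that tracks its state as it scans can.

```python
import csv
import io

# Hypothetical record: the second field contains a comma inside quotes.
line = 'widget,"Smith, John",42'

# Global method: one separator applied blindly to the whole record.
naive = line.split(",")

# Local method: the reader scans sequentially and tracks quoting state,
# so the comma inside the quotes is not treated as a separator.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four pieces, the quoted field is torn apart
print(parsed)  # three fields, as intended
```

The split result has four pieces because the quoted name is cut in two; the scanning reader returns the three intended fields.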
>Config files of applications - are they small amount of data? Yes. But it
>exists in every application. And to parse it splitting of string to
>several independent fields is much more effective and convinient way than
>make some sequential syntactical analyzing.
I remember a project with a config file about 2 MB in size (it was a
Windows registry folder). I wonder how long it would take to
parse it using the split technique.
>> that as the complexity of syntax increases it becomes almost impossible at
>> some point to write a correct pattern and prove that it is correct.
>
>Which nuffig "complexity of syntax"? Syntax is - no more simplest: fields
>with separators (of one type) between of them.
It is not a real syntax.
>Take record, split it by separators and enjoy.
Well, how long a record is allowed to be?
>No! Give me a syntax...
An argument in a call of a subroutine in C++.
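
To make that concrete, here is a rough sketch (Python for brevity; the function name `split_arguments` and the sample argument list are hypothetical). Splitting an argument list on commas breaks as soon as an argument contains a nested call or a string literal, whereas a sequential scan that tracks nesting depth and string state does not. The sketch deliberately ignores template angle brackets, character literals and comments, so it is still far from a real C++ parser.

```python
def split_arguments(arg_list):
    """Split a call's argument list on top-level commas only.

    Assumption: handles (), [], {} nesting and double-quoted strings
    with backslash escapes; nothing else of C++ syntax.
    """
    args, current = [], []
    depth = 0
    in_string = False
    i = 0
    while i < len(arg_list):
        ch = arg_list[i]
        if in_string:
            current.append(ch)
            if ch == "\\" and i + 1 < len(arg_list):
                # Keep the escaped character and skip over it.
                current.append(arg_list[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch in "([{":
            depth += 1
            current.append(ch)
        elif ch in ")]}":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A comma at nesting depth zero really separates arguments.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


call_args = 'a, g(b, c), "x,y"'
naive = call_args.split(",")          # five fragments, none reliable
scanned = split_arguments(call_args)  # three arguments
print(naive)
print(scanned)
```

Here the same character is a separator at one point and ordinary text at another, depending on context that only a sequential scan can see.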
>> First, the example is not realistic but illustrative. A real-life example
>> would take into accout different spellings, typo errors, proper nouns,
>> multi-word tokens etc. It would probably work with a data base, it would
>> surely avoid unbounded strings (heap allocation) and so on and so far. I
>> doubt that a Perl implementation of all that would be simplier or shorter
>> than in Ada.
>
>Really? Empty words. Try and show me. In skipped example I've seen one
>attempt. Show me another - better.
>Task solved in skipped example has name - building hystorgram of words
>implementation. Why you name this task not realistic?
Because a histogram is also a global method (used, I suppose, for some
sort of clustering), which also has great limitations and is by no
means the end product of a program.
>> Second, the 80% of the example code is dealing with s/w components like
>> containers etc. This has nothing to do with text processing. What is really
>> dedicated to parsing is quite short and transparent.
>
>So, if that 80% of code throw out, then program will work? Or they are
>necessary though?
Not for text processing. I supposed that it does something more than
only that.
Generally, if you have a problem to solve you must first decompose it
into subproblems. You should do it properly. Surely one could use
eigenvalues and vectors to invert a matrix but this would be a *bad*
idea. To decompose some text analysis problem into a bunch of split
operations is also a *bad* idea. This is my point.
>> You might argue that Ada should have standard components standard (:-)), it
>> is questionable, but as you see (Ada Standard Component Library) there is a
>> work going in the direction of having that components, though maybe not as
>> a part of the standard.
>
>So, my words have sence? Why then you argue?
Because I doubt that split should be a part of any standard library.
As I said, I consider it useless.
---
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de