From: Dmitry A. Kazakov
Newsgroups: comp.lang.ada
Subject: Re: FAQ and string functions
Date: Mon, 05 Aug 2002 13:50:38 +0200
References: <20020730093206.A8550@videoproject.kiev.ua>
 <20020731182308.K1083@videoproject.kiev.ua>
 <20020801161052.M1080@videoproject.kiev.ua>
 <20020802193535.N1101@videoproject.kiev.ua>

On Fri, 2 Aug 2002 19:35:35 +0300, Oleg Goodyckov wrote:

>On Sat, Aug 03, 2002 at 01:29:23AM +0200, Dmitry A. Kazakov wrote:
>>
>> My implementation (for parsing unit expressions) is about 0.5K lines long.
>> Is that much?
>
>500 bytes?

How big is the run-time library then?

>It is not right (as for me) to process EVERY error in the input data. As for
>me it is more effective to process only correct data (which is reliably
>recognized) and to simply drop anything else, nuffig.

Ah, that practice, which makes HTML a disaster: browsers silently ignore
what they do not understand. The results are known.

>> > Difference is like the difference between RANDOM and SEQUENTIAL accesses
>> > to data.
>>
>> This is a good point.
>> There is also a technical term for that. There are
>> global and local methods of processing texts, images etc. Global methods
>> (split is one) work well only for small amounts of data.
>
>What are global and local methods here for? For making the conclusion
>"global methods almost never work well", so they are nuffig, not needed?

The problem with all global methods is that the parameters they need cannot
be optimal in a large context. Split is an example: it requires a separator
and a notion of a token, both of which may vary from point to point, making
the approach useless.

>Config files of applications - are they a small amount of data? Yes. But one
>exists in every application. And splitting a string into several independent
>fields is a much more effective and convenient way to parse it than some
>sequential syntactic analysis.

I remember a project with a config file about 2 MB big (it was a Windows
registry folder). I wonder how much time it would take to parse it using the
split technique.

>> that as the complexity of the syntax increases, it becomes almost
>> impossible at some point to write a correct pattern and prove that it is
>> correct.
>
>Which nuffig "complexity of syntax"? The syntax could not be simpler: fields
>with separators (of one type) between them. It is not a real syntax.
>Take a record, split it by separators and enjoy.

Well, how long is a record allowed to be?

>No! Give me a syntax...

An argument in a call of a subroutine in C++.

>> First, the example is not realistic but illustrative. A real-life example
>> would take into account different spellings, typos, proper nouns,
>> multi-word tokens etc. It would probably work with a database, and it would
>> surely avoid unbounded strings (heap allocation) and so on and so forth. I
>> doubt that a Perl implementation of all that would be simpler or shorter
>> than one in Ada.
>
>Really? Empty words. Try and show me. In the skipped example I've seen one
>attempt. Show me another - better.
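[Not part of the original exchange - a minimal sketch of the point about
split, written in Python purely for brevity rather than in Ada or Perl. It
shows a record where the separator's meaning depends on context (a comma
inside a quoted field): one global split() misparses it, while a local,
sequential scan that tracks context does not.]

```python
# A comma-separated record with a quoted field containing a comma.
line = 'name,"Kazakov, Dmitry",www.dmitry-kazakov.de'

# Global method: one split over the whole string. The comma inside
# the quoted field is wrongly treated as a separator.
assert line.split(',') == ['name', '"Kazakov', ' Dmitry"',
                           'www.dmitry-kazakov.de']

# Local (sequential) method: scan left to right, tracking whether we
# are inside quotes, so the same character is interpreted per context.
def scan(s):
    fields, buf, in_quotes = [], [], False
    for ch in s:
        if ch == '"':
            in_quotes = not in_quotes      # toggle the local context
        elif ch == ',' and not in_quotes:
            fields.append(''.join(buf))    # separator only outside quotes
            buf = []
        else:
            buf.append(ch)
    fields.append(''.join(buf))
    return fields

assert scan(line) == ['name', 'Kazakov, Dmitry', 'www.dmitry-kazakov.de']
```

The scanner is only a sketch; a real one would also handle escaped quotes,
but even this tiny case is already beyond what a single split can express.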
>The task solved in the skipped example has a name - building a histogram of
>words. Why do you call this task not realistic?

Because a histogram is also a global method (used, I suppose, for some sort
of clustering), which also has great limitations and is by no means the end
product of a program.

>> Second, 80% of the example code deals with s/w components like containers
>> etc. This has nothing to do with text processing. What is really dedicated
>> to parsing is quite short and transparent.
>
>So, if that 80% of the code were thrown out, would the program still work?
>Or is it necessary after all?

Not for text processing; I supposed that the program does something more
than only that. Generally, if you have a problem to solve, you must first
decompose it into subproblems, and you should do that properly. Surely one
could use eigenvalues and eigenvectors to invert a matrix, but that would be
a *bad* idea. Decomposing a text-analysis problem into a bunch of split
operations is also a *bad* idea. This is my point.

>> You might argue that Ada should have standard components standard (:-)).
>> That is questionable, but as you see (Ada Standard Component Library),
>> work is going on in the direction of having those components, though maybe
>> not as a part of the standard.
>
>So, my words make sense? Why then do you argue?

Because I doubt that split should be a part of any standard library. As I
said, I consider it useless.

---
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de