From: "Dmitry A. Kazakov"
Subject: Re: Parallel Text Corpus Processing with Ada?
Newsgroups: comp.lang.ada
Organization: cbb software GmbH
Date: Mon, 12 Nov 2007 14:31:37 +0100

On Sun, 11 Nov 2007 16:54:40 +0100, Georg Bauhaus wrote:

> On Sun, 2007-11-11 at 09:23 +0100, Dmitry A. Kazakov wrote:
>
>> Why necessarily RE? Or else why patterns? Patterns come at a high price.
>> They are sufficiently slower than tailored string processing algorithms.
>
> SPITBOL patterns serve "tailored string processing algorithms";
> I don't quite understand how you could have more targetted algorithms
> for large corpora. What do you have in mind?

There is a wide class of problems which patterns do not solve, or solve
inefficiently, such as building dictionaries, finding the longest common
substring, etc. (Compiling Ada programs is also in this class. (:-))

> A few RE packages have a Boyer-Moore algorithm built in, and more.
> How can string processing be substantially faster?

That depends on the problem. RE is not just about string search. Searching
for a substring is itself a very specialized problem which IMO is rarely
needed. On the other hand, string processing is often more than pure
matching (i.e. cursor moving + failure/success outcome). SNOBOL had
immediate assignment to address that problem.

As a general note, pattern matching languages are declarative, with all the
disadvantages that entails.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
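
To illustrate what a tailored algorithm for one of the problems named above
might look like, here is a minimal sketch of longest common substring by
dynamic programming. It is written in Python only for brevity; the function
name longest_common_substring is an assumption made for illustration and
comes neither from the thread nor from any library. It is one possible
approach, not a claim about how any particular tool does it.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring occurring in both a and b.

    Hypothetical helper for illustration. Classic O(len(a) * len(b))
    dynamic programming, keeping only the previous row of the table.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # index in a just past the end of that match
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]
```

The result is a computed value (the shared substring itself), not a
success/failure outcome with a cursor position, which is the distinction
drawn above between string processing and pure matching.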