Newsgroups: comp.lang.ada
Subject: Re: Parallel Text Corpus Processing with Ada?
From: Georg Bauhaus
Date: Sun, 11 Nov 2007 16:54:40 +0100
Message-Id: <1194796479.6547.13.camel@K72>
In-Reply-To: <1t1ab1hzsng9p.101gcl2uomeoy.dlg@40tude.net>
References: <1194735959.240323.38210@v2g2000hsf.googlegroups.com> <1t1ab1hzsng9p.101gcl2uomeoy.dlg@40tude.net>

On Sun, 2007-11-11 at 09:23 +0100, Dmitry A. Kazakov wrote:

> Why necessarily RE? Or else why patterns? Patterns come at a high price.
> They are sufficiently slower than tailored string processing algorithms.

SPITBOL patterns serve "tailored string processing algorithms";
I don't quite understand how you could have more targeted
algorithms for large corpora. What do you have in mind?

For example, BreakX is hard to beat; I believe that your assessment
that "patterns are more powerful, but slower in matching"[1] is
really true. A few RE packages have a Boyer-Moore algorithm built in,
and more. How can string processing be substantially faster?
(I guess that the new Ada.Strings.Fixed.Index/5 subprograms are
an improvement, maybe depending on how string slices would otherwise
be passed to searches starting at some Position.)

[1] http://www.dmitry-kazakov.de/match/match.htm
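
To make the Index remark above concrete, here is a minimal sketch that counts pattern occurrences by resuming each search at a given position. It assumes that "Index/5" refers to the Ada 2005 overloads of Ada.Strings.Fixed.Index that take a From parameter; the procedure name Count_Matches and the Corpus and Pattern strings are made up for illustration.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Count_Matches is
   use Ada.Strings.Fixed;

   --  Hypothetical sample data, not taken from the thread.
   Corpus  : constant String := "the cat and the hat and the bat";
   Pattern : constant String := "the";

   Count : Natural := 0;
   Pos   : Natural := Corpus'First;
begin
   --  Keep Pos inside Corpus'Range: Index with From propagates
   --  Index_Error when From lies outside the source string.
   while Pos <= Corpus'Last loop
      --  Ada 2005: continue the search at From, no slice needed.
      Pos := Index (Source => Corpus, Pattern => Pattern, From => Pos);
      exit when Pos = 0;           --  0 means no further match
      Count := Count + 1;
      Pos := Pos + Pattern'Length; --  skip past this match
   end loop;

   Ada.Text_IO.Put_Line ("Matches:" & Natural'Image (Count));
end Count_Matches;
```

The Ada 95 way to write the same loop would pass a slice, as in Index (Corpus (Pos .. Corpus'Last), Pattern). Since a slice keeps the index range of the original string, the result is the same; whether the From version is actually faster would depend on how the compiler passes the slice.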