From: braver
Newsgroups: comp.lang.ada
Subject: Parallel Text Corpus Processing with Ada?
Date: Sat, 10 Nov 2007 15:05:59 -0800
Message-ID: <1194735959.240323.38210@v2g2000hsf.googlegroups.com>

Greetings --

I'm working with large text corpora and am wondering what tools exist for implementing parallel applications that process them. For example, one could imagine a parallel grep. The target is a single Linux box with multiple CPUs and shared memory -- an ideal setup for Ada concurrency.

What tools do we have for working with languages like Python and Ruby, which are also widely used for text processing, and what's the state of regexps in Ada?

Cheers,
Alexy