comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Parallel Text Corpus Processing with Ada?
Date: Sun, 11 Nov 2007 09:23:34 +0100
Message-ID: <1t1ab1hzsng9p.101gcl2uomeoy.dlg@40tude.net>
In-Reply-To: 1194735959.240323.38210@v2g2000hsf.googlegroups.com

On Sat, 10 Nov 2007 15:05:59 -0800, braver wrote:

> Greetings -- I'm working with large text corpora, and am wondering
> what tools are there for implementing parallel apps working with
> corpora.  E.g., one could imagine a parallel grep.  This is for a
> single Linux box with multiple CPUs and shared memory -- an ideal
> setup for Ada concurrency.  What tools do we have to use things like
> Python and Ruby, also widely used for text processing, and what's the
> state of regexps?

Why necessarily REs? Or, for that matter, why patterns at all? Patterns come
at a high price: they are considerably slower than tailored string-processing
algorithms. The more power you get, the slower it works. Especially for
parallel processing, I would consider a specialized implementation first.
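To illustrate the point about tailored algorithms, here is a minimal Python sketch. It is not Ada and not from the original post, and the helper names `grep_keywords` and `grep_regex` are hypothetical. Both helpers find the lines that contain any of a fixed set of keywords:

- **Specialized:** splits each line on whitespace and does hash-set lookups.
- **General:** runs a compiled regular expression with the alternation `\b(?:green|red|blue)\b`.

The specialized version deliberately assumes that a "word" is a whitespace-separated token. It is therefore narrower than the regex, not a drop-in replacement. For example, `red,` matches the regex but not the token lookup.

```python
import re

def grep_keywords(lines, keywords):
    """Specialized search: whitespace tokens checked against a set of literals."""
    kw = frozenset(keywords)
    return [i for i, line in enumerate(lines)
            if any(tok in kw for tok in line.split())]

def grep_regex(lines, pattern):
    """General search: one compiled regular expression per call."""
    rx = re.compile(pattern)
    return [i for i, line in enumerate(lines) if rx.search(line)]

corpus = ["the red car", "a blue sky", "nothing here", "greenish"]
print(grep_keywords(corpus, {"green", "red", "blue"}))  # [0, 1]
print(grep_regex(corpus, r"\b(?:green|red|blue)\b"))    # [0, 1]
```

Which one is faster has to be measured on the actual corpus, for example with `timeit`. The specialized version only does a fixed amount of cheap work per token and has no pattern machinery to run.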

As for patterns, GNAT has both RE and SNOBOL ones. SNOBOL patterns
(GNAT.Spitbol.Patterns) were mentioned by Georg. REs are in the package
GNAT.Regexp. I have Ada bindings to different SNOBOL-like patterns:
http://www.dmitry-kazakov.de/match/match.htm.

But see above. What kind of processing do you have?

1. Do you run one complex pattern along a long text?
2. Multiple patterns matching the same (long) text?
3. Multiple patterns matching different texts?

I.e. what is concurrent, and how well can you split it into different tasks?
For example, matching the alternatives of a pattern like "green"|"red"|"blue"
concurrently would likely be slower than matching them sequentially: too much
overhead, no way to apply heuristics, etc.
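The alternative is to split the data rather than the pattern. Below is a minimal Python sketch of case 3 (one pattern over many independent texts). It assumes the corpus is simply a list of lines and works in three steps:

1. The lines are cut into contiguous chunks.
2. Each job runs the complete pattern, alternation included, sequentially over its chunk.
3. The per-chunk results are concatenated in order.

`parallel_grep` and `_grep_chunk` are hypothetical names, not part of any library.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def _grep_chunk(job):
    """Worker: run the whole pattern sequentially over one chunk of lines."""
    pattern, start, lines = job
    rx = re.compile(pattern)  # pattern travels as a string so jobs stay picklable
    return [start + i for i, line in enumerate(lines) if rx.search(line)]

def parallel_grep(lines, pattern, workers=4, executor_cls=ThreadPoolExecutor):
    """Split the corpus (not the pattern) into contiguous chunks, one job each."""
    if not lines:
        return []
    size = -(-len(lines) // workers)  # ceiling division: at most `workers` chunks
    jobs = [(pattern, s, lines[s:s + size]) for s in range(0, len(lines), size)]
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in job order, so line numbers come out sorted
        return [n for found in pool.map(_grep_chunk, jobs) for n in found]

corpus = ["the red car", "a blue sky", "nothing", "green tea", "grey"]
print(parallel_grep(corpus, r"\b(?:green|red|blue)\b", workers=2))  # [0, 1, 3]
```

In standard CPython builds, `re` matching holds the global interpreter lock, so the thread pool mainly shows the structure. For real CPU parallelism, pass `executor_cls=ProcessPoolExecutor` and call it under `if __name__ == "__main__":`. The worker is a module-level function, so it can be pickled. An Ada version would have the same shape, with one task per chunk reading the shared corpus directly.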

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



Thread overview: 17+ messages
2007-11-10 23:05 Parallel Text Corpus Processing with Ada? braver
2007-11-11  0:11 ` tmoran
2007-11-11  1:10 ` Georg Bauhaus
2007-11-11  8:23 ` Dmitry A. Kazakov [this message]
2007-11-11 15:54   ` Georg Bauhaus
2007-11-11 16:13     ` Georg Bauhaus
2007-11-12 13:31     ` Dmitry A. Kazakov
2007-11-12 15:07       ` Georg Bauhaus
2007-11-12 16:11         ` Dmitry A. Kazakov
2007-11-11 22:49   ` braver
2007-11-12 16:17     ` Dmitry A. Kazakov
2007-11-13 22:45     ` Simon Wright
2007-11-14 23:38       ` braver
2007-11-15  9:39         ` Ludovic Brenta
2007-11-15 11:12           ` Dmitry A. Kazakov
2007-11-15 21:11         ` Simon Wright
2007-11-17  1:05           ` Randy Brukardt