comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Parallel Text Corpus Processing with Ada?
Date: Mon, 12 Nov 2007 17:17:31 +0100
Date: 2007-11-12T17:10:33+01:00
Message-ID: <8s767qqrk0iw.x5fwu5eaj345$.dlg@40tude.net>
In-Reply-To: 1194821365.830120.106600@o3g2000hsb.googlegroups.com

On Sun, 11 Nov 2007 14:49:25 -0800, braver wrote:

> On Nov 11, 11:23 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> But see above. What kind of processing do you have?
>>
>> 1. Do you run one complex pattern along a long text?
>> 2. Multiple patterns matching the same (long) text?
>> 3. Multiple patterns matching different texts?
> 
> I do research on large corpora, finding all kinds of n-grams in millions
> of files.  I'm primarily interested in utilizing all 8 cores of my
> current Linux server to speed up things like grepping those files, so
> I would be curious to see Ada 2005 code doing both
> 
> -- tasking
> -- dictionary counting of occurrences -- n-gram counting
> 
> Tasking is definitely the more interesting part: I can already see from
> Ada.Containers that I can use hash maps.  The question is how to split a
> corpus and unleash 8 tasks on it so that each occupies its own core.

I would concentrate on preventing memory access collisions. Memory access
will almost certainly be the bottleneck. So, when choosing the recognition
and counting algorithms, I would shift the memory / computing trade-off in
favor of memory, that is, accept more computation in exchange for a smaller
memory footprint, so that as much as possible fits into the processors'
caches.
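
One way to keep tasks from colliding on memory can be sketched in Ada 2005
as follows. This is only an illustration under stated assumptions, not a
tuned solution: the corpus is a hypothetical in-memory array of four short
strings instead of files, single words stand in for n-grams, and the names
Count_Words, Worker, Totals, Add and Count_Text are made up for the example.
Each task counts into its own private hash map and synchronizes only once,
when it hands its finished map over to be merged.

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Count_Words is

   package Count_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,
      Element_Type    => Natural,
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");
   use Count_Maps;

   --  Hypothetical stand-in for a corpus: one string per "file".
   type Text_Access is access String;
   Corpus : constant array (1 .. 4) of Text_Access :=
     (new String'("the cat sat on the mat"),
      new String'("the dog ate the bone"),
      new String'("a cat and a dog"),
      new String'("the end"));

   --  Holds the merged result; touched once per task, at the very end.
   protected Totals is
      procedure Merge (Part : in Map);
      function Occurrences (Word : String) return Natural;
   private
      Sum : Map;
   end Totals;

   --  Add N occurrences of Word to the map To.
   procedure Add (To : in out Map; Word : String; N : Natural) is
      Pos : constant Cursor := Find (To, Word);
   begin
      if Has_Element (Pos) then
         Replace_Element (To, Pos, Element (Pos) + N);
      else
         Insert (To, Word, N);
      end if;
   end Add;

   --  Split Text on single spaces and count every word into Into.
   procedure Count_Text (Text : String; Into : in out Map) is
      Start : Positive := Text'First;
   begin
      for I in Text'Range loop
         if Text (I) = ' ' then
            if I > Start then
               Add (Into, Text (Start .. I - 1), 1);
            end if;
            Start := I + 1;
         end if;
      end loop;
      if Start <= Text'Last then
         Add (Into, Text (Start .. Text'Last), 1);
      end if;
   end Count_Text;

   protected body Totals is
      procedure Merge (Part : in Map) is
         Pos : Cursor := First (Part);
      begin
         while Has_Element (Pos) loop
            Add (Sum, Key (Pos), Element (Pos));
            Next (Pos);
         end loop;
      end Merge;

      function Occurrences (Word : String) return Natural is
         Pos : constant Cursor := Find (Sum, Word);
      begin
         if Has_Element (Pos) then
            return Element (Pos);
         else
            return 0;
         end if;
      end Occurrences;
   end Totals;

   --  Each worker processes the corpus slice Lo .. Hi.
   task type Worker (Lo, Hi : Positive);

   task body Worker is
      Local : Map;  --  private to this task: no locking while counting
   begin
      for I in Lo .. Hi loop
         Count_Text (Corpus (I).all, Local);
      end loop;
      Totals.Merge (Local);  --  the only synchronized step
   end Worker;

begin
   declare
      W1 : Worker (Lo => 1, Hi => 2);
      W2 : Worker (Lo => 3, Hi => 4);
   begin
      null;  --  the block is left only after both workers have terminated
   end;
   Ada.Text_IO.Put_Line
     ("the:" & Natural'Image (Totals.Occurrences ("the")));
end Count_Words;
```

With the four sample strings this prints "the: 5". For a real corpus one
would presumably start one worker per core and give each a disjoint set of
files; whether the tasks actually end up on separate cores is decided by
the run-time and the operating system, not by this code.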

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


