From: braver
Newsgroups: comp.lang.ada
Subject: Re: Parallel Text Corpus Processing with Ada?
Date: Sun, 11 Nov 2007 14:49:25 -0800
Message-ID: <1194821365.830120.106600@o3g2000hsb.googlegroups.com>
References: <1194735959.240323.38210@v2g2000hsf.googlegroups.com> <1t1ab1hzsng9p.101gcl2uomeoy.dlg@40tude.net>

On Nov 11, 11:23 am, "Dmitry A. Kazakov" wrote:
> But see above. What kind of processing you have?
>
> 1. Do you run one complex pattern along a long text?
> 2. Multiple patterns matching the same (long) text?
> 3. Multiple patterns matching different texts?

I do large corpora research, finding all kinds of n-grams in millions of files.
I'm primarily interested in utilizing all 8 cores of my current Linux server to speed up things like grepping those files, so I would be curious to see Ada 2005 code doing both

-- tasking
-- dictionary counting of occurrences
-- n-gram counting

Tasking is definitely more interesting, as I already see from Ada.Containers that I can use hash maps. The question is how to split a corpus and unleash 8 tasks on it so that they occupy their own cores.

Cheers,
Alexy