comp.lang.ada
 help / color / mirror / Atom feed
From: Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de>
Subject: Re: Reading and writing a big file in Ada (GNAT) on Windows XP
Date: Fri, 04 May 2007 02:28:46 +0200
Date: 2007-05-04T02:28:46+02:00	[thread overview]
Message-ID: <3eabwlo2ip.fsf@hod.lan.m-e-leypold.de> (raw)
In-Reply-To: f1dpg0$cgg$1@jacob-sparre.dk



"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Adam Beneschan" <adam@irvine.com> wrote in message
> news:1178224048.034635.39010@l77g2000hsb.googlegroups.com...
> ...
>> It strikes me that Index is the kind of function that really ought to
>> be written in assembly language, at least partially.  I notice that
>> the version of Linux that I'm using has a built-in function to search
>> memory for a substring; this is very descriptively called memmem() and
>> has the amusing profile
>>
>>    void *memmem (const void *needle, size_t needlelen,
>>                             const void *haystack, size_t haystacklen);
>>
>> according to the man page.  But I assume this is written to use
>> registers optimally and take advantage of the REP instructions (on an
>> x86 or Pentium).  I don't know how GNAT implements Index---I haven't
>> looked into it.
>
> The big expense in Index is the mapping set or function, not the actual
> compare. For Janus/Ada, I had seen a similar problem (a big deal as Index
> was used to look for spam patterns), and finally special-cased a number of
> common cases (no mapping, single character patterns, and so on). I also
> spent a bit of time on the code generator, figuring that this sort of string
> manipulation code is common enough that it might as well be generated well.
> The updates helped a lot, although they don't quite generate a single
> instruction such as is possible. (OTOH, Intel used to recommend avoiding the
> block move and compare instructions because they fouled up the pipeline and
> thus slowed the overall execution. I don't know if that is still true, but
> ifi it is, there might be less benefit to hand-coded assembler than you are
> thinking...)


I'd like to add, that some years ago I read a rather well written
paper about this kind of text searches. Just repeating comparisons in
two loops (one to move the offset in haystack, the other to compare
need against haystack+offset) is actually not the best method. I.e. if
one fails when comparing this

     haystack: ... asjkajkaewudajksdkwueqeqweadasdqw3adkadkakd
     needle:                                 adasdqwsdfklsdf
                                                    ^
                                                   comparison fails

one can immediately increase the offset by 7, since the needle does
not contain a '3' in the part that is before the point where the
comparison failed. The technique I read about, used considerations of
this kind to compile the 'needle' into some automaton which could skip
more than 1 character at a time in haystack and then executed /
interpreted this automaton. (Actually the algorithm was a bit more
tricky than this, I think: If the automaton sees a character that is
not in needle at all, it can immediately skip for the length of needle
and I think part of the trick was to start comparing from the end of
needle).

So in this case a better algorithm is probably the way to go (and I
agree, asembler and REP, which would just be the naive algorithm with
2 loops, won't cut it, at least I'd not bet on it)

Regards -- Markus






  reply	other threads:[~2007-05-04  0:28 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-28  5:03 Reading and writing a big file in Ada (GNAT) on Windows XP Fionn Mac Cumhaill
2007-04-28  5:20 ` Gautier
2007-04-29 21:17   ` Fionn Mac Cumhaill
2007-04-28  5:25 ` tmoran
2007-04-28  6:56 ` Martin Krischik
2007-04-28 17:12   ` tmoran
2007-04-28 12:41 ` Jeffrey Creem
2007-04-29 21:35   ` Fionn Mac Cumhaill
2007-04-28 13:22 ` (see below)
2007-04-28 17:56 ` Simon Wright
2007-04-28 18:28   ` Jeffrey Creem
2007-04-29  7:20     ` Simon Wright
2007-04-29 21:44     ` Fionn Mac Cumhaill
2007-04-29 21:42   ` Fionn Mac Cumhaill
2007-04-30  0:48     ` Jeffrey R. Carter
2007-04-30  2:30       ` Fionn Mac Cumhaill
2007-04-30  4:21         ` tmoran
2007-04-28 19:12 ` Jeffrey R. Carter
2007-04-29 21:46   ` Fionn Mac Cumhaill
2007-05-01 14:10     ` Fionn Mac Cumhaill
2007-05-06 21:55     ` Quarc
2007-05-02  7:46 ` george
2007-05-03  6:31   ` Fionn Mac Cumhaill
2007-05-03 20:00     ` Simon Wright
2007-05-04  4:35       ` Jeffrey R. Carter
2007-05-04  4:45       ` Fionn Mac Cumhaill
2007-05-04  6:53       ` Alternative Index implementation? (Was: Reading and writing a big file in Ada (GNAT) on Windows XP) Jacob Sparre Andersen
2007-05-04  7:41         ` Dmitry A. Kazakov
2007-05-04  9:16           ` Copying string slices before calling subroutines? (Was: Alternative Index implementation?) Jacob Sparre Andersen
2007-05-04  9:44             ` Copying string slices before calling subroutines? Jacob Sparre Andersen
2007-05-04 10:14               ` Dmitry A. Kazakov
2007-05-04 12:07                 ` Jeffrey Creem
2007-05-04 12:46                   ` Dmitry A. Kazakov
2007-05-04 22:27                   ` Simon Wright
2007-05-05  7:33                     ` Jacob Sparre Andersen
2007-05-05  7:47                       ` Dmitry A. Kazakov
2007-05-05  7:41                     ` Dmitry A. Kazakov
2007-05-03 20:27     ` Reading and writing a big file in Ada (GNAT) on Windows XP Adam Beneschan
2007-05-03 23:01       ` Randy Brukardt
2007-05-04  0:28         ` Markus E Leypold [this message]
2007-05-05 16:26           ` Adam Beneschan
2007-05-05 17:27             ` Markus E Leypold
2007-05-15 23:03               ` Randy Brukardt
2007-05-04 20:04         ` Adam Beneschan
2007-05-05 16:36           ` tmoran
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox