From: Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de>
Subject: Re: Reading and writing a big file in Ada (GNAT) on Windows XP
Date: Fri, 04 May 2007 02:28:46 +0200
Date: 2007-05-04T02:28:46+02:00 [thread overview]
Message-ID: <3eabwlo2ip.fsf@hod.lan.m-e-leypold.de> (raw)
In-Reply-To: f1dpg0$cgg$1@jacob-sparre.dk
"Randy Brukardt" <randy@rrsoftware.com> writes:
> "Adam Beneschan" <adam@irvine.com> wrote in message
> news:1178224048.034635.39010@l77g2000hsb.googlegroups.com...
> ...
>> It strikes me that Index is the kind of function that really ought to
>> be written in assembly language, at least partially. I notice that
>> the version of Linux that I'm using has a built-in function to search
>> memory for a substring; this is very descriptively called memmem() and
>> has the amusing profile
>>
>> void *memmem (const void *needle, size_t needlelen,
>> const void *haystack, size_t haystacklen);
>>
>> according to the man page. But I assume this is written to use
>> registers optimally and take advantage of the REP instructions (on an
>> x86 or Pentium). I don't know how GNAT implements Index---I haven't
>> looked into it.
>
> The big expense in Index is the mapping set or function, not the actual
> compare. For Janus/Ada, I had seen a similar problem (a big deal as Index
> was used to look for spam patterns), and finally special-cased a number of
> common cases (no mapping, single character patterns, and so on). I also
> spent a bit of time on the code generator, figuring that this sort of string
> manipulation code is common enough that it might as well be generated well.
> The updates helped a lot, although they don't quite generate a single
> instruction such as is possible. (OTOH, Intel used to recommend avoiding the
> block move and compare instructions because they fouled up the pipeline and
> thus slowed the overall execution. I don't know if that is still true, but
> ifi it is, there might be less benefit to hand-coded assembler than you are
> thinking...)
I'd like to add, that some years ago I read a rather well written
paper about this kind of text searches. Just repeating comparisons in
two loops (one to move the offset in haystack, the other to compare
need against haystack+offset) is actually not the best method. I.e. if
one fails when comparing this
haystack: ... asjkajkaewudajksdkwueqeqweadasdqw3adkadkakd
needle: adasdqwsdfklsdf
^
comparison fails
one can immediately increase the offset by 7, since the needle does
not contain a '3' in the part that is before the point where the
comparison failed. The technique I read about, used considerations of
this kind to compile the 'needle' into some automaton which could skip
more than 1 character at a time in haystack and then executed /
interpreted this automaton. (Actually the algorithm was a bit more
tricky than this, I think: If the automaton sees a character that is
not in needle at all, it can immediately skip for the length of needle
and I think part of the trick was to start comparing from the end of
needle).
So in this case a better algorithm is probably the way to go (and I
agree, asembler and REP, which would just be the naive algorithm with
2 loops, won't cut it, at least I'd not bet on it)
Regards -- Markus
next prev parent reply other threads:[~2007-05-04 0:28 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-28 5:03 Reading and writing a big file in Ada (GNAT) on Windows XP Fionn Mac Cumhaill
2007-04-28 5:20 ` Gautier
2007-04-29 21:17 ` Fionn Mac Cumhaill
2007-04-28 5:25 ` tmoran
2007-04-28 6:56 ` Martin Krischik
2007-04-28 17:12 ` tmoran
2007-04-28 12:41 ` Jeffrey Creem
2007-04-29 21:35 ` Fionn Mac Cumhaill
2007-04-28 13:22 ` (see below)
2007-04-28 17:56 ` Simon Wright
2007-04-28 18:28 ` Jeffrey Creem
2007-04-29 7:20 ` Simon Wright
2007-04-29 21:44 ` Fionn Mac Cumhaill
2007-04-29 21:42 ` Fionn Mac Cumhaill
2007-04-30 0:48 ` Jeffrey R. Carter
2007-04-30 2:30 ` Fionn Mac Cumhaill
2007-04-30 4:21 ` tmoran
2007-04-28 19:12 ` Jeffrey R. Carter
2007-04-29 21:46 ` Fionn Mac Cumhaill
2007-05-01 14:10 ` Fionn Mac Cumhaill
2007-05-06 21:55 ` Quarc
2007-05-02 7:46 ` george
2007-05-03 6:31 ` Fionn Mac Cumhaill
2007-05-03 20:00 ` Simon Wright
2007-05-04 4:35 ` Jeffrey R. Carter
2007-05-04 4:45 ` Fionn Mac Cumhaill
2007-05-04 6:53 ` Alternative Index implementation? (Was: Reading and writing a big file in Ada (GNAT) on Windows XP) Jacob Sparre Andersen
2007-05-04 7:41 ` Dmitry A. Kazakov
2007-05-04 9:16 ` Copying string slices before calling subroutines? (Was: Alternative Index implementation?) Jacob Sparre Andersen
2007-05-04 9:44 ` Copying string slices before calling subroutines? Jacob Sparre Andersen
2007-05-04 10:14 ` Dmitry A. Kazakov
2007-05-04 12:07 ` Jeffrey Creem
2007-05-04 12:46 ` Dmitry A. Kazakov
2007-05-04 22:27 ` Simon Wright
2007-05-05 7:33 ` Jacob Sparre Andersen
2007-05-05 7:47 ` Dmitry A. Kazakov
2007-05-05 7:41 ` Dmitry A. Kazakov
2007-05-03 20:27 ` Reading and writing a big file in Ada (GNAT) on Windows XP Adam Beneschan
2007-05-03 23:01 ` Randy Brukardt
2007-05-04 0:28 ` Markus E Leypold [this message]
2007-05-05 16:26 ` Adam Beneschan
2007-05-05 17:27 ` Markus E Leypold
2007-05-15 23:03 ` Randy Brukardt
2007-05-04 20:04 ` Adam Beneschan
2007-05-05 16:36 ` tmoran
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox