comp.lang.ada
 help / color / mirror / Atom feed
From: mfb@mbunix.mitre.org (Michael F Brenner)
Subject: Re: Anyone help develop an algorythm?
Date: 1997/04/22
Date: 1997-04-22T00:00:00+00:00	[thread overview]
Message-ID: <5jidul$mru@top.mitre.org> (raw)
In-Reply-To: 335AF137.54D7@bix.com


Several of the fastest ways to sort your array of strings include not
sorting them at all. In a prior post I gave code for partially balanced
tree searches which do a partial sort in less time than true sorts, 
balancing (pardon the pun) the requirements for small sort time versus
the time for small lookup time. Three faster methods include an address
finite state machine, an calculation sort, and a hash code sort. Hash
codes are just imperfect address calculation sorts, and will not be
discussed here. Address calculations are becoming more and more possible
as machine addresses get larger, and virtual hardware with translation
lookaside buffers become the rule rather than the exception on modern
risc CPUs. However, I will assume that you have a tiny PC such as
a Pentium like I have, with a pitiful few dozen megs of RAM to fit
the arrays and the workspace. Therefore, we come to the FSA possibility.
FSAs at first glance seem to take K**(L+1) words of memory, where
K is the size of the alphabet used for the lookup strings, and L is
the number of lookup strings. However, this matrix is extremely
sparse. First, we notice that K can be reduces from 256 down to 
approximately 27 by collapsing all special characters and making
everything lower case, if the words are in English. Second, we notice
that most of the potential states are unused, so we only need to store
the matrix for m states, where m is the sum of the lengths of the 
lookup strings. So we now have a reduced matrix of 27m words length.
This can be implemented directly in code instead of in matrix form, using
such constructs as CASE statements or GOTO statements. One last 
possibility to mention is the dream of perfect hash codes. If you
happen to come across a fast-to-compute formula that happens to
compute the lookup function of your list of words, then by all means
use it. It is not always possible to derive such a magic formula
for all possible inputs, however. 





  reply	other threads:[~1997-04-22  0:00 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1997-04-20  0:00 Anyone help develop an algorythm? Matthew Givens
1997-04-20  0:00 ` Tucker Taft
1997-04-20  0:00 ` Tom Moran
1997-04-22  0:00   ` Michael F Brenner [this message]
1997-04-20  0:00 ` Robert A Duff
1997-04-22  0:00   ` Steve Doiel
1997-04-20  0:00 ` Robert Dewar
1997-04-23  0:00   ` Matthew Givens
1997-04-23  0:00     ` Robert Dewar
1997-04-20  0:00 ` Joel VanLaven
1997-04-22  0:00   ` Robert Dewar
     [not found] ` <e8yijs.fg1.0.-s@inmet.camb.inmet.com>
1997-04-23  0:00   ` Matthew Givens
1997-04-23  0:00     ` Tucker Taft
1997-04-23  0:00     ` Robert Dewar
1997-04-28  0:00       ` Matthew Givens
     [not found] ` <335af137.54d7@bix.com>
1997-04-28  0:00   ` Matthew Givens
1997-04-28  0:00     ` Robert Dewar
1997-04-29  0:00       ` Matthew Givens
  -- strict thread matches above, loose matches on Subject: below --
1997-04-24  0:00 Marin David Condic, 561.796.8997, M/S 731-93
1997-04-25  0:00 ` Michael F Brenner
1997-04-28  0:00   ` Matthew Givens
1997-04-28  0:00 ` Matthew Givens
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox