From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,INVALID_MSGID autolearn=no autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,50880f040eb869b4 X-Google-Attributes: gid103376,public From: mfb@mbunix.mitre.org (Michael F Brenner) Subject: Re: Anyone help develop an algorythm? Date: 1997/04/22 Message-ID: <5jidul$mru@top.mitre.org>#1/1 X-Deja-AN: 236593776 References: <5jddg7$uf0@newssvr01-int.news.prodigy.com> <335AF137.54D7@bix.com> Organization: The MITRE Corporation, Bedford Mass. Newsgroups: comp.lang.ada Summary: Use a finite state machine Date: 1997-04-22T00:00:00+00:00 List-Id: Several of the fastest ways to sort your array of strings include not sorting them at all. In a prior post I gave code for partially balanced tree searches which do a partial sort in less time than true sorts, balancing (pardon the pun) the requirements for small sort time versus the time for small lookup time. Three faster methods include an address finite state machine, an calculation sort, and a hash code sort. Hash codes are just imperfect address calculation sorts, and will not be discussed here. Address calculations are becoming more and more possible as machine addresses get larger, and virtual hardware with translation lookaside buffers become the rule rather than the exception on modern risc CPUs. However, I will assume that you have a tiny PC such as a Pentium like I have, with a pitiful few dozen megs of RAM to fit the arrays and the workspace. Therefore, we come to the FSA possibility. FSAs at first glance seem to take K**(L+1) words of memory, where K is the size of the alphabet used for the lookup strings, and L is the number of lookup strings. However, this matrix is extremely sparse. First, we notice that K can be reduces from 256 down to approximately 27 by collapsing all special characters and making everything lower case, if the words are in English. Second, we notice that most of the potential states are unused, so we only need to store the matrix for m states, where m is the sum of the lengths of the lookup strings. So we now have a reduced matrix of 27m words length. This can be implemented directly in code instead of in matrix form, using such constructs as CASE statements or GOTO statements. One last possibility to mention is the dream of perfect hash codes. If you happen to come across a fast-to-compute formula that happens to compute the lookup function of your list of words, then by all means use it. It is not always possible to derive such a magic formula for all possible inputs, however.