From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=0.7 required=5.0 tests=BAYES_00,INVALID_DATE, MSGID_SHORT,REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4 Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!pasteur!ucbvax!hplabs!sdcrdcf!trwrb!trwrc!agnew From: agnew@trwrc.UUCP (R.A. Agnew) Newsgroups: comp.lang.ada Subject: Re: Hashing Function wanted Message-ID: <239@trwrc.UUCP> Date: 29 Feb 88 21:29:05 GMT References: <8802231548.AA16212@ajpo.sei.cmu.edu> Reply-To: agnew@trwrc.UUCP (R.A. Agnew) Organization: TRW/MEAD San Diego, Ca. List-Id: In article <8802231548.AA16212@ajpo.sei.cmu.edu> "EDWARD CRAGG" writes: >Does anyone have either ideas or experience related to a hashing >function in Ada for converting a string of unspecified length >the type conversion using attributes (to keep totaly machine >independent) is eating me alive. Does anybody know what kind of > If using character'pos(s(i)) is generating slow code, you need a new compiler. This should only be overhead for the compiler but should generate very little code. I generally define my own enumeration type, say Identifier_Char and type convert with Identifier_Char'Pos(s(i)). I just benched a minimally perfect hash code for Ada reserved words using this technique on a Sun3/75 with the Telesoft Telegen2 compiler and got 133 microseconds worst case (case where tokens are always Ada reserved words). Just to check the above, I put 10000 character'pos conversions in a loop and timed them under Unix and got 0.0 seconds on all three times. In order to get a better comparison, I wrote the following two subprograms: procedure test_mod is i: positive; n: natural := 0; c: Character; begin for i in 1..10000 loop for c in 'A' .. 'Z' loop n := n mod 16381; end loop; end loop; end test_mod; procedure test_pos is i: positive; n: natural := 0; c: Character; begin for i in 1..10000 loop for c in 'A' .. 'Z' loop n := (n + Character'Pos(c)) mod 16381; end loop; end loop; end test_pos; The only difference between the two is the character to integer conversion and so the difference in execution times between them is the cost of ten thousand conversions. I did not use the optimizer. They both timed out between 1.9 and 2.0 seconds elapsed time on the Sun3/75 so I still claim that the compiler generates no code due to the type conversion. Perhaps your hash agolrithm is inherently slow. I have developed a very efficient hashcode algorithm for arbitrary length strings which is based on Fermat and Mersenne primes and uses only shifts and integer addition. And no, that's not the reason why the Ada compilers are so slow. The language that they are compiling is just a couple orders of magnitude more complex than most others. To appreciate this, visualize writing a compiler to optimize code generated by nested generics instantiated on a data type like a complex tensor!