From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: a07f3367d7,918406c619dd104d X-Google-Attributes: gida07f3367d7,public,usenet X-Google-NewGroupId: yes X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news2.google.com!news3.google.com!feeder.news-service.com!xlned.com!feeder3.xlned.com!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!194.109.133.84.MISMATCH!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!news.stack.nl!ipv6.urchin.earth.li!twic From: Tom Anderson Newsgroups: comp.lang.java.programmer,comp.lang.ada Subject: Re: find words that contains some specific letters Date: Sun, 7 Jun 2009 00:58:32 +0100 Organization: Stack Usenet News Service Message-ID: References: <78i1lbF1m69gkU1@mid.individual.net> <78i2ajF1m4nglU1@mid.individual.net> <78i4i2F1magkfU1@mid.individual.net> <3f1d007f-bcae-42b4-afb0-215b18f51b9c@n21g2000vba.googlegroups.com> <78ib19F1mfh7qU1@mid.individual.net> <3b519936-3db9-4dad-85ba-371fa4b29c8f@z5g2000vba.googlegroups.com> <78quiaF1n95fsU1@mid.individual.net> <78rpbkF1mqrc7U1@mid.individual.net> NNTP-Posting-Host: ipv6.urchin.earth.li Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Trace: mud.stack.nl 1244332712 4422 2a01:348:1a9::1 (6 Jun 2009 23:58:32 GMT) X-Complaints-To: abuse@stack.nl NNTP-Posting-Date: Sat, 6 Jun 2009 23:58:32 +0000 (UTC) In-Reply-To: User-Agent: Alpine 1.10 (DEB 962 2008-03-14) Xref: g2news2.google.com comp.lang.java.programmer:42939 comp.lang.ada:6346 Date: 2009-06-07T00:58:32+01:00 List-Id: On Sat, 6 Jun 2009, Lew wrote: > John B. Matthews wrote: >> The jumble algorithm relies on a hashed map for efficiency, but a >> perfect hash is not essential. I recently implemented the algorithm >> in both Java and Ada using library routines: >> >> >> >> >> >> I was intrigued to compare the two languages' library implementations of a >> general purpose string hash. Using a multiplicative function with an >> initial value of zero, both offer excellent performance. Java uses a >> multiplier of 31 on a signed 32-bit integer; Ada uses a multiplier of eight >> on an unsigned (modular) value whose size is implementation defined (e.g. >> mod 2**32): >> >> >> John, i think you've missed a trick there - doesn't Rotate_Left do a barrel shift, where the bits that fall off the left-hand end get shoved back in at the right, rather than a plain shift? If so, the basic algebraic structure is actually a bit different to java's hash. > The Java String hash is just about verbatim from Knuth. It has the virtue of > being a well-studied algorithm that has been shown to have good distribution > characteristics. > > I'm not so sure about the Ada one. I haven't studied this in a while, > but I seem to recall that power-of-two multipliers mod 2**32 don't have > as good a distribution. Perhaps someone could correct me on that. If Rotate_Left is a plain shift, then the Ada hash is based only on the last 10 characters of the string. Multiplying by eight is like shifting left three places; modding by 2*32 is like anding with 0xffffffff (if you'll pardon my verbs). When you add in a character, the rightmost bit you can affect is bit 0, the LSB; after ten shifts of three bits, that is now at bit thirty, and after one more, it's at bit thirty-three, and has vanished. If Rotate_Left is a barrel shift, then our bit of interest doesn't vanish, because it comes back to the right-hand end of the hash after the eleventh shift in position 1. Because 3 is coprime with 32, it will then proceed to make a complete circuit of all positions in the hash over the following twenty-one shifts, winding up back in position 0 after the thirty-second. I have a vague worry that this means that strings built from the same 32-character blocks arranged in any order will have the same hash, but i suspect that this is not the case, because of carries. My maths isn't up to proving it, though. I still think this is a shoddy hash. It has no diffusion: if you hash two strings which differ at one character position, the difference between the hashes will always be confined to the sixteen bits which correspond to where that character ended up in the rotating hash. Java's multiply-and-mod approach quickly spreads the bits of each character out over the whole hash. tom -- Science of a sufficiently advanced form is indistinguishable from magic