From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: a07f3367d7,918406c619dd104d X-Google-Attributes: gida07f3367d7,public,usenet X-Google-NewGroupId: yes X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news2.google.com!news3.google.com!feeder.news-service.com!de-l.enfer-du-nord.net!gegeweb.org!aioe.org!nospam From: "John B. Matthews" Newsgroups: comp.lang.java.programmer,comp.lang.ada Subject: Re: find words that contains some specific letters Followup-To: comp.lang.ada Date: Sun, 07 Jun 2009 15:30:31 -0400 Organization: The Wasteland Message-ID: References: <78i1lbF1m69gkU1@mid.individual.net> <78i2ajF1m4nglU1@mid.individual.net> <78i4i2F1magkfU1@mid.individual.net> <3f1d007f-bcae-42b4-afb0-215b18f51b9c@n21g2000vba.googlegroups.com> <78ib19F1mfh7qU1@mid.individual.net> <3b519936-3db9-4dad-85ba-371fa4b29c8f@z5g2000vba.googlegroups.com> <78quiaF1n95fsU1@mid.individual.net> <78rpbkF1mqrc7U1@mid.individual.net> NNTP-Posting-Host: ib4TTflHUauJidfWP/+Rjw.user.aioe.org Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: abuse@aioe.org X-Notice: Filtered by postfilter v. 0.7.9 Cancel-Lock: sha1:JmqlyMYE9/LqqWVbZ86vMYpbLOw= User-Agent: MT-NewsWatcher/3.5.3b3 (Intel Mac OS X) Xref: g2news2.google.com comp.lang.java.programmer:42978 comp.lang.ada:6359 Date: 2009-06-07T15:30:31-04:00 List-Id: In article , Tom Anderson wrote: > > John B. Matthews wrote: > >> The jumble algorithm relies on a hashed map for efficiency, but a > >> perfect hash is not essential. I recently implemented the > >> algorithm in both Java and Ada using library routines: > >> > >> > >> > >> > >> > >> I was intrigued to compare the two languages' library > >> implementations of a general purpose string hash. Using a > >> multiplicative function with an initial value of zero, both offer > >> excellent performance. Java uses a multiplier of 31 on a signed > >> 32-bit integer; Ada uses a multiplier of eight on an unsigned > >> (modular) value whose size is implementation defined (e.g. mod > >> 2**32): > >> > >> > >> > > John, i think you've missed a trick there - doesn't Rotate_Left do a > barrel shift, where the bits that fall off the left-hand end get > shoved back in at the right, rather than a plain shift? If so, the > basic algebraic structure is actually a bit different to java's hash. D'oh, you're absolutely right. I overlooked this when I was tinkering with my own hash function, too. Thank you for looking at this. [...] > I still think this is a shoddy hash. Well, it's a legacy from GNAT.HTable.Hash, but it does very well in the context of Ada.Containers.Hashed_Maps. Java uses a bucket list whose size is an integral power of two, while (this implementation of) Ada uses the next largest prime. Modeling an Ada collision counter on your example , I got these results for collision count, number of buckets having that count, and the percentage of the total: $ ./collisions 0: 218586 (55.6%) 1: 126250 (32.1%) 2: 38432 (9.8%) 3: 8362 (2.1%) 4: 1354 (0.3%) 5: 229 (0.1%) 6: 24 (0.0%) 7: 4 (0.0%) 8: 1 (0.0%) The code is here: [Followup-To set to comp.lang.ada.] -- John B. Matthews trashgod at gmail dot com