From: "John B. Matthews" <nospam@nospam.invalid>
Subject: Re: find words that contains some specific letters
Date: Sun, 07 Jun 2009 15:30:31 -0400
Date: 2009-06-07T15:30:31-04:00 [thread overview]
Message-ID: <nospam-5B13F6.15303107062009@news.aioe.org> (raw)
In-Reply-To: alpine.DEB.1.10.0906070035110.5405@urchin.earth.li
In article <alpine.DEB.1.10.0906070035110.5405@urchin.earth.li>,
Tom Anderson <twic@urchin.earth.li> wrote:
> > John B. Matthews wrote:
> >> The jumble algorithm relies on a hashed map for efficiency, but a
> >> perfect hash is not essential. I recently implemented the
> >> algorithm in both Java and Ada using library routines:
> >>
> >> <http://en.wikipedia.org/wiki/Jumble>
> >> <http://sites.google.com/site/drjohnbmatthews/jumble>
> >> <http://home.roadrunner.com/~jbmatthews/jumble.html>
> >>
> >> I was intrigued to compare the two languages' library
> >> implementations of a general purpose string hash. Using a
> >> multiplicative function with an initial value of zero, both offer
> >> excellent performance. Java uses a multiplier of 31 on a signed
> >> 32-bit integer; Ada uses a multiplier of eight on an unsigned
> >> (modular) value whose size is implementation defined (e.g. mod
> >> 2**32):
> >>
> >> <http://www.docjar.org/html/api/java/lang/String.java.html>
> >> <http://gcc.gnu.org/viewcvs/trunk/gcc/ada/a-strhas.adb>
>
> John, i think you've missed a trick there - doesn't Rotate_Left do a
> barrel shift, where the bits that fall off the left-hand end get
> shoved back in at the right, rather than a plain shift? If so, the
> basic algebraic structure is actually a bit different to java's hash.
D'oh, you're absolutely right. I overlooked this when I was tinkering
with my own hash function, too. Thank you for looking at this.
[...]
> I still think this is a shoddy hash.
Well, it's a legacy from GNAT.HTable.Hash, but it does very well in the
context of Ada.Containers.Hashed_Maps. Java uses a bucket list whose
size is an integral power of two, while (this implementation of) Ada
uses the next largest prime. Modeling an Ada collision counter on your
example <http://urchin.earth.li/~twic/tmp/Collisions.java>, I got these
results for collision count, number of buckets having that count, and
the percentage of the total:
$ ./collisions
0: 218586 (55.6%)
1: 126250 (32.1%)
2: 38432 (9.8%)
3: 8362 (2.1%)
4: 1354 (0.3%)
5: 229 (0.1%)
6: 24 (0.0%)
7: 4 (0.0%)
8: 1 (0.0%)
The code is here: <http://home.roadrunner.com/~jbmatthews/jumble.html>
[Followup-To set to comp.lang.ada.]
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
prev parent reply other threads:[~2009-06-07 19:30 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <edaeda6b-3793-48f2-8f8f-6c9701759de5@r33g2000yqn.googlegroups.com>
[not found] ` <f350aaae-d045-450d-b255-308eb7fb8f8a@e24g2000vbe.googlegroups.com>
[not found] ` <d121e6f6-6afa-43e7-bcd1-c9a770e12704@z9g2000yqi.googlegroups.com>
[not found] ` <nospam-83CE7B.07382201062009@news.aioe.org>
[not found] ` <78i1lbF1m69gkU1@mid.individual.net>
[not found] ` <78i2ajF1m4nglU1@mid.individual.net>
[not found] ` <h00l40$r5h$1@news.albasani.net>
[not found] ` <78i4i2F1magkfU1@mid.individual.net>
[not found] ` <3f1d007f-bcae-42b4-afb0-215b18f51b9c@n21g2000vba.googlegroups.com>
[not found] ` <78ib19F1mfh7qU1@mid.individual.net>
[not found] ` <3b519936-3db9-4dad-85ba-371fa4b29c8f@z5g2000vba.googlegroups.com>
[not found] ` <78quiaF1n95fsU1@mid.individual.net>
[not found] ` <h09q6f$41t$1@news.albasani.net>
[not found] ` <MJWdnUzVI6hADrXXnZ2dnUVZ_q2dnZ2d@earthlink.com>
[not found] ` <78rpbkF1mqrc7U1@mid.individual.net>
2009-06-06 16:13 ` find words that contains some specific letters John B. Matthews
2009-06-06 16:22 ` Patricia Shanahan
2009-06-06 20:42 ` Giovanni Azua
2009-06-06 23:43 ` Giovanni Azua
2009-06-06 17:11 ` Mark Space
2009-06-06 18:23 ` Lew
2009-06-06 23:58 ` Tom Anderson
2009-06-07 19:30 ` John B. Matthews [this message]
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox