hashing

comp.lang.ada
 help / color / mirror / Atom feed

* hashing
@ 2002-04-10 21:31 chris.danx
  2002-04-10 21:24 ` hashing Steven Deller
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: chris.danx @ 2002-04-10 21:31 UTC (permalink / raw)


Hi,

Does anyone have a good hash function for strings (length is arbitary,
number of buckets is determined when package is instantiated and collisions
are handled with chaining)?  I'm new to writing hash tables and functions
(used them in libs, but this has to be all my own work!), and have chosen
this technique to implement in an assignment -- had enough linked lists, and
want to expand my horizons.

I'm searching for a function just now, so this isn't laziness just wanted to
utilise another resource (your brains) in case nothing turns up.  After the
assignment submission in a few weeks, I'll tidy up the sources for all the
useful adts I've written this term and GMGPL them (lists (singly, doubly),
hash tables, and whatever else comes up) if ppl express an interest (all
adts are generic so far).



Thanks,
Chris

p.s. does anyone have any uses for circular lists?  Tom Swans Tp5.5 book
mentions them, what are they used for?  Just curious...


p.p.s. should performance/complexity information be included?  I seem to
recall a conversation on this a while back, but can't remember if ppl
thought it was a good idea.





^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: hashing
  2002-04-10 21:31 hashing chris.danx
@ 2002-04-10 21:24 ` Steven Deller
  2002-04-10 22:41   ` hashing Nick Williams
                     ` (2 more replies)
  2002-04-10 21:56 ` hashing Chad R. Meiners
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 19+ messages in thread
From: Steven Deller @ 2002-04-10 21:24 UTC (permalink / raw)


> -----Original Message-----
> [mailto:comp.lang.ada-admin@ada.eu.org] On Behalf Of chris.danx
> Sent: Wednesday, April 10, 2002 5:31 PM
> To: comp.lang.ada@ada.eu.org
> Subject: hashing
> Hi,
> 
> Does anyone have a good hash function for strings (length is 
> arbitary, number of buckets is determined when package is 

I'd check out the PERL 5 sources.  The hashing function there seems to
be excellent for numerous cases I have looked into.  Not sure, but the
internal documentation might just explain the algorithm without having
to reverse engineer C code.

Other than that, I'd try one of the numerous ISO-defined hash functions,
such as CRC-32, etc.

I did a web search on "hash function algorithms" and came up with
NUMEROUS interesting sites.  

>...> 
> p.s. does anyone have any uses for circular lists?  Tom Swans 
> Tp5.5 book mentions them, what are they used for?  Just curious...

I have used circular lists in numerous buffering situations where a
simple double-buffer is insufficient.  Typically the situation is one
where the list is created once and the "head" and "tail" positions move
as data is put in and taken out. Using a circular list allows different
sizes to be put in and taken out.   I've also used a 10ary-buffer system
(the buffers themselves formed a circular list) to meet some system
transfer and error-correction requirements.  Some transmission systems
use a (short) counter of items sent and look for return ACK's for the
items; that algorithm is easy to write using a circular list.

> p.p.s. should performance/complexity information be included? 
>  I seem to recall a conversation on this a while back, but 
> can't remember if ppl thought it was a good idea.

Complexity information always raises its head at some point.  Best to
include it.

Regards,
Steve




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:24 ` hashing Steven Deller
@ 2002-04-10 22:41   ` Nick Williams
  2002-04-10 23:36   ` hashing chris.danx
  2002-04-11 23:18   ` hashing(HATs. A more efficient algorithm?) Wannabe h4x0r
  2 siblings, 0 replies; 19+ messages in thread
From: Nick Williams @ 2002-04-10 22:41 UTC (permalink / raw)


Could I just register my objection to the use of CRC-32 as a hashing
algorithm?

Cyclic redundancy checks are not designed to distribute through their range,
but rather to detect multiple bit errors. They are much less than ideal when
hashing potentially similar strings.

Cheers,

Nick.

"Steven Deller" <deller@smsail.com> wrote in message
news:mailman.1018477621.31315.comp.lang.ada@ada.eu.org...
> Other than that, I'd try one of the numerous ISO-defined hash functions,
> such as CRC-32, etc.







^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:24 ` hashing Steven Deller
  2002-04-10 22:41   ` hashing Nick Williams
@ 2002-04-10 23:36   ` chris.danx
  2002-04-11 14:18     ` hashing joevl
  2002-04-15 15:32     ` hashing Ken Burtch
  2002-04-11 23:18   ` hashing(HATs. A more efficient algorithm?) Wannabe h4x0r
  2 siblings, 2 replies; 19+ messages in thread
From: chris.danx @ 2002-04-10 23:36 UTC (permalink / raw)



"Steven Deller" <deller@smsail.com> wrote in message
news:mailman.1018477621.31315.comp.lang.ada@ada.eu.org...
> > Subject: hashing
> > Hi,
> >
> > Does anyone have a good hash function for strings (length is
> > arbitary, number of buckets is determined when package is

I'll maybe look into that (if it doesn't involve reverse engineering C).  I
found one that may suffice for the purposes of the exercise, it's nothing
fancy but it should work.

Sum all ascii vals of chars in string, then take modulo bucket size.



> Other than that, I'd try one of the numerous ISO-defined hash functions,
> such as CRC-32, etc.

ISO define hash functions?


> I did a web search on "hash function algorithms" and came up with
> NUMEROUS interesting sites.

oops wrong search string, "hash function strings".  Got loads of stuff in
code, but code is not as useful.  Algorithm/function descriptions
(mathematics included) are better than code IMO, since they express
algorithms functions in a programming language independant manner.  When we
were learning about bstrees they gave us Ada code which made no sense at the
time due to poor descriptions of the algorithms and considerations involved.
However when we learned how to do bstrees in Haskell, just before easter
break it made perfect sense since the lecturer took the time to describe the
algorithms.  The code was secondary to the algorithm.

[snip]

> > p.p.s. should performance/complexity information be included?
> >  I seem to recall a conversation on this a while back, but
> > can't remember if ppl thought it was a good idea.
>
> Complexity information always raises its head at some point.  Best to
> include it.

it does makes sense to include it and it'll definitely go in the gmgpl
version...  But does complexity information come under the header of an
implementation detail?  i don't think it does but if it does and I include
it in the spec, my tutor might disagree and I'll get penalised.  This is one
of my grumps about uni, not knowing what they want with things.  In first
year, you'd get penalised for *not* putting pointless comments in.

Ignore that, I'm probably worrying about nothing!!!


Thanks,
Chris





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 23:36   ` hashing chris.danx
@ 2002-04-11 14:18     ` joevl
  2002-04-15 15:32     ` hashing Ken Burtch
  1 sibling, 0 replies; 19+ messages in thread
From: joevl @ 2002-04-11 14:18 UTC (permalink / raw)


>> "Steven Deller" <deller@smsail.com> wrote in message 
>> Other than that, I'd try one of the numerous ISO-defined hash
>> functions, such as CRC-32, etc.

"chris.danx" replied:
> ISO define hash functions?

I think Steve is referring to the CRC-32 (and also CRC-16)
algorithms defined by the ITU (formerly CCITT).
Can't recall the specific ITU-R recommendation (V.33?).



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 23:36   ` hashing chris.danx
  2002-04-11 14:18     ` hashing joevl
@ 2002-04-15 15:32     ` Ken Burtch
  1 sibling, 0 replies; 19+ messages in thread
From: Ken Burtch @ 2002-04-15 15:32 UTC (permalink / raw)



I believe there's a simple string hash value generator (HashOf) included
in the strings package of my drawtools suite

http://www.pegasoft.ca

Ken B.


"chris.danx" wrote:
> 
> "Steven Deller" <deller@smsail.com> wrote in message
> news:mailman.1018477621.31315.comp.lang.ada@ada.eu.org...
> > > Subject: hashing
> > > Hi,
> > >
> > > Does anyone have a good hash function for strings (length is
> > > arbitary, number of buckets is determined when package is
> 
> I'll maybe look into that (if it doesn't involve reverse engineering C).  I
> found one that may suffice for the purposes of the exercise, it's nothing
> fancy but it should work.
> 
> Sum all ascii vals of chars in string, then take modulo bucket size.
> 
> > Other than that, I'd try one of the numerous ISO-defined hash functions,
> > such as CRC-32, etc.
> 
> ISO define hash functions?
> 
> > I did a web search on "hash function algorithms" and came up with
> > NUMEROUS interesting sites.
> 
> oops wrong search string, "hash function strings".  Got loads of stuff in
> code, but code is not as useful.  Algorithm/function descriptions
> (mathematics included) are better than code IMO, since they express
> algorithms functions in a programming language independant manner.  When we
> were learning about bstrees they gave us Ada code which made no sense at the
> time due to poor descriptions of the algorithms and considerations involved.
> However when we learned how to do bstrees in Haskell, just before easter
> break it made perfect sense since the lecturer took the time to describe the
> algorithms.  The code was secondary to the algorithm.
> 
> [snip]
> 
> > > p.p.s. should performance/complexity information be included?
> > >  I seem to recall a conversation on this a while back, but
> > > can't remember if ppl thought it was a good idea.
> >
> > Complexity information always raises its head at some point.  Best to
> > include it.
> 
> it does makes sense to include it and it'll definitely go in the gmgpl
> version...  But does complexity information come under the header of an
> implementation detail?  i don't think it does but if it does and I include
> it in the spec, my tutor might disagree and I'll get penalised.  This is one
> of my grumps about uni, not knowing what they want with things.  In first
> year, you'd get penalised for *not* putting pointless comments in.
> 
> Ignore that, I'm probably worrying about nothing!!!
> 
> Thanks,
> Chris

-- 
Ken O. Burtch: http://www.pegasoft.ca                 : Pegasoft
System Manager in a Box / Business Shell              : R.R.#1
Bio: 36;Bsc,UI,Lang,Games;Toons,Elves,SF,Pizza;Xian   : Jordan Station,
ON
``````````````````````````````````````````````````````` Canada L0R 1S0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: hashing(HATs. A more efficient algorithm?)
  2002-04-10 21:24 ` hashing Steven Deller
  2002-04-10 22:41   ` hashing Nick Williams
  2002-04-10 23:36   ` hashing chris.danx
@ 2002-04-11 23:18   ` Wannabe h4x0r
  2002-04-11 23:29     ` chris.danx
  2 siblings, 1 reply; 19+ messages in thread
From: Wannabe h4x0r @ 2002-04-11 23:18 UTC (permalink / raw)


Lately I've been using Hashed Array Trees(or Tables), 
which have cut the execution time by about %40, and has 
given me a memory savings of about %20-%30. Note that I'm
just progressing from the beginner to the intermediate 
stage in my "skillz".

Basically, you start with an initial array of n number of 
elements. 
The first element is a "pointer"(however one may choose to
define that) to the last element of the array. When the 
array is filled to n-2 elements, automatically add x 
additional elements to the array, and them set the 
pointer in the first element to point to the end of the 
array again(which should be n+x_l).
One can use any number of methods to decide how many 
elements to add at any time, it's not fixed.
For an additional bit of buffer overflow protection, one
could set the pointer to point to n-2 (or n-1 or
whatever) depending on whatever makes sense at the time.

This is about the simplest damn algorithm in the world,
at least to me it is. Heh.

Chris



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing(HATs. A more efficient algorithm?)
  2002-04-11 23:18   ` hashing(HATs. A more efficient algorithm?) Wannabe h4x0r
@ 2002-04-11 23:29     ` chris.danx
  2002-04-12  7:24       ` Wannabe h4x0r
  0 siblings, 1 reply; 19+ messages in thread
From: chris.danx @ 2002-04-11 23:29 UTC (permalink / raw)



"Wannabe h4x0r" <chris@dont.spam.me> wrote in message
news:mRot8.209679$Yv2.66927@rwcrnsc54...
> Lately I've been using Hashed Array Trees(or Tables),
> which have cut the execution time by about %40, and has
> given me a memory savings of about %20-%30. Note that I'm
> just progressing from the beginner to the intermediate
> stage in my "skillz".
>
> Basically, you start with an initial array of n number of
> elements.
> The first element is a "pointer"(however one may choose to
> define that) to the last element of the array. When the
> array is filled to n-2 elements, automatically add x
> additional elements to the array, and them set the
> pointer in the first element to point to the end of the
> array again(which should be n+x_l).
> One can use any number of methods to decide how many
> elements to add at any time, it's not fixed.
> For an additional bit of buffer overflow protection, one
> could set the pointer to point to n-2 (or n-1 or
> whatever) depending on whatever makes sense at the time.
>
> This is about the simplest damn algorithm in the world,
> at least to me it is. Heh.

I don't really get it, but this sounds like chains to me.  Can you explain
it a bitty more?

bye,
Chris





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing(HATs. A more efficient algorithm?)
  2002-04-11 23:29     ` chris.danx
@ 2002-04-12  7:24       ` Wannabe h4x0r
  2002-04-12 15:00         ` Chad R. Meiners
  2002-04-12 17:22         ` tmoran
  0 siblings, 2 replies; 19+ messages in thread
From: Wannabe h4x0r @ 2002-04-12  7:24 UTC (permalink / raw)

On Thu, 11 Apr 2002 19:29:16 -0400, chris.danx wrote:

> "Wannabe h4x0r" <chris@dont.spam.me> wrote in message
> news:mRot8.209679$Yv2.66927@rwcrnsc54...
>> Lately I've been using Hashed Array Trees(or Tables), which have cut
>> the execution time by about %40, and has given me a memory savings of
>> about %20-%30. Note that I'm just progressing from the beginner to the
>> intermediate stage in my "skillz".
>>
>> Basically, you start with an initial array of n number of elements. The
>> first element is a "pointer"(however one may choose to define that) to
>> the last element of the array. When the array is filled to n-2
>> elements, automatically add x additional elements to the array, and
>> them set the pointer in the first element to point to the end of the
>> array again(which should be n+x_l).
>> One can use any number of methods to decide how many elements to add at
>> any time, it's not fixed. For an additional bit of buffer overflow
>> protection, one could set the pointer to point to n-2 (or n-1 or
>> whatever) depending on whatever makes sense at the time.
>>
>> This is about the simplest damn algorithm in the world, at least to me
>> it is. Heh.
> 
> I don't really get it, but this sounds like chains to me.  Can you
> explain it a bitty more?
> 
> bye,
> Chris
> 
> 

Alright.

Suppose you have some numbers like 1,2,3,4(or could be
characters, or whatever).
The actual array would look like (using psuedocode)

	array := (pointer(array'range - 1),1,2,3,4,NULL);
	-- The pointer points to the NULL at the end of
	-- The array. Of course I'm using NULL here as
	-- a placeholder for the psuedocode. The NULL 
	-- element could hold anything to indicate that
	-- says "This is the end of the array"

Now you want extend that array to hold four more values.
Look at where the pointer in the first element is
pointing, jump to that element in the array, add five
more elements from that point(one to hold the reference 
element), then change the pointer to point at the end 
of the array again. The extended array will look like
this...

	array := (pointer(array'range-1),1,2,3,4, , , , ,NULL --The pointer now
points here.--);

Then you just fill in the elements.

	array := (pointer(array'range-1),1,2,3,4,5,6,7,8,NULL);

This beats making a new larger array and copying all the elements over,
plus the new ones. Plus you can determine with fair degree of certainty
how much memory you'll need at any one time. Since your only working with
the specific elements you need, it saves processor time. And because your
not duplicating data in various moves and copies, it saves
memory(especially on very large files.)
It could be said that one does not need a reference pointer, but that
becomes problematic if you have more than routine modifying data in the
array.
To make the array smaller, just do the reverse. Move the reference
character(or integer, or bit, or whatever) to a point in the array, and
free all memory above that point.

Amy I making any better sense. Dr. Dobbs had an article on this. I'll see
if I can find it.

Chris

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing(HATs. A more efficient algorithm?)
  2002-04-12  7:24       ` Wannabe h4x0r
@ 2002-04-12 15:00         ` Chad R. Meiners
  2002-04-12 17:22         ` tmoran
  1 sibling, 0 replies; 19+ messages in thread
From: Chad R. Meiners @ 2002-04-12 15:00 UTC (permalink / raw)


So what does it have to do with hash tables?  All you have here is an
efficient method of expanding the size of an array.

-CRM

"Wannabe h4x0r" <chris@dont.spam.me> wrote in message
news:YYvt8.11521$Gl6.6265@sccrnsc01...
> On Thu, 11 Apr 2002 19:29:16 -0400, chris.danx wrote:
>
>
> > "Wannabe h4x0r" <chris@dont.spam.me> wrote in message
> > news:mRot8.209679$Yv2.66927@rwcrnsc54...
> >> Lately I've been using Hashed Array Trees(or Tables), which have cut
> >> the execution time by about %40, and has given me a memory savings of
> >> about %20-%30. Note that I'm just progressing from the beginner to the
> >> intermediate stage in my "skillz".
> >>
> >> Basically, you start with an initial array of n number of elements. The
> >> first element is a "pointer"(however one may choose to define that) to
> >> the last element of the array. When the array is filled to n-2
> >> elements, automatically add x additional elements to the array, and
> >> them set the pointer in the first element to point to the end of the
> >> array again(which should be n+x_l).
> >> One can use any number of methods to decide how many elements to add at
> >> any time, it's not fixed. For an additional bit of buffer overflow
> >> protection, one could set the pointer to point to n-2 (or n-1 or
> >> whatever) depending on whatever makes sense at the time.
> >>
> >> This is about the simplest damn algorithm in the world, at least to me
> >> it is. Heh.
> >
> > I don't really get it, but this sounds like chains to me.  Can you
> > explain it a bitty more?
> >
> > bye,
> > Chris
> >
> >
>
> Alright.
>
> Suppose you have some numbers like 1,2,3,4(or could be
> characters, or whatever).
> The actual array would look like (using psuedocode)
>
> array := (pointer(array'range - 1),1,2,3,4,NULL);
> -- The pointer points to the NULL at the end of
> -- The array. Of course I'm using NULL here as
> -- a placeholder for the psuedocode. The NULL
> -- element could hold anything to indicate that
> -- says "This is the end of the array"
>
> Now you want extend that array to hold four more values.
> Look at where the pointer in the first element is
> pointing, jump to that element in the array, add five
> more elements from that point(one to hold the reference
> element), then change the pointer to point at the end
> of the array again. The extended array will look like
> this...
>
> array := (pointer(array'range-1),1,2,3,4, , , , ,NULL --The pointer now
> points here.--);
>
> Then you just fill in the elements.
>
> array := (pointer(array'range-1),1,2,3,4,5,6,7,8,NULL);
>
>
> This beats making a new larger array and copying all the elements over,
> plus the new ones. Plus you can determine with fair degree of certainty
> how much memory you'll need at any one time. Since your only working with
> the specific elements you need, it saves processor time. And because your
> not duplicating data in various moves and copies, it saves
> memory(especially on very large files.)
> It could be said that one does not need a reference pointer, but that
> becomes problematic if you have more than routine modifying data in the
> array.
> To make the array smaller, just do the reverse. Move the reference
> character(or integer, or bit, or whatever) to a point in the array, and
> free all memory above that point.
>
> Amy I making any better sense. Dr. Dobbs had an article on this. I'll see
> if I can find it.
>
> Chris





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing(HATs. A more efficient algorithm?)
  2002-04-12  7:24       ` Wannabe h4x0r
  2002-04-12 15:00         ` Chad R. Meiners
@ 2002-04-12 17:22         ` tmoran
  1 sibling, 0 replies; 19+ messages in thread
From: tmoran @ 2002-04-12 17:22 UTC (permalink / raw)



I surely don't understand.  There's no hashing indicated, and the
array+pointer structure, if you make the "pointer" the index of
the last full cell, is simply an (array, Last) structure.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
  2002-04-10 21:24 ` hashing Steven Deller
@ 2002-04-10 21:56 ` Chad R. Meiners
  2002-04-10 23:43 ` hashing Ted Dennison
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Chad R. Meiners @ 2002-04-10 21:56 UTC (permalink / raw)


Look in "The Art of Computer Programming" Volume 3 Sorting and Searching by
Donald E. Knuth.  Chapter 6 section 4 is about hashing.

-CRM

"chris.danx" <chris.danx@ntlworld.com> wrote in message
news:1b2t8.507$na.19833@news8-gui.server.ntli.net...
> Hi,
>
> Does anyone have a good hash function for strings (length is arbitary,
> number of buckets is determined when package is instantiated and
collisions
> are handled with chaining)?  I'm new to writing hash tables and functions
> (used them in libs, but this has to be all my own work!), and have chosen
> this technique to implement in an assignment -- had enough linked lists,
and
> want to expand my horizons.
>
> I'm searching for a function just now, so this isn't laziness just wanted
to
> utilise another resource (your brains) in case nothing turns up.  After
the
> assignment submission in a few weeks, I'll tidy up the sources for all the
> useful adts I've written this term and GMGPL them (lists (singly, doubly),
> hash tables, and whatever else comes up) if ppl express an interest (all
> adts are generic so far).
>
>
>
> Thanks,
> Chris
>
> p.s. does anyone have any uses for circular lists?  Tom Swans Tp5.5 book
> mentions them, what are they used for?  Just curious...
>
>
> p.p.s. should performance/complexity information be included?  I seem to
> recall a conversation on this a while back, but can't remember if ppl
> thought it was a good idea.
>
>





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
  2002-04-10 21:24 ` hashing Steven Deller
  2002-04-10 21:56 ` hashing Chad R. Meiners
@ 2002-04-10 23:43 ` Ted Dennison
  2002-04-11 16:12 ` hashing Georg Bauhaus
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Ted Dennison @ 2002-04-10 23:43 UTC (permalink / raw)

chris.danx wrote:

> p.s. does anyone have any uses for circular lists?  Tom Swans Tp5.5 book
> mentions them, what are they used for?  Just curious...

I use them all the time in my real-time work, and in device drivers. 
They are awesome for implementing queues using a fixed set of nodes. In 
real-time work, you really can't go around dynamicly allocating nodes 
after initilization, so this is very important.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
                   ` (2 preceding siblings ...)
  2002-04-10 23:43 ` hashing Ted Dennison
@ 2002-04-11 16:12 ` Georg Bauhaus
  2002-04-11 22:36 ` hashing Simon Wright
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Georg Bauhaus @ 2002-04-11 16:12 UTC (permalink / raw)


chris.danx <chris.danx@ntlworld.com> wrote:
: Does anyone have a good hash function for strings 

There is a wealth of implementations of languages with extensive
table support, and with source code to look at. So there is something
to compare.
(ICON, Tcl, SPITBOL, lua, PostScript, ...)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
                   ` (3 preceding siblings ...)
  2002-04-11 16:12 ` hashing Georg Bauhaus
@ 2002-04-11 22:36 ` Simon Wright
  2002-05-03 10:16 ` hashing Antonio Duran
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Simon Wright @ 2002-04-11 22:36 UTC (permalink / raw)


"chris.danx" <chris.danx@ntlworld.com> writes:

> Does anyone have a good hash function for strings (length is
> arbitary, number of buckets is determined when package is
> instantiated and collisions are handled with chaining)?  I'm new to
> writing hash tables and functions (used them in libs, but this has
> to be all my own work!), and have chosen this technique to implement
> in an assignment -- had enough linked lists, and want to expand my
> horizons.

Not sure they're _good_, but see
http://www.pushface.org/components/bc/coldframe-hash.zip

(if that fails, mail me)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
                   ` (4 preceding siblings ...)
  2002-04-11 22:36 ` hashing Simon Wright
@ 2002-05-03 10:16 ` Antonio Duran
  2002-05-05 13:55 ` hashing Robert Dewar
  2002-05-06 20:42 ` hashing Antonio Duran
  7 siblings, 0 replies; 19+ messages in thread
From: Antonio Duran @ 2002-05-03 10:16 UTC (permalink / raw)


I've been using for that purposes the same hash algorithm used in java
that is, as a matter of fact, the hash algorithm cited on Robert
Sedgewick's Algorithms book.

Look at the source of the hash method for Java String class and
translate it into Ada.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
                   ` (5 preceding siblings ...)
  2002-05-03 10:16 ` hashing Antonio Duran
@ 2002-05-05 13:55 ` Robert Dewar
  2002-05-06 20:42 ` hashing Antonio Duran
  7 siblings, 0 replies; 19+ messages in thread
From: Robert Dewar @ 2002-05-05 13:55 UTC (permalink / raw)


"chris.danx" <chris.danx@ntlworld.com> wrote in message news:<1b2t8.507$na.19833@news8-gui.server.ntli.net>...
> p.s. does anyone have any uses for circular lists?  Tom 
> Swans Tp5.5 book mentions them, what are they used for?  > Just curious...

Lots and lots of good uses for circular lists. I regard
them as one of the fundamental data strucures for queues.


Consider the following. Implement a circular list where
the tail of the list points to the head (singly linked).

Now point the head pointer to the LAST item on the list,
not the FIRST. That way, since the LAST is linked to the
FIRST you have immediate access to both and it is easy
to remove the head or add to the tail, without needing
double links, or a separate pointer to the head and tail.

There are many uses, but in the operating systems I wrote
for Honeywell, all queues were represented this way (space
was very important in those systems which were for machines
that by todays standards had small memories -- the large
members of the family had 32K bytes of memory, and these
were full featured real time executives and time sharing
operating systems :-)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-04-10 21:31 hashing chris.danx
                   ` (6 preceding siblings ...)
  2002-05-05 13:55 ` hashing Robert Dewar
@ 2002-05-06 20:42 ` Antonio Duran
  2002-05-06 23:36   ` hashing Jeffrey Carter
  7 siblings, 1 reply; 19+ messages in thread
From: Antonio Duran @ 2002-05-06 20:42 UTC (permalink / raw)


"chris.danx" <chris.danx@ntlworld.com> wrote in message news:<1b2t8.507$na.19833@news8-gui.server.ntli.net>...
> Hi,
> 
> Does anyone have a good hash function for strings (length is arbitary,
> number of buckets is determined when package is instantiated and collisions
> are handled with chaining)?  I'm new to writing hash tables and functions
> (used them in libs, but this has to be all my own work!), and have chosen
> this technique to implement in an assignment -- had enough linked lists, and
> want to expand my horizons.
> 
> I'm searching for a function just now, so this isn't laziness just wanted to
> utilise another resource (your brains) in case nothing turns up.  After the
> assignment submission in a few weeks, I'll tidy up the sources for all the
> useful adts I've written this term and GMGPL them (lists (singly, doubly),
> hash tables, and whatever else comes up) if ppl express an interest (all
> adts are generic so far).
> 
> 
> 
> Thanks,
> Chris
> 
> p.s. does anyone have any uses for circular lists?  Tom Swans Tp5.5 book
> mentions them, what are they used for?  Just curious...
> 
> 
> p.p.s. should performance/complexity information be included?  I seem to
> recall a conversation on this a while back, but can't remember if ppl
> thought it was a good idea.

I've found the hash function:

with Interfaces;

function    Hash(
               What        : in     String)
   return   Interfaces.Unsigned_32
is
   Hash_Code      : Interfaces.Unsigned_32 := 0;
begin
   for I in What'Range loop
      Hash_Code := (31 * Hash_Code) + 
                   Interfaces.Unsigned_32(Character'Pos(What(I)));
   end loop;

   return Hash_Code;
end Hash;

The returned value must be mod'ed by the number of buckets in your
hash table (that number should be a prime number).



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hashing
  2002-05-06 20:42 ` hashing Antonio Duran
@ 2002-05-06 23:36   ` Jeffrey Carter
  0 siblings, 0 replies; 19+ messages in thread
From: Jeffrey Carter @ 2002-05-06 23:36 UTC (permalink / raw)


Antonio Duran wrote:
> 
> "chris.danx" <chris.danx@ntlworld.com> wrote in message news:<1b2t8.507$na.19833@news8-gui.server.ntli.net>...
> > Hi,
> >
> > Does anyone have a good hash function for strings (length is arbitary,
> > number of buckets is determined when package is instantiated and collisions
> > are handled with chaining)?  I'm new to writing hash tables and functions
> > (used them in libs, but this has to be all my own work!), and have chosen
> > this technique to implement in an assignment -- had enough linked lists, and
> > want to expand my horizons.

You might want to look at P. K. Pearson, "Fast Hashing of
Variable-Length Text Strings," Comm. ACM, 1990 Jun. It's a fast "hashing
function specifically tailored to variable-length text strings."
"Similar strings are not likely to collide." It produces an unsigned
8-bit value for the hash.

An implementation is available as part of the PragmAda Reusable
Components, available from

http://home.earthlink.net/~jrcarter010/pragmarc.htm

and the mirror at www.adapower.com. Look for
PragmARC.Hash_Fast_Variable_Length.

-- 
Jeff Carter
"You empty-headed animal-food-trough wiper."
Monty Python & the Holy Grail



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2002-05-06 23:36 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-04-10 21:31 hashing chris.danx
2002-04-10 21:24 ` hashing Steven Deller
2002-04-10 22:41   ` hashing Nick Williams
2002-04-10 23:36   ` hashing chris.danx
2002-04-11 14:18     ` hashing joevl
2002-04-15 15:32     ` hashing Ken Burtch
2002-04-11 23:18   ` hashing(HATs. A more efficient algorithm?) Wannabe h4x0r
2002-04-11 23:29     ` chris.danx
2002-04-12  7:24       ` Wannabe h4x0r
2002-04-12 15:00         ` Chad R. Meiners
2002-04-12 17:22         ` tmoran
2002-04-10 21:56 ` hashing Chad R. Meiners
2002-04-10 23:43 ` hashing Ted Dennison
2002-04-11 16:12 ` hashing Georg Bauhaus
2002-04-11 22:36 ` hashing Simon Wright
2002-05-03 10:16 ` hashing Antonio Duran
2002-05-05 13:55 ` hashing Robert Dewar
2002-05-06 20:42 ` hashing Antonio Duran
2002-05-06 23:36   ` hashing Jeffrey Carter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox