comp.lang.ada
 help / color / mirror / Atom feed
* Hashing on System.Address
@ 2005-06-13 20:18 Duncan Sands
  2005-06-13 21:31 ` Mark Lorenzen
  2005-06-13 21:57 ` Matthew Heaney
  0 siblings, 2 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-13 20:18 UTC (permalink / raw)
  To: comp.lang.ada

Does anyone have an efficient portable algorithm for hashing
on a System.Address?

The Ada 2005 package Ada.Containers defines Hash_Type as

	type Hash_Type is mod implementation-defined;

This is the target type I would like for the hash.

Thanks a lot,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 20:18 Hashing on System.Address Duncan Sands
@ 2005-06-13 21:31 ` Mark Lorenzen
  2005-06-14  7:23   ` Duncan Sands
  2005-06-13 21:57 ` Matthew Heaney
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Lorenzen @ 2005-06-13 21:31 UTC (permalink / raw)


Duncan Sands <baldrick@free.fr> writes:

> Does anyone have an efficient portable algorithm for hashing
> on a System.Address?
> 
> The Ada 2005 package Ada.Containers defines Hash_Type as
> 
> 	type Hash_Type is mod implementation-defined;
> 
> This is the target type I would like for the hash.
> 
> Thanks a lot,
> 
> Duncan.

You can always convert a value of type System.Address to a value of a
signed or modular (implementation-defined) type. That should help when
you want to hash it into a value of type Hash_Type.

- Mark Lorenzen



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 20:18 Hashing on System.Address Duncan Sands
  2005-06-13 21:31 ` Mark Lorenzen
@ 2005-06-13 21:57 ` Matthew Heaney
  2005-06-13 23:41   ` Robert A Duff
  2005-06-14  7:18   ` Duncan Sands
  1 sibling, 2 replies; 26+ messages in thread
From: Matthew Heaney @ 2005-06-13 21:57 UTC (permalink / raw)




Duncan Sands wrote:
> Does anyone have an efficient portable algorithm for hashing
> on a System.Address?
>
> The Ada 2005 package Ada.Containers defines Hash_Type as
>
> 	type Hash_Type is mod implementation-defined;
>
> This is the target type I would like for the hash.

Use System.Storage_Elements.To_Integer to convert the address to an
Integer_Address, and then convert that to type Hash_Type.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 21:57 ` Matthew Heaney
@ 2005-06-13 23:41   ` Robert A Duff
  2005-06-14  2:22     ` Matthew Heaney
                       ` (2 more replies)
  2005-06-14  7:18   ` Duncan Sands
  1 sibling, 3 replies; 26+ messages in thread
From: Robert A Duff @ 2005-06-13 23:41 UTC (permalink / raw)


"Matthew Heaney" <mheaney@on2.com> writes:

> Duncan Sands wrote:
> > Does anyone have an efficient portable algorithm for hashing
> > on a System.Address?
> >
> > The Ada 2005 package Ada.Containers defines Hash_Type as
> >
> > 	type Hash_Type is mod implementation-defined;
> >
> > This is the target type I would like for the hash.
> 
> Use System.Storage_Elements.To_Integer to convert the address to an
> Integer_Address, and then convert that to type Hash_Type.

It might be a good idea to divide that by 8, since most addresses have
zeros in the low three bits.

- Bob



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 23:41   ` Robert A Duff
@ 2005-06-14  2:22     ` Matthew Heaney
  2005-06-14  7:27       ` Duncan Sands
                         ` (3 more replies)
  2005-06-14  8:42     ` Larry Kilgallen
  2005-06-14 11:18     ` Duncan Sands
  2 siblings, 4 replies; 26+ messages in thread
From: Matthew Heaney @ 2005-06-14  2:22 UTC (permalink / raw)


Robert A Duff <bobduff@shell01.TheWorld.com> writes:

> "Matthew Heaney" <mheaney@on2.com> writes:
> > Use System.Storage_Elements.To_Integer to convert the address to an
> > Integer_Address, and then convert that to type Hash_Type.
> 
> It might be a good idea to divide that by 8, since most addresses have
> zeros in the low three bits.

Good idea.  The other thing I haven't really figured out is what to do
if you have a 64-bit address, and Hash_Type is only 32.  Ideally you'd
like to compute a 32-bit hash value that uses all 64 bits of the
address.

Alternatively, you could use the ordered container, since type
System.Address has a less than ("<") relational operator.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 21:57 ` Matthew Heaney
  2005-06-13 23:41   ` Robert A Duff
@ 2005-06-14  7:18   ` Duncan Sands
  2005-06-14 15:18     ` Matthew Heaney
  1 sibling, 1 reply; 26+ messages in thread
From: Duncan Sands @ 2005-06-14  7:18 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Matthew Heaney

Hi Matthew,

> Use System.Storage_Elements.To_Integer to convert the address to an
> Integer_Address, and then convert that to type Hash_Type.

that's what I'm doing.  But there's no guarantee that they have the
same size, so it's not portable.  It's easy enough to thunk Integer_Address
to a modular type (which I'll also call Integer_Address here).  If
Integer_Address is smaller than or equal to Hash_Type then there's no
problem.  If it's larger then you can do some bit rotation, but it's surprisingly
awkward to do when you don't know the sizes in advance.  I was hoping
someone knew how to do this in a clean way.

All the best,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 21:31 ` Mark Lorenzen
@ 2005-06-14  7:23   ` Duncan Sands
  2005-06-14 20:18     ` Randy Brukardt
  0 siblings, 1 reply; 26+ messages in thread
From: Duncan Sands @ 2005-06-14  7:23 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Mark Lorenzen

Hi Mark,

> You can always convert a value of type System.Address to a value of a
> signed or modular (implementation-defined) type. That should help when
> you want to hash it into a value of type Hash_Type.

yes that helps - thanks for the suggestion.  In fact that is what I'm already
doing.  The problem is the next part...  That can be simplified to: I have two
modular types (binary modulus), but I don't know their modulus.  Values of
the first type should be hashed into values of the second type.  This can be
done (I've done it) but somehow the code is much more horrible than I feel
it should be.  One annoying point is that a program with a static expression
that would raise Constraint_Error is illegal, even if you would never get to the
Contraint_Error raising part in run-time.  If you don't understand the relevance
of this remark, try to code this kind of hash thing yourself and you will soon
see :)

Thanks again,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  2:22     ` Matthew Heaney
@ 2005-06-14  7:27       ` Duncan Sands
  2005-06-14  8:52         ` Alex R. Mosteo
  2005-06-14  8:44       ` Larry Kilgallen
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Duncan Sands @ 2005-06-14  7:27 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Matthew Heaney

> Good idea.  The other thing I haven't really figured out is what to do
> if you have a 64-bit address, and Hash_Type is only 32.  Ideally you'd
> like to compute a 32-bit hash value that uses all 64 bits of the
> address.

Yes, that's the problem.  In fact to be portable the code should work
regardless of which type has the bigger modulus.

> Alternatively, you could use the ordered container, since type
> System.Address has a less than ("<") relational operator.

I considered that, but since I'm sure to want to hash on addresses one
day, it seemed like a good idea to bite the bullet and come up with a
good solution now.

By the way, is there any reason why Ada.Containers doesn't provide some
standard hash functions?  I reckon hashing on strings and on addresses
would be pretty handy.

Ciao,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 23:41   ` Robert A Duff
  2005-06-14  2:22     ` Matthew Heaney
@ 2005-06-14  8:42     ` Larry Kilgallen
  2005-06-14 11:18     ` Duncan Sands
  2 siblings, 0 replies; 26+ messages in thread
From: Larry Kilgallen @ 2005-06-14  8:42 UTC (permalink / raw)


In article <wcc3brlrfj5.fsf@shell01.TheWorld.com>, Robert A Duff <bobduff@shell01.TheWorld.com> writes:
> "Matthew Heaney" <mheaney@on2.com> writes:
> 
>> Duncan Sands wrote:
>> > Does anyone have an efficient portable algorithm for hashing
>> > on a System.Address?
>> >
>> > The Ada 2005 package Ada.Containers defines Hash_Type as
>> >
>> > 	type Hash_Type is mod implementation-defined;
>> >
>> > This is the target type I would like for the hash.
>> 
>> Use System.Storage_Elements.To_Integer to convert the address to an
>> Integer_Address, and then convert that to type Hash_Type.
> 
> It might be a good idea to divide that by 8, since most addresses have
> zeros in the low three bits.

Or it might not be, since he said portable.  Looking at addresses of
_instructions_ (which are the closest to that pattern) on the machine
architectures on which VMS runs:

	VAX - No restriction, 1 bit (odd and even) just as likely to be
		one as the other
	Alpha - 4 byte boundary, 4 bit (100) just as likely to be one
		as the other
	Itanium - 8 byte boundary for VLIW instructions, but in software
		the last octet of the address is 0, 1 or 2 depending on
		which instruction in the bundle is addressed.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  2:22     ` Matthew Heaney
  2005-06-14  7:27       ` Duncan Sands
@ 2005-06-14  8:44       ` Larry Kilgallen
  2005-06-14 10:03       ` Marius Amado Alves
       [not found]       ` <abb2a2c53e3708803aa68bb87834b7bc@netcabo.pt>
  3 siblings, 0 replies; 26+ messages in thread
From: Larry Kilgallen @ 2005-06-14  8:44 UTC (permalink / raw)


In article <u8y1dllt8.fsf@earthlink.net>, Matthew Heaney <matthewjheaney@earthlink.net> writes:
> Robert A Duff <bobduff@shell01.TheWorld.com> writes:
> 
>> "Matthew Heaney" <mheaney@on2.com> writes:
>> > Use System.Storage_Elements.To_Integer to convert the address to an
>> > Integer_Address, and then convert that to type Hash_Type.
>> 
>> It might be a good idea to divide that by 8, since most addresses have
>> zeros in the low three bits.
> 
> Good idea.  The other thing I haven't really figured out is what to do
> if you have a 64-bit address, and Hash_Type is only 32.  Ideally you'd
> like to compute a 32-bit hash value that uses all 64 bits of the
> address.

I think ignoring the high bits is better than ignoring the low bits,
since most data sets are likely to be somewhat localized.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  7:27       ` Duncan Sands
@ 2005-06-14  8:52         ` Alex R. Mosteo
  2005-06-14  9:06           ` Duncan Sands
  2005-06-14  9:09           ` Duncan Sands
  0 siblings, 2 replies; 26+ messages in thread
From: Alex R. Mosteo @ 2005-06-14  8:52 UTC (permalink / raw)


Duncan Sands wrote:

> By the way, is there any reason why Ada.Containers doesn't provide some
> standard hash functions?  I reckon hashing on strings and on addresses
> would be pretty handy.

Out of Containers hierarchy: Ada.Strings.Hash



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  8:52         ` Alex R. Mosteo
@ 2005-06-14  9:06           ` Duncan Sands
  2005-06-14  9:09           ` Duncan Sands
  1 sibling, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-14  9:06 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Alex R. Mosteo

> > By the way, is there any reason why Ada.Containers doesn't provide some
> > standard hash functions?  I reckon hashing on strings and on addresses
> > would be pretty handy.
> 
> Out of Containers hierarchy: Ada.Strings.Hash

Thanks a lot - I hadn't spotted that.

Best wishes,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  8:52         ` Alex R. Mosteo
  2005-06-14  9:06           ` Duncan Sands
@ 2005-06-14  9:09           ` Duncan Sands
  2005-06-14 17:00             ` Pascal Obry
  2005-06-14 20:03             ` Randy Brukardt
  1 sibling, 2 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-14  9:09 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Alex R. Mosteo

> > By the way, is there any reason why Ada.Containers doesn't provide some
> > standard hash functions?  I reckon hashing on strings and on addresses
> > would be pretty handy.
> 
> Out of Containers hierarchy: Ada.Strings.Hash

You know, this implementation advice sounds pretty silly:

Implementation Advice
7/2
     The various Hash functions should be good hash functions, returning a wide spread of values for different string values. It should be unlikely for similar strings to return the same value.

As if any implementor needs to be told that they should use a "good" rather than a "bad"
hash function!

Ciao,

d.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  2:22     ` Matthew Heaney
  2005-06-14  7:27       ` Duncan Sands
  2005-06-14  8:44       ` Larry Kilgallen
@ 2005-06-14 10:03       ` Marius Amado Alves
       [not found]       ` <abb2a2c53e3708803aa68bb87834b7bc@netcabo.pt>
  3 siblings, 0 replies; 26+ messages in thread
From: Marius Amado Alves @ 2005-06-14 10:03 UTC (permalink / raw)
  To: comp.lang.ada

> ... The other thing I haven't really figured out is what to do
> if you have a 64-bit address, and Hash_Type is only 32...

I gave a general solution to this class of cases before:

<<
Treat both the input and output values as bit strings X (1 .. M) and Y 
(1 .. N) respectively. (Use representation clauses, packed arrays, and 
unchecked conversion for this.) Make N small, say 16. Set Y bits to the 
value of X bits as follows:

if M > N, Y (J) = X (1 + (J - 1) * M / N), for J in 1 .. N

if M = N, Y = X

if M < N, initialize Y to all zeros,
  then let Y (1 + (I - 1) * N / M) = X (I), for I in 1 .. M
 >>




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
       [not found]       ` <abb2a2c53e3708803aa68bb87834b7bc@netcabo.pt>
@ 2005-06-14 10:14         ` Duncan Sands
       [not found]         ` <200506141214.14213.baldrick@free.fr>
  1 sibling, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-14 10:14 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Marius Amado Alves

On Tuesday 14 June 2005 12:03, Marius Amado Alves wrote:
> > ... The other thing I haven't really figured out is what to do
> > if you have a 64-bit address, and Hash_Type is only 32...
> 
> I gave a general solution to this class of cases before:
> 
> <<
> Treat both the input and output values as bit strings X (1 .. M) and Y 
> (1 .. N) respectively. (Use representation clauses, packed arrays, and 
> unchecked conversion for this.) Make N small, say 16. Set Y bits to the 
> value of X bits as follows:
> 
> if M > N, Y (J) = X (1 + (J - 1) * M / N), for J in 1 .. N
> 
> if M = N, Y = X
> 
> if M < N, initialize Y to all zeros,
>   then let Y (1 + (I - 1) * N / M) = X (I), for I in 1 .. M
>  >>

Hi Marius, thanks for the suggestion.  A few comments though:
if M > N (say M = 2*N) then you're skipping every second bit
in X.  Why not use them - mix them in somehow?  If M < N then
you are trying to spread the bits of X evenly throughout Y.  Are
there any theoretical reasons to think this is a good strategy?
Finally, I'm betting the code produced is kind of yucky.  It would
be nice to have a solution that resulted in a small number of
simple bit operations.

All the best,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
       [not found]         ` <200506141214.14213.baldrick@free.fr>
@ 2005-06-14 10:29           ` Marius Amado Alves
       [not found]           ` <64f19b6707e4d19f5362900bbfb80b76@netcabo.pt>
  1 sibling, 0 replies; 26+ messages in thread
From: Marius Amado Alves @ 2005-06-14 10:29 UTC (permalink / raw)
  To: comp.lang.ada


On 14 Jun 2005, at 11:14, Duncan Sands wrote:

> On Tuesday 14 June 2005 12:03, Marius Amado Alves wrote:
>>> ... The other thing I haven't really figured out is what to do
>>> if you have a 64-bit address, and Hash_Type is only 32...
>>
>> I gave a general solution to this class of cases before:
>>
>> <<
>> Treat both the input and output values as bit strings X (1 .. M) and Y
>> (1 .. N) respectively. (Use representation clauses, packed arrays, and
>> unchecked conversion for this.) Make N small, say 16. Set Y bits to 
>> the
>> value of X bits as follows:
>>
>> if M > N, Y (J) = X (1 + (J - 1) * M / N), for J in 1 .. N
>>
>> if M = N, Y = X
>>
>> if M < N, initialize Y to all zeros,
>>   then let Y (1 + (I - 1) * N / M) = X (I), for I in 1 .. M
>>>>
>
> Hi Marius, thanks for the suggestion.  A few comments though:
> if M > N (say M = 2*N) then you're skipping every second bit
> in X.  Why not use them - mix them in somehow?

Yes, I thought of that, namely picking every slice of M / N bits of the 
input and convert it to 1 bit. Example for M = 2 * N: 00 => 0, 01 => 0, 
10 => 1, 11 => 1. But see below.

>   If M < N then
> you are trying to spread the bits of X evenly throughout Y.  Are
> there any theoretical reasons to think this is a good strategy?

Intuition + experimentation. "Guess first, prove later" (Poincaré) I 
tested my function against GNAT's. Dispersion was equivalent. Speed 
also I if I remember correctly. That's proof enough for me. That's why 
I didn't advance to the slice variant just above.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
       [not found]           ` <64f19b6707e4d19f5362900bbfb80b76@netcabo.pt>
@ 2005-06-14 10:49             ` Marius Amado Alves
  0 siblings, 0 replies; 26+ messages in thread
From: Marius Amado Alves @ 2005-06-14 10:49 UTC (permalink / raw)
  To: comp.lang.ada

Now more "theoretically", for my bit-based hash function, output 
dispersion is clearly dependent on input dispersion. I tested it with 
text strings (Latin-1) and it did well. But for skewed domains like 
memory addresses it may not work as good. In these cases it is indeed 
smarter to use whatever knowledge you have of the skewdness e.g. cut 
out invariant bits. Note the two methods can work in tandem: first cut 
out invariant bits, then use my function to zoom in or out to the 
output size.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-13 23:41   ` Robert A Duff
  2005-06-14  2:22     ` Matthew Heaney
  2005-06-14  8:42     ` Larry Kilgallen
@ 2005-06-14 11:18     ` Duncan Sands
  2 siblings, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-14 11:18 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Robert A Duff

> It might be a good idea to divide that by 8, since most addresses have
> zeros in the low three bits.

Hi Bob, if the number of hash buckets is a prime number (not equal to 2),
so the actual bucket chosen is the hash value mod a prime, then I guess
dividing by 8 won't make any difference...

Ciao,

D.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  7:18   ` Duncan Sands
@ 2005-06-14 15:18     ` Matthew Heaney
  2005-06-14 16:22       ` Duncan Sands
  0 siblings, 1 reply; 26+ messages in thread
From: Matthew Heaney @ 2005-06-14 15:18 UTC (permalink / raw)




Duncan Sands wrote:
> > Use System.Storage_Elements.To_Integer to convert the address to an
> > Integer_Address, and then convert that to type Hash_Type.
>
> that's what I'm doing.  But there's no guarantee that they have the
> same size, so it's not portable.

Even if the Integer_Address type is 64 bits, that doesn't necessarily
mean To_Address will return values > 2**32.  The easiest thing is to do
is just throw away the upper 32 bits of of the integer value, and then
convert that to type Hash_Type.

Yes, then means you won't have a perfect hash function, but that's
typical for hash functions anyway.  (The Birthday Paradox applies to
hash functions too, I think.)

You might want to post your question on comp.compilers or one of the OS
newgroups.  This is most probably a solved problem.

-Matt




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14 15:18     ` Matthew Heaney
@ 2005-06-14 16:22       ` Duncan Sands
  0 siblings, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-14 16:22 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Matthew Heaney

Hi Matthew,

> Even if the Integer_Address type is 64 bits, that doesn't necessarily
> mean To_Address will return values > 2**32.  The easiest thing is to do
> is just throw away the upper 32 bits of of the integer value, and then
> convert that to type Hash_Type.

that's exactly what I did, with some tricks to get around RM4.9.34.
 
> ...You might want to post your question on comp.compilers or one of the OS
> newgroups.  This is most probably a solved problem.

I'm sure it is solved.  I had expected solutions to be well known, but
apparently they are not.

Thanks for your help,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  9:09           ` Duncan Sands
@ 2005-06-14 17:00             ` Pascal Obry
  2005-06-14 20:03             ` Randy Brukardt
  1 sibling, 0 replies; 26+ messages in thread
From: Pascal Obry @ 2005-06-14 17:00 UTC (permalink / raw)



Duncan Sands <baldrick@free.fr> writes:

> As if any implementor needs to be told that they should use a "good" rather
> than a "bad" hash function!

It is always good to state things clearly even if evident at some point by
some people :)

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  9:09           ` Duncan Sands
  2005-06-14 17:00             ` Pascal Obry
@ 2005-06-14 20:03             ` Randy Brukardt
  2005-06-14 23:20               ` Robert A Duff
  2005-06-15  7:13               ` Duncan Sands
  1 sibling, 2 replies; 26+ messages in thread
From: Randy Brukardt @ 2005-06-14 20:03 UTC (permalink / raw)


"Duncan Sands" <baldrick@free.fr> wrote in message
news:mailman.25.1118740226.17633.comp.lang.ada@ada-france.org...
> > > By the way, is there any reason why Ada.Containers doesn't provide
some
> > > standard hash functions?  I reckon hashing on strings and on addresses
> > > would be pretty handy.
> >
> > Out of Containers hierarchy: Ada.Strings.Hash
>
> You know, this implementation advice sounds pretty silly:
>
> Implementation Advice
> 7/2
>      The various Hash functions should be good hash functions, returning a
wide spread of values for different string values. It should be unlikely for
similar strings to return the same value.
>
> As if any implementor needs to be told that they should use a "good"
rather than a "bad"
> hash function!

We'd have preferred to say something stronger, and normatively, but there
doesn't seem to be a way to describe what a hash function does, other than
returning a value of Hash_Type that depends of the argument. So we use IA to
describe what the intent is; IA can be informal and use undefined terms like
"good".

Without the IA, the only indication that this is a hash function is the
name. That wouldn't be enough. And if you don't know what a hash function
is, the IA will be helpful.

                                    Randy.






^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14  7:23   ` Duncan Sands
@ 2005-06-14 20:18     ` Randy Brukardt
  2005-06-15  7:27       ` Duncan Sands
  0 siblings, 1 reply; 26+ messages in thread
From: Randy Brukardt @ 2005-06-14 20:18 UTC (permalink / raw)


"Duncan Sands" <baldrick@free.fr> wrote in message
news:mailman.22.1118733848.17633.comp.lang.ada@ada-france.org...
> Hi Mark,
>
> > You can always convert a value of type System.Address to a value of a
> > signed or modular (implementation-defined) type. That should help when
> > you want to hash it into a value of type Hash_Type.
>
> yes that helps - thanks for the suggestion.  In fact that is what I'm
already
> doing.  The problem is the next part...  That can be simplified to: I have
two
> modular types (binary modulus), but I don't know their modulus.  Values of
> the first type should be hashed into values of the second type.  This can
be
> done (I've done it) but somehow the code is much more horrible than I feel
> it should be.  One annoying point is that a program with a static
expression
> that would raise Constraint_Error is illegal, even if you would never get
to the
> Contraint_Error raising part in run-time.  If you don't understand the
relevance
> of this remark, try to code this kind of hash thing yourself and you will
soon
> see :)

That's one of the reasons that Ada 2006 adds the "Mod" attribute. You could
use that to discard any extra bits. (If you want to keep them, well, that
gets messier.).

So I'd use something like:

    if Integer_Address'Last < Hash_Type'Modulus then
            return Hash_Type'Mod(Addr);
    elsif Integer_Address'Last / Hash_Type'Modulus < Hash_Type'Modulus then
            return Hash_Type'Mod(Addr) xor Hash_Type'Mod(Addr /
Hash_Type'Modulus);
    else
            raise Program_Error;
    end if;

You could of course continue to handle larger multiples in the same way.

I don't remember if those static expressions are illegal because of typing.
That can be fixed by making sure the entire expression is of type
Universal_Integer:

    if Integer_Address'Pos(Integer_Address'Last) < Hash_Type'Modulus then
            return Hash_Type'Mod(Addr);
    elsif Integer_Address'Pos(Integer_Address'Last) / Hash_Type'Modulus <
Hash_Type'Modulus then
            return Hash_Type'Mod(Addr) xor Hash_Type'Mod(Addr /
Hash_Type'Modulus);
    else
            raise Program_Error;
    end if;

                                   Randy.






^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14 20:03             ` Randy Brukardt
@ 2005-06-14 23:20               ` Robert A Duff
  2005-06-15  7:13               ` Duncan Sands
  1 sibling, 0 replies; 26+ messages in thread
From: Robert A Duff @ 2005-06-14 23:20 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> We'd have preferred to say something stronger, and normatively, but there
> doesn't seem to be a way to describe what a hash function does, other than
> returning a value of Hash_Type that depends of the argument. So we use IA to
> describe what the intent is; IA can be informal and use undefined terms like
> "good".

Right.  Given any Hash function, I can construct a data set that will
make it "bad".  Therefore, we have to use informal terms in describing
"goodness".

- Bob



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14 20:03             ` Randy Brukardt
  2005-06-14 23:20               ` Robert A Duff
@ 2005-06-15  7:13               ` Duncan Sands
  1 sibling, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-15  7:13 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Randy Brukardt

> >      The various Hash functions should be good hash functions, returning a
> wide spread of values for different string values. It should be unlikely for
> similar strings to return the same value.
> >
> > As if any implementor needs to be told that they should use a "good"
> rather than a "bad"
> > hash function!
> 
> We'd have preferred to say something stronger, and normatively, but there
> doesn't seem to be a way to describe what a hash function does, other than
> returning a value of Hash_Type that depends of the argument. So we use IA to
> describe what the intent is; IA can be informal and use undefined terms like
> "good".
> 
> Without the IA, the only indication that this is a hash function is the
> name. That wouldn't be enough. And if you don't know what a hash function
> is, the IA will be helpful.

How about this then:

"The various Hash functions should return a wide spread of values for different
string values. It should be unlikely for similar strings to return the same value."

Ciao,

D.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hashing on System.Address
  2005-06-14 20:18     ` Randy Brukardt
@ 2005-06-15  7:27       ` Duncan Sands
  0 siblings, 0 replies; 26+ messages in thread
From: Duncan Sands @ 2005-06-15  7:27 UTC (permalink / raw)
  To: comp.lang.ada; +Cc: Randy Brukardt

Hi Randy,

> That's one of the reasons that Ada 2006 adds the "Mod" attribute. You could
> use that to discard any extra bits. (If you want to keep them, well, that
> gets messier.).

I didn't know about 'Mod - it will certainly be useful.  Luckily GNAT seems to have
implemented it.

> So I'd use something like:
...

Interesting suggestion!

Thanks again,

Duncan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2005-06-15  7:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-13 20:18 Hashing on System.Address Duncan Sands
2005-06-13 21:31 ` Mark Lorenzen
2005-06-14  7:23   ` Duncan Sands
2005-06-14 20:18     ` Randy Brukardt
2005-06-15  7:27       ` Duncan Sands
2005-06-13 21:57 ` Matthew Heaney
2005-06-13 23:41   ` Robert A Duff
2005-06-14  2:22     ` Matthew Heaney
2005-06-14  7:27       ` Duncan Sands
2005-06-14  8:52         ` Alex R. Mosteo
2005-06-14  9:06           ` Duncan Sands
2005-06-14  9:09           ` Duncan Sands
2005-06-14 17:00             ` Pascal Obry
2005-06-14 20:03             ` Randy Brukardt
2005-06-14 23:20               ` Robert A Duff
2005-06-15  7:13               ` Duncan Sands
2005-06-14  8:44       ` Larry Kilgallen
2005-06-14 10:03       ` Marius Amado Alves
     [not found]       ` <abb2a2c53e3708803aa68bb87834b7bc@netcabo.pt>
2005-06-14 10:14         ` Duncan Sands
     [not found]         ` <200506141214.14213.baldrick@free.fr>
2005-06-14 10:29           ` Marius Amado Alves
     [not found]           ` <64f19b6707e4d19f5362900bbfb80b76@netcabo.pt>
2005-06-14 10:49             ` Marius Amado Alves
2005-06-14  8:42     ` Larry Kilgallen
2005-06-14 11:18     ` Duncan Sands
2005-06-14  7:18   ` Duncan Sands
2005-06-14 15:18     ` Matthew Heaney
2005-06-14 16:22       ` Duncan Sands

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox