From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,ac39a12d5faf5b14
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2002-04-24 08:02:43 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!xmission!news-out.spamkiller.net!propagator2-maxim!propagator-maxim!news-in.spamkiller.net!out.nntp.be!propagator-SanJose!in.nntp.be!dax.net!juliett.dax.net!not-for-mail
Newsgroups: comp.lang.ada
Subject: Re: Grace and Maps (was Re: Development process in the Ada community)
References: <3CB46975.90408@snafu.de> <3CBAFFEE.2080708@snafu.de>
 <4519e058.0204171036.6f0a7394@posting.google.com> <3CBDD795.4060706@snafu.de>
 <4519e058.0204180800.44fac012@posting.google.com> <3CBF0341.8020406@mail.com>
 <a9n0tk$6fs$1@nh.pace.co.uk>
 <4519e058.0204190529.559a47ae@posting.google.com>
 <a9p98h$8hn$1@nh.pace.co.uk> <3CC1C6B3.6060306@telepath.com>
 <3CC21747.5000501@telepath.com> <V+UgdHreGTJg@eisner.encompasserve.org>
 <x7vr8l871it.fsf@smaug.pushface.org> <4519e058.0204220534.2eb33730@posting.go
 <3CC48F34.5A474E0F@boeing.com> <3CC49C50.485AE213@san.rr.com>
 <3CC4B5C0.4D16077C@acm.org> <3CC4E9DA.E02BE0DA@san.rr.com>
 <7vznzukhzf.fsf@vlinux.voxelvision.no> <3CC5B161.C719C5D8@san.rr.com>
 <7vg01lwaku.fsf@vlinux.voxelvision.no> <3CC6B932.75C468E1@san.rr.com>
From: Ole-Hjalmar Kristensen <oleh@vlinux.voxelvision.no>
Message-ID: <7vbsc9w367.fsf@vlinux.voxelvision.no>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 24 Apr 2002 15:02:09 GMT
NNTP-Posting-Host: 193.216.12.150
X-Complaints-To: abuse@tele2.no
X-Trace: juliett.dax.net 1019660529 193.216.12.150 (Wed,
 24 Apr 2002 17:02:09 MET DST)
NNTP-Posting-Date: Wed, 24 Apr 2002 17:02:09 MET DST
Organization: Tele2 Norway AS Public Access
Xref: archiver1.google.com comp.lang.ada:23057
Date: 2002-04-24T15:02:09+00:00
List-Id: <comp.lang.ada>

Darren New <dnew@san.rr.com> writes:

> Ole-Hjalmar Kristensen wrote:
> > I agree, not the thing you would want in a hard real time
> > safety-critical system. Buf for soft real time, non saftey critical
> > systems it's good enough. And it really can make a difference in
> > throughput in for instance applications like relational algebra
> > processing.
> 
> Well, there's also the question of the malicious user, who intentionally
> inserts a lot of items that hash to the same value (over a web
> interface, say) and thereby breaks something.
>  

OK, but how does this malicious user manage to do this when I'm using
a hash function which on average changes half the bits of its output
for any single bit change in its input? I'm not saying it's
impossible, but it seems pretty far-fetched to me. Of course, if he's
got the source to my program and is good at inverting hash functions
he can do it.  In particular (pardon the C code, but as you say
yourself, in C it's easy), how do you construct your keys so they all
hash to the same value given this particular hash function? :

/* The mixing step */
#define mix(a,b,c) \
{ \
  a=a-b;  a=a-c;  a=a^(c>>13); \
  b=b-c;  b=b-a;  b=b^(a<<8);  \
  c=c-a;  c=c-b;  c=c^(b>>13); \
  a=a-b;  a=a-c;  a=a^(c>>12); \
  b=b-c;  b=b-a;  b=b^(a<<16); \
  c=c-a;  c=c-b;  c=c^(b>>5);  \
  a=a-b;  a=a-c;  a=a^(c>>3);  \
  b=b-c;  b=b-a;  b=b^(a<<10); \
  c=c-a;  c=c-b;  c=c^(b>>15); \
}

/* The whole new hash function */
inline u4 burtle( 
		 register u1 *k,      /* the key */
		 u4           length, /* the length of the key in bytes */
		 u4           initval /* the previous hash, or an arbitrary value */
		 )
{
  register u4 a,b,c;  /* the internal state */
  u4          len;    /* how many key bytes still need mixing */

  /* Set up the internal state */
  len = length;
  a = b = 0x9e3779b9;  /* the golden ratio; an arbitrary value */
  c = initval;         /* variable initialization of internal state */

  /*---------------------------------------- handle most of the key */
  while (len >= 12)
    {
      a=a+(k[0]+((u4)k[1]<<8)+((u4)k[2]<<16) +((u4)k[3]<<24));
      b=b+(k[4]+((u4)k[5]<<8)+((u4)k[6]<<16) +((u4)k[7]<<24));
      c=c+(k[8]+((u4)k[9]<<8)+((u4)k[10]<<16)+((u4)k[11]<<24));
      mix(a,b,c);
      k = k+12; len = len-12;
    }

  /*------------------------------------- handle the last 11 bytes */
  c = c+length;
  switch(len)              /* all the case statements fall through */
    {
    case 11: c=c+((u4)k[10]<<24);
    case 10: c=c+((u4)k[9]<<16);
    case 9 : c=c+((u4)k[8]<<8);
      /* the first byte of c is reserved for the length */
    case 8 : b=b+((u4)k[7]<<24);
    case 7 : b=b+((u4)k[6]<<16);
    case 6 : b=b+((u4)k[5]<<8);
    case 5 : b=b+k[4];
    case 4 : a=a+((u4)k[3]<<24);
    case 3 : a=a+((u4)k[2]<<16);
    case 2 : a=a+((u4)k[1]<<8);
    case 1 : a=a+k[0];
      /* case 0: nothing left to add */
    }
  mix(a,b,c);
  return c;
}


> > Yes, although the reimplementation usually just consists of calling the
> > same hash functions with different parameters. After all, we don't
> > really care what we are hashing, all we need to know is the location
> > and size of the key, so one good hash function is really all you need.
> 
> Hmmm... So if you have keys you can't break up? I mean, I'm not sure how
> you would make a generic hash function that would hash any record, or
> any array, or any private type, or ...
> 

Sure, in that case you will have to call the hash function for each
part of the key.  But in that case, wouldn't you also need to define a
comparision operator if you used a tree?

> In C it's easy - you cast the pointer to a char* and hash, and hope
> there's no randomly-initialized padding in the middle. I don't know

In Ada, you can get rid of any packing if you insist, at least more
portably than in C.

> how to do this in Ada, tho - it would seem that every private type
> (for example) would have to implement its own hash
> function. Wouldn't it?
> 

Hmm. What's wrong with having a generic hash function which you
instantiate for your private type? Provided that you can read the key
as an array of bytes, it should work the same in both C and Ada. I
know this is low-level and dirty, but for hashing, I think it's a
reasonable approach. 

> -- 
> Darren New 
> San Diego, CA, USA (PST). Cryptokeys on demand.
>    The 90/10 rule of toothpaste: the last 10% of 
>          the tube lasts as long as the first 90%.

BTW., I'm not arguing for hash tables everywhere. I'm just saying
there are some appliactions where O(log N) just isn't good enough, and that
hash tables *in practice* is what you need.