From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,ac39a12d5faf5b14
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2002-04-24 12:36:15 PST
Path: 
 archiver1.google.com!news2.google.com!news1.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!easynet-tele!easynet-melon!easynet.net!uio.no!193.216.69.35.MISMATCH!dax.net!juliett.dax.net!not-for-mail
Newsgroups: comp.lang.ada
Subject: Re: Grace and Maps (was Re: Development process in the Ada community)
References: <3CB46975.90408@snafu.de> <3CBAFFEE.2080708@snafu.de>
 <4519e058.0204171036.6f0a7394@posting.google.com> <3CBDD795.4060706@snafu.de>
 <4519e058.0204180800.44fac012@posting.google.com> <3CBF0341.8020406@mail.com>
 <a9n0tk$6fs$1@nh.pace.co.uk>
 <4519e058.0204190529.559a47ae@posting.google.com>
 <a9p98h$8hn$1@nh.pace.co.uk> <3CC1C6B3.6060306@telepath.com>
 <3CC21747.5000501@telepath.com> <V+UgdHreGTJg@eisner.encompasserve.org>
 <x7vr8l871it.fsf@smaug.pushface.org> <4519e058.0204220534.2eb33730@posting.go
 <3CC48F34.5A474E0F@boeing.com> <3CC49C50.485AE213@san.rr.com>
 <3CC4B5C0.4D16077C@acm.org> <3CC4E9DA.E02BE0DA@san.rr.com>
 <7vznzukhzf.fsf@vlinux.voxelvision.no> <3CC5B161.C719C5D8@san.rr.com>
 <7vg01lwaku.fsf@vlinux.voxelvision.no> <3CC6B932.75C468E1@san.rr.com>
 <7vbsc9w367.fsf@vlinux.voxelvision.no> <3CC6CED0.C5E203F9@san.rr.com>
From: Ole-Hjalmar Kristensen <oleh@vlinux.voxelvision.no>
Message-ID: <7v3cxkx57y.fsf@vlinux.voxelvision.no>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 24 Apr 2002 19:32:34 GMT
NNTP-Posting-Host: 193.216.12.150
X-Complaints-To: abuse@tele2.no
X-Trace: juliett.dax.net 1019676754 193.216.12.150 (Wed,
 24 Apr 2002 21:32:34 MET DST)
NNTP-Posting-Date: Wed, 24 Apr 2002 21:32:34 MET DST
Organization: Tele2 Norway AS Public Access
Xref: archiver1.google.com comp.lang.ada:23075
Date: 2002-04-24T19:32:34+00:00
List-Id: <comp.lang.ada>

Darren New <dnew@san.rr.com> writes:

> Ole-Hjalmar Kristensen wrote:
> > OK, but how does this malicious user manage to do this when I'm using
> > a hash function which on average changes half the bits of its output
> > for any single bit change in its input? 
> 
> It's called the birthday paradox. You hash enough values, you're going
> to find collisions. 
> 
> > I'm not saying it's
> > impossible, but it seems pretty far-fetched to me.
> 
> It seemed far-fetched to me that someone would disassemble IIS or XP in
> order to find unchecked buffer overflows, but they managed within days
> of the release of the software, too.
> 
> > Of course, if he's got the source to my program 
> 
> I kind of thought that was the point of this exercise - to make source,
> yes?
> 
> >and is good at inverting hash functions
> > he can do it.  In particular (pardon the C code, but as you say
> > yourself, in C it's easy), how do you construct your keys so they all
> > hash to the same value given this particular hash function? :
> 
> Well, it's hard to read when it isn't complete C, but assuming "u4" is
> an unsigned integer four bytes long, it's pretty easy.
> 
> /* Crappy pseudo-C... This should find about 2**16 collisions. */
> for (i = 0; i < 2**32; i++) {
>   if (burtle(i, ...) == 27) {
>     printf("Found one that hashes to 27: %ul\n", i);
>   }
> }

I'm curious why you expect this to produce 2**16 collisions. The input
has a range of 2**32, but so has the output. Of course, when you fold
it to a shorter table, you will get collisions, so I would expect
2**16 collisions only if my hash link table has 2**16 elements.  But
yes, I see your point that a determined malicious user might make some very
long hash chains. This may or may not be fatal to the system
> 
> > > Hmmm... So if you have keys you can't break up? I mean, I'm not sure how
> > > you would make a generic hash function that would hash any record, or
> > > any array, or any private type, or ...
> > >
> > 
> > Sure, in that case you will have to call the hash function for each
> > part of the key.  But in that case, wouldn't you also need to define a
> > comparision operator if you used a tree?
> 
> I think there are many more private types that can be ordered than those
> that can be hashed, in Ada at least. Most of the fundamental types in
> Ada come with ordering operators defined, while few come with hashes
> defined. (In Java, "hash" is defined everywhere, for just this reason.)
>  
> > In Ada, you can get rid of any packing if you insist, at least more
> > portably than in C.
> 
> Well, yeah, but I don't think you'd want to.
>  
> > Hmm. What's wrong with having a generic hash function which you
> > instantiate for your private type? Provided that you can read the key
> > as an array of bytes, it should work the same in both C and Ada.
> 
> Well, yeah, but I dont think you can. Doing so would mean that you're
> dependent on the packing of the structure, for example. If your private
> type contains pointers or gaps in the structure or fields that don't
> participate in an equality comparison anything like that, your hash is
> going to fail. Remember that you not only have to generate a hash, but
> that the hash has to be the same for all values that are considered
> equal. If you have a string implementation that has a length count but
> leaves junk at the end of the array of characters, a hash based on the
> physical contents of the record isn't going to work.
> 

Neither is a predefined relational operator for the array likely to work.

According to my admittedly rusty memory, the relational operators are
only defined for scalar types and discrete array types, so I cannot
see how you can avoid defining your own for record types. So trees
have the same problem, really.

> > BTW., I'm not arguing for hash tables everywhere. I'm just saying
> > there are some appliactions where O(log N) just isn't good enough, and that
> > hash tables *in practice* is what you need.
> 
> Agreed. There are many applications where you can control the kind of
> key and almost never want the results softed, in which hash tables are
> perfect. Symbol tables leap to mind. 
> 

Also join processing, hash filters and mappings from disk blocks to
buffers to name a few examples of applications I am familiar with. But
those applications are probably so specialized that most programmers
would write their own hash anyway.

> Perhaps the hashtable would be the third data structure, after sorted
> maps. :-) Hashtables are actually pretty simple to do, if you make the
> user supply the hash value and don't make keys or values limited. A hash
> table with implementations specifically having keys being the various
> string types would probably satisfy 90% of the need for hash tables, as
> well. There aren't a whole lot of operations on a hash table, so it's
> pretty easy to put one together, methinks.
> 
> -- 
> Darren New 
> San Diego, CA, USA (PST). Cryptokeys on demand.
>    The 90/10 rule of toothpaste: the last 10% of 
>          the tube lasts as long as the first 90%.

I have usually taken the easy way out and required the existence of a
hash function for any type which is used as a key. Given that
something like the hash function above is available, the hash function
itself for a new type usually reduces to a one-liner.