From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,ac39a12d5faf5b14 X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 2002-04-24 12:36:15 PST Path: archiver1.google.com!news2.google.com!news1.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!easynet-tele!easynet-melon!easynet.net!uio.no!193.216.69.35.MISMATCH!dax.net!juliett.dax.net!not-for-mail Newsgroups: comp.lang.ada Subject: Re: Grace and Maps (was Re: Development process in the Ada community) References: <3CB46975.90408@snafu.de> <3CBAFFEE.2080708@snafu.de> <4519e058.0204171036.6f0a7394@posting.google.com> <3CBDD795.4060706@snafu.de> <4519e058.0204180800.44fac012@posting.google.com> <3CBF0341.8020406@mail.com> <4519e058.0204190529.559a47ae@posting.google.com> <3CC1C6B3.6060306@telepath.com> <3CC21747.5000501@telepath.com> <4519e058.0204220534.2eb33730@posting.go <3CC48F34.5A474E0F@boeing.com> <3CC49C50.485AE213@san.rr.com> <3CC4B5C0.4D16077C@acm.org> <3CC4E9DA.E02BE0DA@san.rr.com> <7vznzukhzf.fsf@vlinux.voxelvision.no> <3CC5B161.C719C5D8@san.rr.com> <7vg01lwaku.fsf@vlinux.voxelvision.no> <3CC6B932.75C468E1@san.rr.com> <7vbsc9w367.fsf@vlinux.voxelvision.no> <3CC6CED0.C5E203F9@san.rr.com> From: Ole-Hjalmar Kristensen Message-ID: <7v3cxkx57y.fsf@vlinux.voxelvision.no> User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 24 Apr 2002 19:32:34 GMT NNTP-Posting-Host: 193.216.12.150 X-Complaints-To: abuse@tele2.no X-Trace: juliett.dax.net 1019676754 193.216.12.150 (Wed, 24 Apr 2002 21:32:34 MET DST) NNTP-Posting-Date: Wed, 24 Apr 2002 21:32:34 MET DST Organization: Tele2 Norway AS Public Access Xref: archiver1.google.com comp.lang.ada:23075 Date: 2002-04-24T19:32:34+00:00 List-Id: Darren New writes: > Ole-Hjalmar Kristensen wrote: > > OK, but how does this malicious user manage to do this when I'm using > > a hash function which on average changes half the bits of its output > > for any single bit change in its input? > > It's called the birthday paradox. You hash enough values, you're going > to find collisions. > > > I'm not saying it's > > impossible, but it seems pretty far-fetched to me. > > It seemed far-fetched to me that someone would disassemble IIS or XP in > order to find unchecked buffer overflows, but they managed within days > of the release of the software, too. > > > Of course, if he's got the source to my program > > I kind of thought that was the point of this exercise - to make source, > yes? > > >and is good at inverting hash functions > > he can do it. In particular (pardon the C code, but as you say > > yourself, in C it's easy), how do you construct your keys so they all > > hash to the same value given this particular hash function? : > > Well, it's hard to read when it isn't complete C, but assuming "u4" is > an unsigned integer four bytes long, it's pretty easy. > > /* Crappy pseudo-C... This should find about 2**16 collisions. */ > for (i = 0; i < 2**32; i++) { > if (burtle(i, ...) == 27) { > printf("Found one that hashes to 27: %ul\n", i); > } > } I'm curious why you expect this to produce 2**16 collisions. The input has a range of 2**32, but so has the output. Of course, when you fold it to a shorter table, you will get collisions, so I would expect 2**16 collisions only if my hash link table has 2**16 elements. But yes, I see your point that a determined malicious user might make some very long hash chains. This may or may not be fatal to the system > > > > Hmmm... So if you have keys you can't break up? I mean, I'm not sure how > > > you would make a generic hash function that would hash any record, or > > > any array, or any private type, or ... > > > > > > > Sure, in that case you will have to call the hash function for each > > part of the key. But in that case, wouldn't you also need to define a > > comparision operator if you used a tree? > > I think there are many more private types that can be ordered than those > that can be hashed, in Ada at least. Most of the fundamental types in > Ada come with ordering operators defined, while few come with hashes > defined. (In Java, "hash" is defined everywhere, for just this reason.) > > > In Ada, you can get rid of any packing if you insist, at least more > > portably than in C. > > Well, yeah, but I don't think you'd want to. > > > Hmm. What's wrong with having a generic hash function which you > > instantiate for your private type? Provided that you can read the key > > as an array of bytes, it should work the same in both C and Ada. > > Well, yeah, but I dont think you can. Doing so would mean that you're > dependent on the packing of the structure, for example. If your private > type contains pointers or gaps in the structure or fields that don't > participate in an equality comparison anything like that, your hash is > going to fail. Remember that you not only have to generate a hash, but > that the hash has to be the same for all values that are considered > equal. If you have a string implementation that has a length count but > leaves junk at the end of the array of characters, a hash based on the > physical contents of the record isn't going to work. > Neither is a predefined relational operator for the array likely to work. According to my admittedly rusty memory, the relational operators are only defined for scalar types and discrete array types, so I cannot see how you can avoid defining your own for record types. So trees have the same problem, really. > > BTW., I'm not arguing for hash tables everywhere. I'm just saying > > there are some appliactions where O(log N) just isn't good enough, and that > > hash tables *in practice* is what you need. > > Agreed. There are many applications where you can control the kind of > key and almost never want the results softed, in which hash tables are > perfect. Symbol tables leap to mind. > Also join processing, hash filters and mappings from disk blocks to buffers to name a few examples of applications I am familiar with. But those applications are probably so specialized that most programmers would write their own hash anyway. > Perhaps the hashtable would be the third data structure, after sorted > maps. :-) Hashtables are actually pretty simple to do, if you make the > user supply the hash value and don't make keys or values limited. A hash > table with implementations specifically having keys being the various > string types would probably satisfy 90% of the need for hash tables, as > well. There aren't a whole lot of operations on a hash table, so it's > pretty easy to put one together, methinks. > > -- > Darren New > San Diego, CA, USA (PST). Cryptokeys on demand. > The 90/10 rule of toothpaste: the last 10% of > the tube lasts as long as the first 90%. I have usually taken the easy way out and required the existence of a hash function for any type which is used as a key. Given that something like the hash function above is available, the hash function itself for a new type usually reduces to a one-liner.