From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,ac39a12d5faf5b14 X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 2002-04-24 08:02:43 PST Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!xmission!news-out.spamkiller.net!propagator2-maxim!propagator-maxim!news-in.spamkiller.net!out.nntp.be!propagator-SanJose!in.nntp.be!dax.net!juliett.dax.net!not-for-mail Newsgroups: comp.lang.ada Subject: Re: Grace and Maps (was Re: Development process in the Ada community) References: <3CB46975.90408@snafu.de> <3CBAFFEE.2080708@snafu.de> <4519e058.0204171036.6f0a7394@posting.google.com> <3CBDD795.4060706@snafu.de> <4519e058.0204180800.44fac012@posting.google.com> <3CBF0341.8020406@mail.com> <4519e058.0204190529.559a47ae@posting.google.com> <3CC1C6B3.6060306@telepath.com> <3CC21747.5000501@telepath.com> <4519e058.0204220534.2eb33730@posting.go <3CC48F34.5A474E0F@boeing.com> <3CC49C50.485AE213@san.rr.com> <3CC4B5C0.4D16077C@acm.org> <3CC4E9DA.E02BE0DA@san.rr.com> <7vznzukhzf.fsf@vlinux.voxelvision.no> <3CC5B161.C719C5D8@san.rr.com> <7vg01lwaku.fsf@vlinux.voxelvision.no> <3CC6B932.75C468E1@san.rr.com> From: Ole-Hjalmar Kristensen Message-ID: <7vbsc9w367.fsf@vlinux.voxelvision.no> User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 24 Apr 2002 15:02:09 GMT NNTP-Posting-Host: 193.216.12.150 X-Complaints-To: abuse@tele2.no X-Trace: juliett.dax.net 1019660529 193.216.12.150 (Wed, 24 Apr 2002 17:02:09 MET DST) NNTP-Posting-Date: Wed, 24 Apr 2002 17:02:09 MET DST Organization: Tele2 Norway AS Public Access Xref: archiver1.google.com comp.lang.ada:23057 Date: 2002-04-24T15:02:09+00:00 List-Id: Darren New writes: > Ole-Hjalmar Kristensen wrote: > > I agree, not the thing you would want in a hard real time > > safety-critical system. Buf for soft real time, non saftey critical > > systems it's good enough. And it really can make a difference in > > throughput in for instance applications like relational algebra > > processing. > > Well, there's also the question of the malicious user, who intentionally > inserts a lot of items that hash to the same value (over a web > interface, say) and thereby breaks something. > OK, but how does this malicious user manage to do this when I'm using a hash function which on average changes half the bits of its output for any single bit change in its input? I'm not saying it's impossible, but it seems pretty far-fetched to me. Of course, if he's got the source to my program and is good at inverting hash functions he can do it. In particular (pardon the C code, but as you say yourself, in C it's easy), how do you construct your keys so they all hash to the same value given this particular hash function? : /* The mixing step */ #define mix(a,b,c) \ { \ a=a-b; a=a-c; a=a^(c>>13); \ b=b-c; b=b-a; b=b^(a<<8); \ c=c-a; c=c-b; c=c^(b>>13); \ a=a-b; a=a-c; a=a^(c>>12); \ b=b-c; b=b-a; b=b^(a<<16); \ c=c-a; c=c-b; c=c^(b>>5); \ a=a-b; a=a-c; a=a^(c>>3); \ b=b-c; b=b-a; b=b^(a<<10); \ c=c-a; c=c-b; c=c^(b>>15); \ } /* The whole new hash function */ inline u4 burtle( register u1 *k, /* the key */ u4 length, /* the length of the key in bytes */ u4 initval /* the previous hash, or an arbitrary value */ ) { register u4 a,b,c; /* the internal state */ u4 len; /* how many key bytes still need mixing */ /* Set up the internal state */ len = length; a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */ c = initval; /* variable initialization of internal state */ /*---------------------------------------- handle most of the key */ while (len >= 12) { a=a+(k[0]+((u4)k[1]<<8)+((u4)k[2]<<16) +((u4)k[3]<<24)); b=b+(k[4]+((u4)k[5]<<8)+((u4)k[6]<<16) +((u4)k[7]<<24)); c=c+(k[8]+((u4)k[9]<<8)+((u4)k[10]<<16)+((u4)k[11]<<24)); mix(a,b,c); k = k+12; len = len-12; } /*------------------------------------- handle the last 11 bytes */ c = c+length; switch(len) /* all the case statements fall through */ { case 11: c=c+((u4)k[10]<<24); case 10: c=c+((u4)k[9]<<16); case 9 : c=c+((u4)k[8]<<8); /* the first byte of c is reserved for the length */ case 8 : b=b+((u4)k[7]<<24); case 7 : b=b+((u4)k[6]<<16); case 6 : b=b+((u4)k[5]<<8); case 5 : b=b+k[4]; case 4 : a=a+((u4)k[3]<<24); case 3 : a=a+((u4)k[2]<<16); case 2 : a=a+((u4)k[1]<<8); case 1 : a=a+k[0]; /* case 0: nothing left to add */ } mix(a,b,c); return c; } > > Yes, although the reimplementation usually just consists of calling the > > same hash functions with different parameters. After all, we don't > > really care what we are hashing, all we need to know is the location > > and size of the key, so one good hash function is really all you need. > > Hmmm... So if you have keys you can't break up? I mean, I'm not sure how > you would make a generic hash function that would hash any record, or > any array, or any private type, or ... > Sure, in that case you will have to call the hash function for each part of the key. But in that case, wouldn't you also need to define a comparision operator if you used a tree? > In C it's easy - you cast the pointer to a char* and hash, and hope > there's no randomly-initialized padding in the middle. I don't know In Ada, you can get rid of any packing if you insist, at least more portably than in C. > how to do this in Ada, tho - it would seem that every private type > (for example) would have to implement its own hash > function. Wouldn't it? > Hmm. What's wrong with having a generic hash function which you instantiate for your private type? Provided that you can read the key as an array of bytes, it should work the same in both C and Ada. I know this is low-level and dirty, but for hashing, I think it's a reasonable approach. > -- > Darren New > San Diego, CA, USA (PST). Cryptokeys on demand. > The 90/10 rule of toothpaste: the last 10% of > the tube lasts as long as the first 90%. BTW., I'm not arguing for hash tables everywhere. I'm just saying there are some appliactions where O(log N) just isn't good enough, and that hash tables *in practice* is what you need.