From: Paul Rubin
Newsgroups: comp.lang.ada
Subject: Re: Hash Type Size
Date: Tue, 03 Sep 2013 21:50:18 -0700
Message-ID: <7xppspkx6d.fsf@ruckus.brouhaha.com>

Peter Brooks writes:
>> [1]
> Thank you for that -it's extremely useful. 32% collisions is far too
> high for me, so I certainly need a better hash function.

It doesn't say how many words are in the dictionary, so it doesn't tell
you anything.
A hash function should approximate a random function, which means you
should expect the first collisions once the number of keys reaches
roughly sqrt(number of slots in the table).

Re "tailoring the hash function to the data", this is called perfect
hashing, and there's lots of material online about how to do it (see
Google). It's useful in some situations, but not that many.

What is your actual application?
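
To put numbers on the sqrt estimate, here is a rough Python sketch. It
assumes an idealized hash that assigns each key to a uniformly random
slot. The function names and the table and key sizes are made up for
illustration and are not taken from the page cited as [1].

```python
import math


def expected_collisions(n_keys, n_slots):
    """Expected number of keys that land in an already-occupied slot,
    assuming each key is hashed to a uniformly random slot."""
    # Expected occupied slots = m * (1 - (1 - 1/m)^n), computed with
    # log1p/expm1 so it stays accurate when 1/m is tiny.
    occupied = -n_slots * math.expm1(n_keys * math.log1p(-1.0 / n_slots))
    return n_keys - occupied


def keys_for_first_collision(n_slots, prob=0.5):
    """Approximate number of keys at which at least one collision has
    occurred with probability `prob` (birthday approximation)."""
    return math.sqrt(2.0 * n_slots * math.log(1.0 / (1.0 - prob)))


if __name__ == "__main__":
    slots = 2 ** 32  # hypothetical table size: one slot per 32-bit hash value
    print("keys for a 50% chance of any collision:",
          round(keys_for_first_collision(slots)))
    for keys in (10_000, 100_000, 1_000_000):  # hypothetical dictionary sizes
        print(keys, "keys ->",
              round(expected_collisions(keys, slots), 2),
              "expected collisions")
```

The point is that the collision count depends on the ratio of keys to
slots, so a collision percentage quoted without the dictionary size
can't tell you whether the hash function is any good.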