From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4 Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: Niklas Holsti Newsgroups: comp.lang.ada Subject: Re: B-tree performance Date: Sun, 01 Sep 2013 11:05:27 +0300 Organization: Tidorum Ltd Message-ID: References: <9fc26b8a-7a5c-4941-81cb-8f0e9c17a660@googlegroups.com> <2383e534-646c-4500-ad05-40caf9460c35@googlegroups.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Trace: individual.net emIng497qe4uy7f0wMT/6QG+3DA1qufo4X9bQ7ArX2+1u381PX Cancel-Lock: sha1:ktr+30Eg3S+OnNoxkQ5WInqth+A= User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 In-Reply-To: <2383e534-646c-4500-ad05-40caf9460c35@googlegroups.com> Xref: news.eternal-september.org comp.lang.ada:17075 Date: 2013-09-01T11:05:27+03:00 List-Id: On 13-09-01 09:05 , Peter Brooks wrote: > On Sunday, 1 September 2013 06:57:00 UTC+2, Niklas Holsti wrote: >> >> >> I find your question hard to understand. Are you really talking >> about B-trees (http://en.wikipedia.org/wiki/B-tree), with fan-out >> typically much larger than 2? Or are you talking about binary trees >> with 2-way fan-out? Your phrase "left and right" suggests the >> latter. >> > Yes, I think you're right. Though, because of clashes, leaf nodes can > have many members, so it isn't quite a binary tree. Ah... you are first computing a hash-value from the actual key, and then looking up the hash-value in a tree? That is an unusual way to use hashing. Usually, the hash-value is arithmetically converted into a direct "hash index" of the hash-table (for example, using "hash_value mod hash_table_size"), and the hash-table entry at that index contains all objects that have this hash-index (often as a linked list, but other ways exist). It is this direct computation of an array index that makes hashing fast, compared with containers that need some kind of search. See http://en.wikipedia.org/wiki/Hash_table. (It is also the reason why there is little need to go beyond 32 bits for the hash-value size, since the hash-value will be folded into a much smaller array index anyway.) If the average number of collisions (= objects with the same hash-index) is small, a linear traversal of the collision set is fast enough to locate the object with the desired actual key. If the number of collisions is large, it means that the hash-table is too small (or the hash function is bad). Hashed containers typically increase the hash-table size automatically when the frequency of collisions exceeds some threshold. If for some reason you want to the keep the hash-table small, although this leads to many collisions, I suppose you could organize the colliding object sets into some kind of keyed containers, for example B-trees, but then you should use the actual key to index these containers, not the hash-value. I assume your data structures will be in main memory. If some parts are in secondary memory (disk files) there are more trade-offs to consider. > My question is when you get diminishing returns from using a sorted > array. Or, equivalently, what would be the optimal size of the array > of key values? Either make up an analytical model (which will of course depend on lots of parameters of the problem and processor) or experiment and measure. I would do the latter. But it will probably also depend on the pattern of accesses. For example, if the accesses have temporal locality, using some kind of "move to front" heuristic on the lists of colliding objects can help. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .