comp.lang.ada
 help / color / mirror / Atom feed
From: "Dan'l Miller" <optikos@verizon.net>
Subject: Re: B-tree performance
Date: Tue, 3 Sep 2013 09:14:19 -0700 (PDT)
Date: 2013-09-03T09:14:19-07:00	[thread overview]
Message-ID: <31834951-571f-4361-b18a-e0abde8fa219@googlegroups.com> (raw)
In-Reply-To: <9fc26b8a-7a5c-4941-81cb-8f0e9c17a660@googlegroups.com>

On Saturday, August 31, 2013 2:14:22 AM UTC-5, Peter Brooks wrote:
> I'm interested to know if anybody has any real-world evidence for this hypothesis, or if they agree with my gut feel.

I disagree with the direction that your gut feel took you.  I will diverge from your original posting at the appropriate points below.

> If I build a B-tree, then the conventional means to search it would be to start in the middle, then cascade left and right following the pointer structure. That's slow because most searches won't turn up a value in the first N levels of the b-tree.
> 
> So it'd make sense to have [...snip...]

So it would make sense to not use a B-tree.  Indeed, it would make sense to not restart the comparison again at each node, because it wastes time and bloats memory-space.  Hence, it would make sense to use a radix trie.  http://en.wikipedia.org/wiki/Radix_trie

Because a URI is a sequence of ISO10646/Unicode characters (i.e., a textual string), PATRICIA tries (i.e., radix tries with radix = 2) might be an option to minimize sparseness, which tends to lessen space utilization, when compared to too-large of radices, i.e., too large of powers of two, which cause too sparsely-branching nodes of an excessively n-ary trie.

> The question is how many levels would it make sense to keep in the array.  [...snip...]

Hence, it would make sense to use a radix trie, with a small radix, say, 2 or 4 that fits into your processor's level-1 or level-2 cache-line of 32B or 64B.  Optimizing for a 64b addresses (i.e., 8B) and a 32B level-1 cache-line, 4 of those addresses would fit into one cache-line, so radix = 4, which means compare the key 2 bits at a time for a 4-ary branching at each node, which means a O(log-base-4 n) search, a O(log-base-4 n) insertion, and a O(log-base-4 n) deletion, where n is the maximum quantity of bit-pairs in the URI string, which is 4 times string byte-length if the URI is in UTF8 encoding.

(Btw, yeah, I know that I should eliminate the base from the logarithm in Bachmann-Landau big-omicron notation, but I am emphasizing than radix=4 has typically half the work to accomplish than radix=2, while not increasing the quantity of cache-lines per node.  And, yes, I know that many people were taught incorrectly that upper-bound growth rate is Latin capital letter O, not Greek capital letter Omicron, but I am loyal to Landau's original seminal references in 1909 through 1914.)  https://en.wikipedia.org/wiki/Bachmann-Landau_notation#Family_of_Bachmann.E2.80.93Landau_notations


  parent reply	other threads:[~2013-09-03 16:14 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-31  7:14 B-tree performance Peter Brooks
2013-08-31 15:26 ` Shark8
2013-08-31 15:47   ` Peter Brooks
2013-09-01  4:57 ` Niklas Holsti
2013-09-01  6:05   ` Peter Brooks
2013-09-01  8:05     ` Niklas Holsti
2013-09-01  8:38       ` Peter Brooks
2013-09-02  4:28         ` Niklas Holsti
2013-09-03  2:15           ` Peter Brooks
2013-09-05  8:25         ` Memory mapping and network file systems (Was: B-tree performance) Jacob Sparre Andersen
2013-09-05 10:12           ` Peter Brooks
2013-09-03 16:14 ` Dan'l Miller [this message]
2013-09-03 17:43   ` B-tree performance Peter Brooks
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox