From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4 X-Received: by 10.224.52.6 with SMTP id f6mr1265231qag.2.1378224859317; Tue, 03 Sep 2013 09:14:19 -0700 (PDT) X-Received: by 10.49.48.17 with SMTP id h17mr33476qen.40.1378224859294; Tue, 03 Sep 2013 09:14:19 -0700 (PDT) Path: border1.nntp.dca3.giganews.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!q10no2524838qai.0!news-out.google.com!p7ni567qas.0!nntp.google.com!j7no150195qai.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail Newsgroups: comp.lang.ada Date: Tue, 3 Sep 2013 09:14:19 -0700 (PDT) In-Reply-To: <9fc26b8a-7a5c-4941-81cb-8f0e9c17a660@googlegroups.com> Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=64.183.207.38; posting-account=zwxLlwoAAAChLBU7oraRzNDnqQYkYbpo NNTP-Posting-Host: 64.183.207.38 References: <9fc26b8a-7a5c-4941-81cb-8f0e9c17a660@googlegroups.com> User-Agent: G2/1.0 MIME-Version: 1.0 Message-ID: <31834951-571f-4361-b18a-e0abde8fa219@googlegroups.com> Subject: Re: B-tree performance From: "Dan'l Miller" Injection-Date: Tue, 03 Sep 2013 16:14:19 +0000 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Original-Bytes: 3827 Xref: number.nntp.dca.giganews.com comp.lang.ada:183266 Date: 2013-09-03T09:14:19-07:00 List-Id: On Saturday, August 31, 2013 2:14:22 AM UTC-5, Peter Brooks wrote: > I'm interested to know if anybody has any real-world evidence for this hy= pothesis, or if they agree with my gut feel. I disagree with the direction that your gut feel took you. I will diverge = from your original posting at the appropriate points below. > If I build a B-tree, then the conventional means to search it would be to= start in the middle, then cascade left and right following the pointer str= ucture. That's slow because most searches won't turn up a value in the firs= t N levels of the b-tree. >=20 > So it'd make sense to have [...snip...] So it would make sense to not use a B-tree. Indeed, it would make sense to= not restart the comparison again at each node, because it wastes time and = bloats memory-space. Hence, it would make sense to use a radix trie. http= ://en.wikipedia.org/wiki/Radix_trie Because a URI is a sequence of ISO10646/Unicode characters (i.e., a textual= string), PATRICIA tries (i.e., radix tries with radix =3D 2) might be an o= ption to minimize sparseness, which tends to lessen space utilization, when= compared to too-large of radices, i.e., too large of powers of two, which = cause too sparsely-branching nodes of an excessively n-ary trie. > The question is how many levels would it make sense to keep in the array.= [...snip...] Hence, it would make sense to use a radix trie, with a small radix, say, 2 = or 4 that fits into your processor's level-1 or level-2 cache-line of 32B o= r 64B. Optimizing for a 64b addresses (i.e., 8B) and a 32B level-1 cache-l= ine, 4 of those addresses would fit into one cache-line, so radix =3D 4, wh= ich means compare the key 2 bits at a time for a 4-ary branching at each no= de, which means a O(log-base-4 n) search, a O(log-base-4 n) insertion, and = a O(log-base-4 n) deletion, where n is the maximum quantity of bit-pairs in= the URI string, which is 4 times string byte-length if the URI is in UTF8 = encoding. (Btw, yeah, I know that I should eliminate the base from the logarithm in B= achmann-Landau big-omicron notation, but I am emphasizing than radix=3D4 ha= s typically half the work to accomplish than radix=3D2, while not increasin= g the quantity of cache-lines per node. And, yes, I know that many people = were taught incorrectly that upper-bound growth rate is Latin capital lette= r O, not Greek capital letter Omicron, but I am loyal to Landau's original = seminal references in 1909 through 1914.) https://en.wikipedia.org/wiki/Ba= chmann-Landau_notation#Family_of_Bachmann.E2.80.93Landau_notations