* B-tree performance @ 2013-08-31 7:14 Peter Brooks 2013-08-31 15:26 ` Shark8 ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Peter Brooks @ 2013-08-31 7:14 UTC (permalink / raw)

I'm interested to know if anybody has any real-world evidence for this hypothesis, or if they agree with my gut feel.

If I build a B-tree, then the conventional means to search it would be to start in the middle, then cascade left and right following the pointer structure. That's slow because most searches won't turn up a value in the first N levels of the B-tree.

So it'd make sense to have an array which contains the address values of the B-tree down to a certain level. That way, for a look-up, you search the array, which is nice and quick, and this leads you to the correct sub-tree; you then continue as usual from the top of that sub-tree.

The question is how many levels it would make sense to keep in the array. Too many and the array itself consumes too many resources and slows things down. Too few and you don't gain much, particularly if the B-tree is sparse. The sizes of the buckets, or sub-trees, for a given number of array entries (2^k + 1 entries divide the 64-bit key space into buckets of 2^(64-k) keys) would be:

Entries         Bucket size
1               18446744073709551616
3               9223372036854775808
5               4611686018427387904
9               2305843009213693952
17              1152921504606846976
33              576460752303423488
65              288230376151711744
129             144115188075855872
257             72057594037927936
513             36028797018963968
1025            18014398509481984
2049            9007199254740992
4097            4503599627370496
8193            2251799813685248
16385           1125899906842624
32769           562949953421312
65537           281474976710656
131073          140737488355328
262145          70368744177664
524289          35184372088832
1048577         17592186044416
2097153         8796093022208
4194305         4398046511104
8388609         2199023255552
16777217        1099511627776
33554433        549755813888
67108865        274877906944
134217729       137438953472
268435457       68719476736
536870913       34359738368
1073741825      17179869184
2147483649      8589934592
4294967297      4294967296
8589934593      2147483648

13 levels (4097 entries) would, I think, speed things up considerably, with only about 4k entries' worth of overhead.
How much deeper would it make sense to go before you get diminishing returns? ^ permalink raw reply [flat|nested] 13+ messages in thread
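A minimal sketch of the scheme in the message above, in Python (the thread itself contains no code, so everything here - `K = 13`, the `Node` type, the plain unbalanced BST below the array - is illustrative): the top K levels of the tree are replaced by a flat array of 2^K subtree roots, and since the keys are uniformly distributed 64-bit hash values, the right array slot can be found by direct indexing on the high K bits rather than by searching.

```python
K = 13  # levels kept in the array: 2**13 = 8192 subtree roots (the "4k-ish" overhead)

class Node:
    """Plain unbalanced BST node standing in for the lower tree levels."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

# One slot per possible high-K-bit prefix of a 64-bit key.
roots = [None] * (1 << K)

def insert(key, value):
    slot = key >> (64 - K)          # high K bits pick the subtree directly
    node = roots[slot]
    if node is None:
        roots[slot] = Node(key, value)
        return
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key, value)
                return
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key, value)
                return
            node = node.right
        else:                        # key already present: update in place
            node.value = value
            return

def lookup(key):
    node = roots[key >> (64 - K)]
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node.value if node is not None else None
```

With uniformly distributed keys the top-level "array search" is free - a direct index - which is why, as the discussion below brings out, the structure starts to look like a hash table with per-bucket trees.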
* Re: B-tree performance 2013-08-31 7:14 B-tree performance Peter Brooks @ 2013-08-31 15:26 ` Shark8 2013-08-31 15:47 ` Peter Brooks 2013-09-01 4:57 ` Niklas Holsti 2013-09-03 16:14 ` B-tree performance Dan'l Miller 2 siblings, 1 reply; 13+ messages in thread From: Shark8 @ 2013-08-31 15:26 UTC (permalink / raw) I think that the correct optimization here would be highly dependent on the information you're dealing with. If, for example, you're handling a dictionary or phone-book, you could index off of the first letter [or two] and use that as your starting-range's endpoint. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-08-31 15:26 ` Shark8 @ 2013-08-31 15:47 ` Peter Brooks 0 siblings, 0 replies; 13+ messages in thread From: Peter Brooks @ 2013-08-31 15:47 UTC (permalink / raw) On Saturday, 31 August 2013 17:26:41 UTC+2, Shark8 wrote: > I think that the correct optimization here would be highly dependent on the information you're dealing with. If, for example, you're handling a dictionary or phone-book, you could index off of the first letter [or two] and use that as your starting-range's endpoint. > Yes, that's true. I'll be working with the hash codes of RDF URIs, which should be uniformly dense - well, actually, they will be, because I'll be experimenting with hash functions to find one that is. I suppose that I'll just have to try a few options and time the result. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-08-31 7:14 B-tree performance Peter Brooks 2013-08-31 15:26 ` Shark8 @ 2013-09-01 4:57 ` Niklas Holsti 2013-09-01 6:05 ` Peter Brooks 2013-09-03 16:14 ` B-tree performance Dan'l Miller 2 siblings, 1 reply; 13+ messages in thread From: Niklas Holsti @ 2013-09-01 4:57 UTC (permalink / raw) On 13-08-31 10:14 , Peter Brooks wrote: > I'm interested to know if anybody has any real-world evidence for > this hypothesis, or if they agree with my gut feel. > > If I build a B-tree, then the conventional means to search it would > be to start in the middle, then cascade left and right following the > pointer structure. That's slow because most searches won't turn up a > value in the first N levels of the b-tree. I find your question hard to understand. Are you really talking about B-trees (http://en.wikipedia.org/wiki/B-tree), with fan-out typically much larger than 2? Or are you talking about binary trees with 2-way fan-out? Your phrase "left and right" suggests the latter. > So it'd make sense to have an array which contains the address values > of the B-tree down to a certain level. That way, for a look up, you > search the array, which is nice and quick, and this leads you to the > correct sub-tree and you then continue as usual, from the top of that > sub-tree. If you really mean B-trees, it seems to me that you could just increase the fan-out of the B-tree to get the same effect. With a fan-out of say 4000, the root of the B-tree consists of a sorted array of 4000 (or 3999) key values, which is nice and quick to search. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ . ^ permalink raw reply [flat|nested] 13+ messages in thread
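Niklas's alternative in the message above - a large fan-out that turns the root of the B-tree into one sorted array - can be sketched like this (illustrative Python; the fan-out of 4000 and the separator keys are made up, not from the thread):

```python
import bisect

# Root of a hypothetical B-tree with fan-out 4000: 3999 sorted separator
# keys; child i covers the keys between separators[i-1] and separators[i].
separators = list(range(1000, 4_000_000, 1000))   # 3999 keys (illustrative)

def child_index(key):
    # Binary search over the sorted root: ~12 comparisons pick 1 of 4000 children.
    return bisect.bisect_right(separators, key)
```

One binary search over a contiguous array is cache-friendly in a way that chasing 12 pointers down a binary tree is not, which is the force of the "just increase the fan-out" suggestion.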
* Re: B-tree performance 2013-09-01 4:57 ` Niklas Holsti @ 2013-09-01 6:05 ` Peter Brooks 2013-09-01 8:05 ` Niklas Holsti 0 siblings, 1 reply; 13+ messages in thread From: Peter Brooks @ 2013-09-01 6:05 UTC (permalink / raw) On Sunday, 1 September 2013 06:57:00 UTC+2, Niklas Holsti wrote: > > > I find your question hard to understand. Are you really talking about > B-trees (http://en.wikipedia.org/wiki/B-tree), with fan-out typically > much larger than 2? Or are you talking about binary trees with 2-way > fan-out? Your phrase "left and right" suggests the latter. > Yes, I think you're right. Though, because of clashes, leaf nodes can have many members, so it isn't quite a binary tree. > > > So it'd make sense to have an array which contains the address values > > of the B-tree down to a certain level. That way, for a look up, you > > > search the array, which is nice and quick, and this leads you to the > > > correct sub-tree and you then continue as usual, from the top of that > > > sub-tree. > > If you really mean B-trees, it seems to me that you could just increase > the fan-out of the B-tree to get the same effect. With a fan-out of say > 4000, the root of the B-tree consists of a sorted array of 4000 (or > 3999) key values, which is nice and quick to search. > Yes, that is really my question. Since the hash algorithm will produce a uniformly dense distribution, the array of key values that will fit best will be one that divides 64-bit space up evenly. My question is when you get diminishing returns from using a sorted array. Or, equivalently, what would be the optimal size of the array of key values? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-09-01 6:05 ` Peter Brooks @ 2013-09-01 8:05 ` Niklas Holsti 2013-09-01 8:38 ` Peter Brooks 0 siblings, 1 reply; 13+ messages in thread From: Niklas Holsti @ 2013-09-01 8:05 UTC (permalink / raw) On 13-09-01 09:05 , Peter Brooks wrote: > On Sunday, 1 September 2013 06:57:00 UTC+2, Niklas Holsti wrote: >> >> >> I find your question hard to understand. Are you really talking >> about B-trees (http://en.wikipedia.org/wiki/B-tree), with fan-out >> typically much larger than 2? Or are you talking about binary trees >> with 2-way fan-out? Your phrase "left and right" suggests the >> latter. >> > Yes, I think you're right. Though, because of clashes, leaf nodes can > have many members, so it isn't quite a binary tree. Ah... you are first computing a hash-value from the actual key, and then looking up the hash-value in a tree? That is an unusual way to use hashing. Usually, the hash-value is arithmetically converted into a direct "hash index" of the hash-table (for example, using "hash_value mod hash_table_size"), and the hash-table entry at that index contains all objects that have this hash-index (often as a linked list, but other ways exist). It is this direct computation of an array index that makes hashing fast, compared with containers that need some kind of search. See http://en.wikipedia.org/wiki/Hash_table. (It is also the reason why there is little need to go beyond 32 bits for the hash-value size, since the hash-value will be folded into a much smaller array index anyway.) If the average number of collisions (= objects with the same hash-index) is small, a linear traversal of the collision set is fast enough to locate the object with the desired actual key. If the number of collisions is large, it means that the hash-table is too small (or the hash function is bad). Hashed containers typically increase the hash-table size automatically when the frequency of collisions exceeds some threshold. 
If for some reason you want to keep the hash-table small, although this leads to many collisions, I suppose you could organize the colliding object sets into some kind of keyed containers, for example B-trees, but then you should use the actual key to index these containers, not the hash-value. I assume your data structures will be in main memory. If some parts are in secondary memory (disk files) there are more trade-offs to consider. > My question is when you get diminishing returns from using a sorted > array. Or, equivalently, what would be the optimal size of the array > of key values? Either make up an analytical model (which will of course depend on lots of parameters of the problem and processor) or experiment and measure. I would do the latter. But it will probably also depend on the pattern of accesses. For example, if the accesses have temporal locality, using some kind of "move to front" heuristic on the lists of colliding objects can help. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ . ^ permalink raw reply [flat|nested] 13+ messages in thread
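The traditional structure Niklas describes above - fold the hash into an array index, keep each collision set as a small unordered list, and enlarge the table when collision sets grow - in sketch form (illustrative Python; the initial size and the growth threshold are arbitrary choices, not from the thread):

```python
class HashTable:
    """Chained hash table: hash value folded to an index; collisions in a list."""

    def __init__(self, size=8):
        self.size = size
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def _bucket(self, key):
        # "hash_value mod hash_table_size" gives the direct array index.
        return self.buckets[hash(key) % self.size]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)     # update existing entry
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > 2 * self.size:       # collision sets getting long: rehash
            self._grow()

    def _grow(self):
        old = [pair for b in self.buckets for pair in b]
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        for k, v in old:
            self.buckets[hash(k) % self.size].append((k, v))

    def find(self, key):
        # Linear scan of one short collision set.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None
```

Insertion and lookup touch only one short list, which is where the O(1)-on-average behaviour discussed later in the thread comes from.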
* Re: B-tree performance 2013-09-01 8:05 ` Niklas Holsti @ 2013-09-01 8:38 ` Peter Brooks 2013-09-02 4:28 ` Niklas Holsti 2013-09-05 8:25 ` Memory mapping and network file systems (Was: B-tree performance) Jacob Sparre Andersen 0 siblings, 2 replies; 13+ messages in thread From: Peter Brooks @ 2013-09-01 8:38 UTC (permalink / raw) On Sunday, 1 September 2013 10:05:27 UTC+2, Niklas Holsti wrote: > > > If for some reason you want to keep the hash-table small, although > this leads to many collisions, I suppose you could organize the > colliding object sets into some kind of keyed containers, for example > B-trees, but then you should use the actual key to index these > containers, not the hash-value. > I'm using a tree structure to represent a sparse array because it makes inserts and deletes easier, quicker, and cleaner. I'm using the large hash index to minimise the number of collisions. If there were a better sparse-array algorithm, that'd probably suit me better. Then I'd simply hash everything to a 32-bit (or even larger) array index and be done with it. With dynamic allocation, memory is used more efficiently. If you use a sparse array, it can grow in size slowly (needing periodic cleaning) because deletions don't delete the location, but simply set the contents to null, and such null entries accumulate. > > > I assume your data structures will be in main memory. If some parts are > in secondary memory (disk files) there are more trade-offs to consider. > I'm hoping to use persistent objects to reduce the extra complications there - using OS paging makes more sense, from a performance point of view, than using files. If the persistent objects are shadowed in RAM disc and that's shared over a network, it can make the solution nicely scalable across multiple networked machines. > > Either make up an analytical model (which will of course depend on lots > of parameters of the problem and processor) or experiment and measure. I > would do the latter.
But it will probably also depend on the pattern of > accesses. For example, if the accesses have temporal locality, using > some kind of "move to front" heuristic on the lists of colliding objects > can help. > Yes, some sort of caching might help, particularly for predicates. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-09-01 8:38 ` Peter Brooks @ 2013-09-02 4:28 ` Niklas Holsti 2013-09-03 2:15 ` Peter Brooks 2013-09-05 8:25 ` Memory mapping and network file systems (Was: B-tree performance) Jacob Sparre Andersen 1 sibling, 1 reply; 13+ messages in thread From: Niklas Holsti @ 2013-09-02 4:28 UTC (permalink / raw) On 13-09-01 11:38 , Peter Brooks wrote: > On Sunday, 1 September 2013 10:05:27 UTC+2, Niklas Holsti wrote: >> >> >> If for some reason you want to keep the hash-table small, although >> this leads to many collisions, I suppose you could organize the >> colliding object sets into some kind of keyed containers, for example >> B-trees, but then you should use the actual key to index these >> containers, not the hash-value. >> > I'm using a tree structure to represent a sparse array because it > makes inserts and deletes easier and quicker, and cleaner. But insertion and deletion are simple and fast in the basic hash-table structure, too, as long as the collision sets are small. In the traditional hash-table structure, the collision sets are implemented as unordered linked lists of objects. You can insert or delete any list entry with a few link changes. If you adjust the size of the hash table to keep the collision set size bounded by some constant, insertions and deletions are O(1) on average. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ . ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-09-02 4:28 ` Niklas Holsti @ 2013-09-03 2:15 ` Peter Brooks 0 siblings, 0 replies; 13+ messages in thread From: Peter Brooks @ 2013-09-03 2:15 UTC (permalink / raw) On Monday, 2 September 2013 06:28:57 UTC+2, Niklas Holsti wrote: > > > But insertion and deletion are simple and fast in the basic hash-table > structure, too, as long as the collision sets are small. > Only if the hash-table is large relative to the data. The test data I'm using contains 13,225,167 triples - a real triplestore might have ten times that. The file is nearly 2 GB in size. > > In the traditional hash-table structure, the collision sets are > implemented as unordered linked lists of objects. You can insert or > delete any list entry with a few link changes. If you adjust the size of > the hash table to keep the collision set size bounded by some constant, > I mean, as I said, to use a binary tree - a linked structure. So collisions will just be further links. The problem I see is the tree getting too deep, so that it takes too long to find items (as usual, the time to add something is less critical than the time to find something). The fewer collisions, the less lookup time; but the more items in the top-level hash table, the shallower the overall structure. ^ permalink raw reply [flat|nested] 13+ messages in thread
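The trade-off Peter describes above - more top-level hash entries versus shallower per-bucket trees - can be roughed out numerically (illustrative sketch; it uses his 13,225,167-triple figure and the standard ~2 ln n estimate for the expected depth of a randomly built, unbalanced binary search tree):

```python
import math

N = 13_225_167                     # triples in Peter's test data

for k in (0, 8, 12, 16, 20):       # 2**k top-level hash buckets
    per_bucket = N / 2**k
    # Expected search depth of a random (unbalanced) BST is roughly 2 ln n.
    depth = 2 * math.log(per_bucket) if per_bucket > 1 else 0
    print(f"2^{k:2} buckets: ~{per_bucket:10.0f} keys each, "
          f"expected search depth ~ {depth:4.1f}")
```

By this estimate, even a modest top-level table (2^16 entries) cuts the expected per-bucket tree depth from about 33 links to about 11.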
* Memory mapping and network file systems (Was: B-tree performance) 2013-09-01 8:38 ` Peter Brooks 2013-09-02 4:28 ` Niklas Holsti @ 2013-09-05 8:25 ` Jacob Sparre Andersen 2013-09-05 10:12 ` Peter Brooks 1 sibling, 1 reply; 13+ messages in thread From: Jacob Sparre Andersen @ 2013-09-05 8:25 UTC (permalink / raw) Peter Brooks wrote: > If the persistent objects are shadowed in RAM disc and that's shared > over a network, it can make the solution nicely scalable across > multiple networked machines. That sounds rather dangerous to me. I would expect either very low performance or lots of data corruption. Greetings, Jacob -- »In Ada you model the problem space, not the solution space.« -- Robert I. Eachus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Memory mapping and network file systems (Was: B-tree performance) 2013-09-05 8:25 ` Memory mapping and network file systems (Was: B-tree performance) Jacob Sparre Andersen @ 2013-09-05 10:12 ` Peter Brooks 0 siblings, 0 replies; 13+ messages in thread From: Peter Brooks @ 2013-09-05 10:12 UTC (permalink / raw) On Thursday, 5 September 2013 10:25:08 UTC+2, Jacob Sparre Andersen wrote: > > If the persistent objects are shadowed in RAM disc and that's shared > > over a network, it can make the solution nicely scalable across > > multiple networked machines. > That sounds rather dangerous to me. I would expect either very low > performance or lots of data corruption. > So would I - except that, as is the case, triplestores are read far more than they are written, so only one process needs to write, removing the main source of corruption and removing the need to do lots of locking. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: B-tree performance 2013-08-31 7:14 B-tree performance Peter Brooks 2013-08-31 15:26 ` Shark8 2013-09-01 4:57 ` Niklas Holsti @ 2013-09-03 16:14 ` Dan'l Miller 2013-09-03 17:43 ` Peter Brooks 2 siblings, 1 reply; 13+ messages in thread From: Dan'l Miller @ 2013-09-03 16:14 UTC (permalink / raw) On Saturday, August 31, 2013 2:14:22 AM UTC-5, Peter Brooks wrote: > I'm interested to know if anybody has any real-world evidence for this hypothesis, or if they agree with my gut feel. I disagree with the direction that your gut feel took you. I will diverge from your original posting at the appropriate points below. > If I build a B-tree, then the conventional means to search it would be to start in the middle, then cascade left and right following the pointer structure. That's slow because most searches won't turn up a value in the first N levels of the b-tree. > > So it'd make sense to have [...snip...] So it would make sense to not use a B-tree. Indeed, it would make sense to not restart the comparison again at each node, because it wastes time and bloats memory-space. Hence, it would make sense to use a radix trie. http://en.wikipedia.org/wiki/Radix_trie Because a URI is a sequence of ISO10646/Unicode characters (i.e., a textual string), PATRICIA tries (i.e., radix tries with radix = 2) might be an option to minimize sparseness: too-large radices, i.e., too-large powers of two, cause sparsely-branching nodes in an excessively n-ary trie, which hurts space utilization. > The question is how many levels would it make sense to keep in the array. [...snip...] Hence, it would make sense to use a radix trie, with a small radix, say, 2 or 4, that fits into your processor's level-1 or level-2 cache-line of 32B or 64B.
Optimizing for 64-bit addresses (i.e., 8B) and a 32B level-1 cache-line, 4 of those addresses would fit into one cache-line, so radix = 4, which means compare the key 2 bits at a time for a 4-ary branching at each node, which means a O(log-base-4 n) search, a O(log-base-4 n) insertion, and a O(log-base-4 n) deletion, where n is the maximum quantity of bit-pairs in the URI string, which is 4 times string byte-length if the URI is in UTF8 encoding. (Btw, yeah, I know that I should eliminate the base from the logarithm in Bachmann-Landau big-omicron notation, but I am emphasizing that radix=4 has typically half the work to accomplish as radix=2, while not increasing the quantity of cache-lines per node. And, yes, I know that many people were taught incorrectly that upper-bound growth rate is Latin capital letter O, not Greek capital letter Omicron, but I am loyal to Landau's original seminal references in 1909 through 1914.) https://en.wikipedia.org/wiki/Bachmann-Landau_notation#Family_of_Bachmann.E2.80.93Landau_notations ^ permalink raw reply [flat|nested] 13+ messages in thread
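A sketch of the radix-4 branching Dan'l describes - consume the key two bits at a time, one 4-way branch per step (illustrative Python; a real PATRICIA/radix trie would also collapse single-child chains, which this sketch leaves out for brevity):

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = [None] * 4   # radix 4: one slot per 2-bit digit
        self.value = None

def trie_insert(root, key_bytes, value):
    node = root
    for byte in key_bytes:
        for shift in (6, 4, 2, 0):           # 4 two-bit digits per byte
            digit = (byte >> shift) & 0b11
            if node.children[digit] is None:
                node.children[digit] = TrieNode()
            node = node.children[digit]
    node.value = value

def trie_lookup(root, key_bytes):
    node = root
    for byte in key_bytes:
        for shift in (6, 4, 2, 0):
            node = node.children[(byte >> shift) & 0b11]
            if node is None:
                return None
    return node.value
```

No comparison ever restarts at a node - each step consumes two key bits - and URIs that share a prefix share the corresponding path, which is the behaviour Peter later notes would make "the same entry in different URIs" easy to find.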
* Re: B-tree performance 2013-09-03 16:14 ` B-tree performance Dan'l Miller @ 2013-09-03 17:43 ` Peter Brooks 0 siblings, 0 replies; 13+ messages in thread From: Peter Brooks @ 2013-09-03 17:43 UTC (permalink / raw) On Tuesday, 3 September 2013 18:14:19 UTC+2, Dan'l Miller wrote: > > > So it would make sense to not use a B-tree. Indeed, it would make sense to not restart the comparison again at each node, because it wastes time and bloats memory-space. Hence, it would make sense to use a radix trie. http://en.wikipedia.org/wiki/Radix_trie > Yes, that might make sense. I did something similar many years ago when I produced correlation matrices of letter sequences in words. It wouldn't work as a linked list, but a multi-dimensional associative array might be an option - thank you, I'll look into that. One pleasant side-effect of this method would be that it'd be much easier to search for the same entry in different URIs. I'll have to experiment a bit on workable representations of this. The funny thing about this is that it might be very easy to implement in awk. Not what I was looking for or thinking of. I'm not sure how awk behaves with very large arrays - I'll find out quite soon. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-09-05 10:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-31  7:14 B-tree performance Peter Brooks
2013-08-31 15:26 ` Shark8
2013-08-31 15:47   ` Peter Brooks
2013-09-01  4:57 ` Niklas Holsti
2013-09-01  6:05   ` Peter Brooks
2013-09-01  8:05     ` Niklas Holsti
2013-09-01  8:38       ` Peter Brooks
2013-09-02  4:28         ` Niklas Holsti
2013-09-03  2:15           ` Peter Brooks
2013-09-05  8:25         ` Memory mapping and network file systems (Was: B-tree performance) Jacob Sparre Andersen
2013-09-05 10:12           ` Peter Brooks
2013-09-03 16:14 ` B-tree performance Dan'l Miller
2013-09-03 17:43   ` Peter Brooks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox