Help B* and B+ Trees

comp.lang.ada
 help / color / mirror / Atom feed

* Help B* and B+ Trees
@ 1998-05-14  0:00 whizzbang
  1998-05-14  0:00 ` Charles Hixson
  1998-05-14  0:00 ` Matthew Heaney
  0 siblings, 2 replies; 16+ messages in thread
From: whizzbang @ 1998-05-14  0:00 UTC (permalink / raw)



Can anyone please help me on information regarding B-Plus & B-Star Trees. I
am not after code, but I am after documentation on how they work. I f anyone
has any interesting information I will be more than greatfull. You can email
me on:
ryan@gic.net.au
Thank you
Ryan Froman






^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00 whizzbang
@ 1998-05-14  0:00 ` Charles Hixson
  1998-05-14  0:00   ` Robert Dewar
  1998-05-14  0:00 ` Matthew Heaney
  1 sibling, 1 reply; 16+ messages in thread
From: Charles Hixson @ 1998-05-14  0:00 UTC (permalink / raw)
  To: whizzbang


Well... depending on your level of expertise, Knuth wrote the book on
this one.  Warning: He uses assembly code for a pseudo-machine in his
explanations!
I forget whether it's "Fundamental Algorithms" or "Sorting and
Searching", but it's one of the volumes of "The Art of Computer
Programming".

whizzbang wrote:
> 
> Can anyone please help me on information regarding B-Plus & B-Star Trees. I
> am not after code, but I am after documentation on how they work. I f anyone
> has any interesting information I will be more than greatfull. You can email
> me on:
> ryan@gic.net.au
> Thank you
> Ryan Froman

-- 
Charles Hixson	charleshixson@earthling.net
(510) 464-7733	or chixso@mtc.dst.ca.us




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00 whizzbang
  1998-05-14  0:00 ` Charles Hixson
@ 1998-05-14  0:00 ` Matthew Heaney
  1998-05-14  0:00   ` Robert Dewar
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Heaney @ 1998-05-14  0:00 UTC (permalink / raw)



In article <6je5il$ecf$1@news.iinet.net.au>, "whizzbang" <none@present.com>
wrote:
(start of quote)
Can anyone please help me on information regarding B-Plus & B-Star Trees. I
am not after code, but I am after documentation on how they work. I f anyone
has any interesting information I will be more than greatfull.
(end of quote)

The best book I've found is

File Structures, 2nd ed
Michael J. Folk, Bill Zoellick
Addison-Wesley, 1992
ISBN 0-201-55713-4
QA76.9.F5F65

This is the best book I've read on all aspects of files and disks, and it
served as the basis for the B-trees I wrote in Ada95.

As a starting point, I also used the B-tree code in 

Algorithms + Data Structures = Programs
Niklaus Wirth
Prentice-Hall, 1976

Although I haven't read it, you can try

File Structures with Ada
Nancy E. Miller and Charles G. Petersen
Benjamin/Cummings, 1990

There's an appendix in Chris Date's database book that discusses B-trees
and indexed sequential files; he cites Knuth.  There's also a chapter in
Algorithms, by Cormen, Leiserson, and Rivest.

Of course, you may want to read the original papers by Bayer and McCreight,
Bayer, and the survey article by Comer.

By the way, even though Bayer didn't explicitly say what the B was for, it
seems it stands for "Bayer."  He cited the literature on "AVL" trees in the
original paper, so he was probably continuing in tradition of naming a tree
using the initial from one's last name.  Perhaps the confusion about the
meaning of the "B" came about because the trees weren't named BM-Trees,
since McCreight shared authorship in the paper most frequently cited.  Who
knows?




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00 ` Matthew Heaney
@ 1998-05-14  0:00   ` Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-14  0:00 UTC (permalink / raw)



<<In article <6je5il$ecf$1@news.iinet.net.au>, "whizzbang" <none@present.com>
wrote:
(start of quote)
Can anyone please help me on information regarding B-Plus & B-Star Trees. I
am not after code, but I am after documentation on how they work. I f anyone
has any interesting information I will be more than greatfull.
(end of quote)
>>


I recomend going back to the original B-Tree paper (as always Knuth is
careful with references, so you can find the refrefrence in Vol III).
It is one of the really great papers in the computer science literature,
since it not only presents a critical idea that has been widely used, but
does it in a very understandable and very thorough manner.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00 ` Charles Hixson
@ 1998-05-14  0:00   ` Robert Dewar
  1998-05-15  0:00     ` Charles Hixson
  1998-05-16  0:00     ` Tarjei T. Jensen
  0 siblings, 2 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-14  0:00 UTC (permalink / raw)



Charles says

<<Well... depending on your level of expertise, Knuth wrote the book on
this one.  Warning: He uses assembly code for a pseudo-machine in his
explanations!
>>


This is quite unfair to Don, he explains algorithms in a high level
manner using abstract pseudo-code. MIX is only used in low level analysis
of actual performance on a typical machine, i.e. to get a feel for the
constants involved and go from O(n**2) to C*n**2, you need a concrete
machine!





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00   ` Robert Dewar
@ 1998-05-15  0:00     ` Charles Hixson
  1998-05-16  0:00       ` Robert Dewar
  1998-05-16  0:00     ` Tarjei T. Jensen
  1 sibling, 1 reply; 16+ messages in thread
From: Charles Hixson @ 1998-05-15  0:00 UTC (permalink / raw)
  To: Robert Dewar


Yes, he explains a lot.  But I kept finding that I had to drop into MIX
to figure out EXACTLY what he meant, and if one is implementing a
B-Tree, the EXACTLY is precisely what is wanted.

Robert Dewar wrote:
> 
> Charles says
> 
> <<Well... depending on your level of expertise, Knuth wrote the book on
> this one.  Warning: He uses assembly code for a pseudo-machine in his
> explanations!
> >>
> 
> This is quite unfair to Don, he explains algorithms in a high level
> manner using abstract pseudo-code. MIX is only used in low level analysis
> of actual performance on a typical machine, i.e. to get a feel for the
> constants involved and go from O(n**2) to C*n**2, you need a concrete
> machine!

-- 
Charles Hixson	charleshixson@earthling.net
(510) 464-7733	or chixso@mtc.dst.ca.us




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-14  0:00   ` Robert Dewar
  1998-05-15  0:00     ` Charles Hixson
@ 1998-05-16  0:00     ` Tarjei T. Jensen
  1998-05-16  0:00       ` Robert Dewar
  1998-05-17  0:00       ` Dan Johnston D.B.
  1 sibling, 2 replies; 16+ messages in thread
From: Tarjei T. Jensen @ 1998-05-16  0:00 UTC (permalink / raw)

Robert Dewar wrote: 
> This is quite unfair to Don, he explains algorithms in a high level
> manner using abstract pseudo-code. MIX is only used in low level analysis
> of actual performance on a typical machine, i.e. to get a feel for the
> constants involved and go from O(n**2) to C*n**2, you need a concrete
> machine!

Not really. Any computer will suffice. If he had used Pascal like he did
with TeX and Metafont (I would of course prefer Ada 95) it would be
simple for any user to discover the differences. Pascal compilers are
widely available and it would be simple for interested parties to type
in the examples and analyze the results.

I believe he used mix simply because he likes bit twiddling. I don't
mind that, it just makes the books less readable and longer than they
could have been.

I have read his explanation for using mix on his web page and I think
the stated reasons are entirely bogus and that he will have a hard time
finding scientific evidence supporting his position.

It is strange that someone who worries so much about readability as to
create his own type setting system does not consider the impact his
choice of programming language for the examples has.

Greetings,

-- 
// Tarjei T. Jensen 
//    tarjei@online.no || voice +47 51 62 85 58
//   Support you local rescue centre: GET LOST!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-16  0:00     ` Tarjei T. Jensen
@ 1998-05-16  0:00       ` Robert Dewar
  1998-05-16  0:00         ` Tarjei T. Jensen
  1998-05-17  0:00       ` Dan Johnston D.B.
  1 sibling, 1 reply; 16+ messages in thread
From: Robert Dewar @ 1998-05-16  0:00 UTC (permalink / raw)



Tarjei said

<<Not really. Any computer will suffice. If he had used Pascal like he did
with TeX and Metafont (I would of course prefer Ada 95) it would be
simple for any user to discover the differences. Pascal compilers are
widely available and it would be simple for interested parties to type
in the examples and analyze the results.

I believe he used mix simply because he likes bit twiddling. I don't
mind that, it just makes the books less readable and longer than they
could have been.
>>


I strongly disagree. I don't think exact evaluation of concrete complexity
at the machine cycle level is relevant to Knuth's work, so I agree it would
have better been left out. BUT, given that he wanted to address this, then
using a concrete machine is the only way of doing things. Yes, he could
certainly have used a real machine instead of MIX, though I don't think
there was any very good choice at the time.

I strongly disagree with the notion that the same results can be obtained
using high level languages. You are FAR too dependent on vagaries of
compilers and are likely to be measuring glitches in code generation,
rather than fundamental differences in computation complexity if you do this.

In my graduate course on microprocessor architecture at NYU this last
semester, I gave as a project picking one particular issue and investigating
efficienty issues at the machine level, e.g. multiple if's compared to
a case jump table, or inlining on/off, or the value of hardware integer
multiplication. It is amazing how much compilers got in the way of these
projects by generating strange stuff. Nearly everyone had to significantly
modify the compiler ASM output to get meaningfule unbiased results.

"it would be simple for interested parties to type in the examples and analyuze
the results"

FAR from simple, you have to look at, and fix up, the generated assembly
language.



Probably the best approach is to adopt a very high level machine with
known complexities for its high level operations (this could be based on
a subset of Ada or Pascal or C or whatever). This at least would allow
one to avoid the gross distortion that comes from, e.g. only counting
comparisons in a sort.

P.S. I once proposed the following minimal-number-of-comparisons
sorting routine to Knuth, and asked him why it did not qualify for
his chapter on sorting thtat considers only number of compares:


1. Get number of items to sort

2. Enumerate all possible decision trees corresponding to this number,
   and pick the one with the smallest number of levels.,

3. Sort using this decision tree

Only step 3 involves comparisons of the data elements

(of course there is rather a lot of uncounted book keeping in step 2 :-)

(he never replied :-) :-)





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-15  0:00     ` Charles Hixson
@ 1998-05-16  0:00       ` Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-16  0:00 UTC (permalink / raw)



Charles said

<<Yes, he explains a lot.  But I kept finding that I had to drop into MIX
to figure out EXACTLY what he meant, and if one is implementing a
B-Tree, the EXACTLY is precisely what is wanted.
>>

Well of course I can't argue with what Charles needed to do, but in my view
the description of B-Tree's in KnuthV3 is complete and accurate. The MIX
code adds nothing but over-specification and confusion in this particular
case if you ask me! Certainly most readers will not need to read the Mix
code. I have taught hundreds of students using this book, and one of my
important pieces of advice is to completely ignore the Mix stuff!





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-16  0:00       ` Robert Dewar
@ 1998-05-16  0:00         ` Tarjei T. Jensen
  1998-05-17  0:00           ` Robert Dewar
  0 siblings, 1 reply; 16+ messages in thread
From: Tarjei T. Jensen @ 1998-05-16  0:00 UTC (permalink / raw)

I suppose we more or less agree.

Robert Dewar wrote:
> 
> I strongly disagree. I don't think exact evaluation of concrete complexity
> at the machine cycle level is relevant to Knuth's work, so I agree it would
> have better been left out. BUT, given that he wanted to address this, then
> using a concrete machine is the only way of doing things. Yes, he could
> certainly have used a real machine instead of MIX, though I don't think
> there was any very good choice at the time.

And how do you propose to get cache hit and misses into this? It is a
time since I read about the implementation of Mix. I doubt that the
implementation takes into account internal pipelining and the effect of
cache hits and misses which we would normally have in a real system.

The only way I can see we can get this into the equation is to use real
computers. E.g. time the same code on a 386, 486 (with and without
cache) and various pentiums (with cache and without cache).

> I strongly disagree with the notion that the same results can be obtained
> using high level languages. You are FAR too dependent on vagaries of
> compilers and are likely to be measuring glitches in code generation,
> rather than fundamental differences in computation complexity if you do this.

It would be silly to teach algorithms that way. When dealing with
algorithms one should constrain oneself to timing the execution of the
algoritm. If neccessary repeat the execution enough times to obtain a
meaningful result.

Then one can relate the time to the number of statements executed. The
object of the exercise would be to make the student able to estimate
relative execution time of an algoritm.

Analyzing the generated code is something that should be done in a
microprosessor architecture or compiler architecture course.

> In my graduate course on microprocessor architecture at NYU this last
> semester, I gave as a project picking one particular issue and investigating
> efficienty issues at the machine level, e.g. multiple if's compared to
> a case jump table, or inlining on/off, or the value of hardware integer
> multiplication. It is amazing how much compilers got in the way of these
> projects by generating strange stuff. Nearly everyone had to significantly
> modify the compiler ASM output to get meaningfule unbiased results.

That is as it should be for a microprocessor architechture or compiler
architecture course. It becomes interesting to ask why the compiler
generates this or that code and what the effects of doing things
differently would be. In an algorithm course this only confuses.

> Probably the best approach is to adopt a very high level machine with
> known complexities for its high level operations (this could be based on
> a subset of Ada or Pascal or C or whatever). This at least would allow
> one to avoid the gross distortion that comes from, e.g. only counting
> comparisons in a sort.

I don't see how this reflects the real world where a compiler writer may
want to keep jumps within the instruction pipeline or obtain a cache
hit.

I still want to time the algorithm and then explain the time differences
with references to the source code. I see no point in complicating
matters by decoding compiler output (unless the result is confusing).

Greetings,

-- 
// Tarjei T. Jensen 
//    tarjei@online.no || voice +47 51 62 85 58
//   Support you local rescue centre: GET LOST!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-17  0:00 Help B* and B+ Trees Alexander E. Kopilovitch
@ 1998-05-17  0:00 ` Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-17  0:00 UTC (permalink / raw)



Alexander said

<<He (Knuth) may have foreseen an appearence of JVM. And he did his best to
adapt us to the future nightmares.
>>

Perhaps there was a :-) missing from this post, but just in case it really
was a misunderstanding, there is not even a vague relationshipo between
the semantic level of JVM and the semantic level of MIX.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
@ 1998-05-17  0:00 Alexander E. Kopilovitch
  1998-05-17  0:00 ` Robert Dewar
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander E. Kopilovitch @ 1998-05-17  0:00 UTC (permalink / raw)



In <355D5DF7.1451@online.no> "Tarjei T. Jensen" <tarjei@online.no> wrote:
>I believe he used mix simply because he likes bit twiddling. I don't
>mind that, it just makes the books less readable and longer than they
>could have been.
He (Knuth) may have foreseen an appearence of JVM. And he did his best to
adapt us to the future nightmares.


Alexander Kopilovitch                      aek@pcaek.spb.su
Saint-Petersburg
Russia

 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-17  0:00       ` Dan Johnston D.B.
@ 1998-05-17  0:00         ` Tarjei T. Jensen
  1998-05-17  0:00           ` Robert Dewar
  0 siblings, 1 reply; 16+ messages in thread
From: Tarjei T. Jensen @ 1998-05-17  0:00 UTC (permalink / raw)



Dan Johnston D.B. wrote:

> Pascal did not exist in 1967 when Knuth published Vol 1 Fundamental
> Algorithms and even Vol 3 Searching and Sorting was published in the
> same year 1973 as the Pascal Revised Report.  Knuth's language options
> would have been pretty limited when he started these classic works.

I'll settle for Algol.

Greetings,

-- 
// Tarjei T. Jensen 
//    tarjei@online.no || voice +47 51 62 85 58
//   Support you local rescue centre: GET LOST!




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-16  0:00         ` Tarjei T. Jensen
@ 1998-05-17  0:00           ` Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-17  0:00 UTC (permalink / raw)



Tarjei said

<<And how do you propose to get cache hit and misses into this? It is a
  time since I read about the implementation of Mix. I doubt that the
  implementation takes into account internal pipelining and the effect of
  cache hits and misses which we would normally have in a real system.>>

Slight historical mismatch here. KV3 was written around 1970, when
memories were as fast as processors, and cache hit and miss times
were irrelevant since machines were typically uncached. I agree it
is FAR harder to get accurate timings these days. You did not even
mention TLB misses.

The point is that you still come much closer to the real world at this
level than at the O(N**2) comparisons counted level!

<<I don't see how this reflects the real world where a compiler writer may
  want to keep jumps within the instruction pipeline or obtain a cache
  hit.
  I still want to time the algorithm and then explain the time differences
  with references to the source code. I see no point in complicating
  matters by decoding compiler output (unless the result is confusing).>>

Fine: you are actually strongly agreeing with me, so if you don't see this,
you did not quite understand my suggestion. 

Obviously you can't explain all timing differences at the high level
language level, because of the factors you mention, still you can
indeed come pretty close PROVIDING, and this is a very important PROVIDING,
you have a computational model of the language you are using. That is what
I meant by "adopting a very high level machine with known complexities ..
based on a subset of Ada or Pascal or C or whatever). The point is that
if you want to reference source code in understanding complexity, then
you must restrict the source code to operations for which you have some
reasonable understanding of their computational complexity (e.g. we 
do NOT want to include propagation of exceptions in the mix).

<<Then one can relate the time to the number of statements executed>>

Only if you have some idea of what it means to execute a statement in
terms of computational complexity!





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-17  0:00         ` Tarjei T. Jensen
@ 1998-05-17  0:00           ` Robert Dewar
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Dewar @ 1998-05-17  0:00 UTC (permalink / raw)



Tarjei said

<<I'll settle for Algol.

Greetings,
>>

I doubt it. Using a language with no notion of pointers or references would
be obfuscatory in this context. After all Fortran could also have been used,
but would have suffered from the same serious defect.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Help B* and B+ Trees
  1998-05-16  0:00     ` Tarjei T. Jensen
  1998-05-16  0:00       ` Robert Dewar
@ 1998-05-17  0:00       ` Dan Johnston D.B.
  1998-05-17  0:00         ` Tarjei T. Jensen
  1 sibling, 1 reply; 16+ messages in thread
From: Dan Johnston D.B. @ 1998-05-17  0:00 UTC (permalink / raw)



In <355D5DF7.1451@online.no> "Tarjei T. Jensen" <tarjei@online.no> writes:

>Robert Dewar wrote: 
>> This is quite unfair to Don, he explains algorithms in a high level
>> manner using abstract pseudo-code. MIX is only used in low level analysis
>> of actual performance on a typical machine, i.e. to get a feel for the
>> constants involved and go from O(n**2) to C*n**2, you need a concrete
>> machine!

>Not really. Any computer will suffice. If he had used Pascal like he did
>with TeX and Metafont (I would of course prefer Ada 95) it would be
>simple for any user to discover the differences. Pascal compilers are
>widely available and it would be simple for interested parties to type
>in the examples and analyze the results.

Pascal did not exist in 1967 when Knuth published Vol 1 Fundamental
Algorithms and even Vol 3 Searching and Sorting was published in the
same year 1973 as the Pascal Revised Report.  Knuth's language options
would have been pretty limited when he started these classic works.
          dan.     Dan Johnston,     dan@csee.uq.edu.au




^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~1998-05-17  0:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-05-17  0:00 Help B* and B+ Trees Alexander E. Kopilovitch
1998-05-17  0:00 ` Robert Dewar
  -- strict thread matches above, loose matches on Subject: below --
1998-05-14  0:00 whizzbang
1998-05-14  0:00 ` Charles Hixson
1998-05-14  0:00   ` Robert Dewar
1998-05-15  0:00     ` Charles Hixson
1998-05-16  0:00       ` Robert Dewar
1998-05-16  0:00     ` Tarjei T. Jensen
1998-05-16  0:00       ` Robert Dewar
1998-05-16  0:00         ` Tarjei T. Jensen
1998-05-17  0:00           ` Robert Dewar
1998-05-17  0:00       ` Dan Johnston D.B.
1998-05-17  0:00         ` Tarjei T. Jensen
1998-05-17  0:00           ` Robert Dewar
1998-05-14  0:00 ` Matthew Heaney
1998-05-14  0:00   ` Robert Dewar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox