comp.lang.ada
From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: AVL Tree Implementation
Date: 1996/10/07
Message-ID: <53ak1c$ar4$1@goanna.cs.rmit.edu.au>
In-Reply-To: 531btt$rab@felix.seas.gwu.edu


mfeldman@seas.gwu.edu (Michael Feldman) writes:
>>The assumption here is that a data structures textbook is of value
>>*only* to students, and more specifically only to students of data structures.

>>Leaving out operations, or providing fragments that require nontrivial
>>stitching, is a good way to ensure that this assumption will be *true*
>>of a particular book.

>Which assumption? That occasionally a user will curse the author?
>Yeah, I guess I'll have to live with that.:-)

No, the assumption that a data structures textbook is of value *only*
to students, and more specifically only to students of data structures,
and then only while they are studying data structures.

>I admire your very high standards for books; you and I have discussed
>this publicly and privately before. I don't think it's too realistic
>for you to expect _perfection_, though.

Who is asking for perfection?  Since when has "complete working code
for a major data structure" been _perfection_ in a data structures
textbook?

>Nothing would _ever_ get out
>to where people could use it if the standard were _that_ high...

We would be much better off if fewer books _were_ out to where people
could be abused by them.  (This does not apply to your books.)
I must admit that I would be very dismayed indeed if the revised editions
of The Art of Computer Programming were held up much longer...

>>The reader is always left with the suspicion that the author didn't
>>provide complete code because he _couldn't_ and that the data structure
>>is of no real use.

><shrug> How do I respond to this one? ("Trust me! I know how to do it!")?

>>One of the things I was proudest of in the Pascal code I provided to the
>>students in the CS241 course when I taught it was that it was complete
>>working generic (thanks to M4, sigh, and thank goodness for Ada) code
>>that they could take away and _use_ in applications

>Yep, we all write code like this. I'm reasonably proud of mine too, even
>if some of it was stubbed out so students could fill it back in.

The specific case at issue is variants of binary search trees.
We are talking about implementing AVL trees by ripping out a number of
mostly small operations into separate procedures that plug into a
parent unit that does not exist (there is no AVL_Trees_Generic for them
to plug into).  In the absence of the book (which of course the *intended*
users of the code would certainly have, so this is quite unfair) I surmise
that the intent is to modify a copy of Binary_Search_Trees_Generic.

	-- Digression
	I find it in practice a *much* better interface to define
	insert(Collection, Element) = Collection := Collection U {Element}
	delete(Collection, Element) = Collection := Collection \ {Element}
	Why do I find this a better interface than the one with the
	apparently arbitrary "raise Not_Found if delete from empty tree"
	version?  Because reasoning about it is *much* simpler thanks to
	the absence of a not-practically-motivated case analysis.
	-- End digression.
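
For concreteness, here is roughly what that interface looks like as an
Ada spec.  This is only a sketch of mine (names and representation
included), not anything from the book under discussion:

    generic
	type Element is private;
	with function "<"(Left, Right: Element) return Boolean is <>;
    package Set_Trees_Generic is

	type Set is limited private;

	-- Collection := Collection U {Item}: inserting an element that
	-- is already present is a no-op, not an error.
	procedure Insert(Collection: in out Set; Item: in Element);

	-- Collection := Collection \ {Item}: deleting an element that
	-- is absent is likewise a no-op, so the client never needs a
	-- Not_Found handler and reasoning about calls stays simple.
	procedure Delete(Collection: in out Set; Item: in Element);

    private

	type Node;                  -- completed in the package body
	type Link is access Node;

	type Set is
	   record
	      Root: Link;
	      Size: Natural := 0;
	   end record;

    end Set_Trees_Generic;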

Now doing this is surprisingly tricky, because while there _is_ a
replacement for the Insert operation, there _isn't_ any replacement
for the Delete operation, and that's needed.  Never mind the "get the
sods to do some work plugging things together" issue, the *most difficult*
operation in the data structure is not provided.  It is this kind of thing
which has had me throwing books across the room in rage (my disposable
income is _very_ small; the last 10 technical books I bought were on sale
and without that I wouldn't have been able to afford even one of them).
It is this kind of thing which has had a colleague speaking harshly of
a book, and advising his students not to buy it.  And it is when authors
preferentially omit _hard_ operations that suspicions are aroused.
(Like when Sedgewick left out deletion from Red-Black trees in one of
his books.)

I also note that binary_search_trees_generic.ads (hence presumably
avl_trees_generic.ads) provides only a small number of useful operations.

Here is a sketch of the "dictionary" interface I used in my Pascal class.

required instantiation parameters
    TAG			(instance name)
    KeyType		(type of keys)
    ValType		(type of satellite data)

optional instantiation parameters
    MaxSize		(integer; size limit)
    ChunkSize		(integer; used for "chunked" allocation)
    keyRead		(procedure for reading KeyType)
    valRead		(procedure for reading ValType)
    keyWrite		(procedure for writing KeyType)
    valWrite		(procedure for writing ValType)
    keyCompare		(x,y -> -1,0,+1 if x<y,x=y,x>y)
    keyHash		(key -> natural for hashing)

In Ada, this would be something like

    generic
	type Element is private;
	with function "<"(Left, Right: Element) return Boolean is <>;
	with function Hash(Item: Element) return Natural;
	Max_Size  : Positive := 1;
	Chunk_Size: Positive := 1;
    package Dictionary_Generic_PKG is

keyRead, valRead, and the derived TAG$read procedure would be in a child
package.  (There wasn't any really satisfactory way to do this in Ada83;
I _do_ like Ada95.)  The same for keyWrite, valWrite, and the derived
TAG$write procedure.
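
A sketch of how the Read side might look as a generic child in Ada 95
(not from anybody's book; I assume the parent generic declares the
abstract type Dictionary, and I collapse keyRead/valRead into a single
Element_Read since the sketch above has only Element):

    with Ada.Text_IO;

    generic
	-- the per-element reader, supplied only by clients that want input
	with procedure Element_Read
	       (File: in Ada.Text_IO.File_Type; Item: out Element);
    package Dictionary_Generic_PKG.Read_PKG is

	-- the derived whole-dictionary reader (the TAG$read of the
	-- Pascal scheme); clients that never read dictionaries from
	-- files simply never instantiate this child
	procedure Read(File: in Ada.Text_IO.File_Type;
	               Dict: in out Dictionary);

    end Dictionary_Generic_PKG.Read_PKG;

The Write side is the same picture with Element_Write and a Write
procedure in a second child.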


general "container" properties from the package
    TAG$type		(the abstract data type)
    TAG$limit		(0 for unbounded collections)
    TAG$make		(constructor routine)
    TAG$free		(destructor  routine)
    TAG$reset		(= free + make, may be cheaper)
    TAG$size		(returns number of elements)
    TAG$empty		(TAG$size = 0, may be _much_ faster)
    TAG$full		(false => safe to insert new element)
    TAG$copy		(assignment)
    TAG$read		(only if keyRead and valRead provided)
    TAG$write		(only if keyWrite and valWrite provided)
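
Rendered as Ada declarations in such a package, these come out roughly
as follows (a sketch only; the TAG$ prefix disappears because the
instance name already does that job, and read/write live in child
packages as above):

    type Dictionary is limited private;

    procedure Make (Dict: in out Dictionary);            -- constructor
    procedure Free (Dict: in out Dictionary);            -- destructor
    procedure Reset(Dict: in out Dictionary);            -- Free + Make, may be cheaper
    function  Size    (Dict: Dictionary) return Natural; -- number of elements
    function  Is_Empty(Dict: Dictionary) return Boolean; -- Size = 0, may be much faster
    function  Is_Full (Dict: Dictionary) return Boolean; -- False => safe to insert
    procedure Copy(Target: in out Dictionary; Source: in Dictionary);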

I was considerably startled not to see an equivalent of TAG$empty
or TAG$size in binary_search_trees.ads, because it is an extremely
important operation of the abstract type which cannot be synthesised
from the available operations (at least, not without using exceptions,
and it's a clumsy design that _requires_ you to use exception handling
to tell if a container is empty).

dictionary-specific properties
    TAG$lookup		(dictionary x key -> val/bool)
    TAG$has		(dictionary x key -> bool)
    TAG$enter		(dictionary, key, val)
    TAG$forget		(dictionary, key)
	
My distinction between `lookup' and `has' is related to but different
from Feldman's distinction between `Search' and `Retrieve'; the reason
is simply that I have a strong dislike for letting clients get their
hands on pointers to internal parts of a data structure.  Feldman's
interface
	function Search(T: Tree; K: Key_Type) return Tree;
	function Retrieve(T: Tree) return Element;
		-- or raise Not_Found if T is null
exposes too much about the implementation; it would not work very well
with 

	type Node;
	type Link is access Node;
	type Node is
	   record
	      Left, Right : Link;
	      Item        : Element;
	   end record;
	type Tree is
	   record
	      Root: Link;
	      Size: Natural;
	   end record;

What's this got to do with providing complete working code?  Simply
that this flaw in the interface's information hiding doesn't
really show up until you try to provide a practically complete set
of operations.  (It also helps if you try to make _all_ your dictionary
packages use the _same_ interface, or as closely the same as possible.)
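
Spelled out in the same style, the dictionary-specific operations I have
in mind look roughly like this (again just a sketch; Key_Type and
Val_Type stand for whatever formal types the generic finally takes):

    procedure Enter (Dict: in out Dictionary;
                     Key: in Key_Type; Val: in Val_Type);
    procedure Forget(Dict: in out Dictionary; Key: in Key_Type);

    function  Has(Dict: Dictionary; Key: Key_Type) return Boolean;

    -- Lookup hands back a *copy* of the value plus a Found flag; the
    -- client never sees a Link or a subtree, so the representation in
    -- the private part can change without breaking any client code.
    procedure Lookup(Dict : in  Dictionary;
                     Key  : in  Key_Type;
                     Val  : out Val_Type;
                     Found: out Boolean);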

    TAG$min		(dictionary -> key, if keyCompare provided)
    TAG$max		(dictionary -> key, if keyCompare provided)
    TAG$extractMin	(dictionary -> key, val)
    TAG$extractMax	(dictionary -> key, val)

These are important, because they cannot be implemented _outside_ the
interface.

"bulk" operations
    TAG$walk2		traversal, may update vals
    TAG$count2		counts key,val pairs satisfying predicate
    TAG$select2		keeps key,val pairs satisfying predicate
    TAG$reject2		discards key,val pairs satisfying predicate

Here again, there is an interesting point that comes up when you do not
try to confine yourself to a minimal set of operations, but try to produce
a _practical_ tool:  the most efficient way I know to implement the `select'
and `reject' operations is to put the retained nodes into a linear list and
then rebuild a balanced tree when you're done.  The pleasant fact that
binary search trees let you get good element-at-a-time operations and
_also_, by flattening, processing as ordered lists, and rebuilding,
very efficient whole-set operations, is not something that one learns from
a text which focuses _solely_ on insert/delete/lookup.
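
To make that concrete, here is a sketch of Select2 done that way,
written against the Link/Node/Tree representation above and imagined
inside the package body so those types are visible.  Keep is the
client's predicate; a real version would also reclaim the storage of
the dropped nodes:

    generic
       with function Keep(Item: Element) return Boolean;
    procedure Select_Elements(T: in out Tree);

    procedure Select_Elements(T: in out Tree) is

       Head: Link    := null;   -- retained nodes chained through Right,
       Kept: Natural := 0;      -- smallest element at the head

       -- Reverse in-order walk (Right, node, Left): pushing each
       -- retained node on the front of the chain leaves the chain
       -- in ascending order.
       procedure Flatten(P: Link) is
          L, R: Link;
       begin
          if P /= null then
             L := P.Left;
             R := P.Right;
             Flatten(R);
             if Keep(P.Item) then
                P.Left  := null;
                P.Right := Head;
                Head    := P;
                Kept    := Kept + 1;
             end if;
             Flatten(L);
          end if;
       end Flatten;

       -- Rebuild a size-balanced tree with N nodes by consuming the
       -- chain in ascending order: left subtree, then root, then right.
       function Rebuild(N: Natural) return Link is
          Left_Part, Root: Link;
       begin
          if N = 0 then
             return null;
          end if;
          Left_Part  := Rebuild((N - 1) / 2);
          Root       := Head;
          Head       := Head.Right;
          Root.Left  := Left_Part;
          Root.Right := Rebuild(N - 1 - (N - 1) / 2);
          return Root;
       end Rebuild;

    begin
       Flatten(T.Root);
       T.Root := Rebuild(Kept);
       T.Size := Kept;
    end Select_Elements;

Reject2 is the same with "not Keep"; flattening, merging the ordered
lists, and rebuilding gives the whole-set operations the same way.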

With respect to plugging in, I am not aware of any useful way in which
one could derive Splay trees by plugging small changes into BSTs.

>But AQ&S is only a guide, not a law. We've been around this loop
>before, publicly and privately. I am still a member of the (still large)
>group of intro-course teachers (especially in Pascal circles) who believe 
>that beginners learn code patterns better if the reserved words are
>emphasized. We do that with upper case.


(a) There you part company with the Pascal textbook we used to use,
    "Foundations of Computer Science by Aho & Ullman" (if there isn't
    an Ada edition yet, I hope there is soon).
    They use lower case for keywords.

(b) We are not talking about beginners.
    We are talking about data structures texts for CS2 and later.
    If CS2 students still need to be reminded that 'procedure' is
    a keyword, heaven help us.

(c) Is there published _evidence_ that SHOUTING the KEYWORDS helps,
    or is it _only_ a belief?  Our CS1 students have problems
    aplenty, but telling which are the reserved words and which
    are not has never been one of them.  Or at least, not one that
    they have ever _asked_ me about, and they have asked me about
    a great many things.

(d) Given that we have Emacs, which can highlight Ada text and even
    colourise it, and other such editors are available (Alpha for
    the Macintosh is one, there are others for PCs), it is not clear
    to me why shouting in CAPITALS is thought better than a subtler
    typographic style such as underlining or the traditional bold face.
    Given the existence of HTML and Mosaic (free), and the ease with
    which one can write a program to automatically add HTML markup
    such as

	<B>with</B> Token_PKG, Expression_PKG, Statement_PKG;
	<B>use</B>  Token_PKG, Expression_PKG, Statement_PKG;

	<B>package</B> Block_PKG <B>is</B>

    (a real example from real code I am showing to students for an
    assignment, that really was HTML-marked-up by a small program),
    it is again unclear why SHOUTING in CAPITALS is BETTER than bold.
    (A sketch of such a markup program appears after point (f) below.)

(e) Having introduced a rule "all capitals means keyword" you are
    then stuck for a way to write an acronym.  If you write "AVL",
    as the normal rule for acronyms would have it, you are stuck,
    because now you will confuse your students into thinking it is
    a keyword.  (This applies to other things like CRC as well, and
    I once had occasion to read a lot of code in a language where
    CRC _was_ a keyword, so it's not _that_ far-fetched.)

(f) Many of the most important code patterns are not distinguished by
    their keywords.  In fact, the more a student thinks of code patterns
    in terms of keywords, the worse off that student is, because his or
    her knowledge is thereby encoded in a form which does not transfer
    to other languages.
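
Since (d) mentions it: the "small program" really is small.  Here is a
sketch of the whole idea (not the actual program; only a handful of
reserved words are listed, and a real version would also escape the
HTML metacharacters < > & and leave comments and string literals alone):

    with Ada.Text_IO;              use Ada.Text_IO;
    with Ada.Characters.Handling;  use Ada.Characters.Handling;

    procedure Boldify is
       -- copies Ada source from standard input to standard output,
       -- wrapping each reserved word in <B>...</B>

       function Is_Reserved(Word: String) return Boolean is
          W: constant String := To_Lower(Word);
       begin
          return W = "with"      or else W = "use"      or else
                 W = "package"   or else W = "is"       or else
                 W = "procedure" or else W = "function" or else
                 W = "begin"     or else W = "end";
       end Is_Reserved;

       function Is_Word_Char(C: Character) return Boolean is
       begin
          return Is_Alphanumeric(C) or else C = '_';
       end Is_Word_Char;

       Line: String(1 .. 500);
       Last: Natural;

    begin
       while not End_Of_File loop
          Get_Line(Line, Last);
          declare
             I: Positive := 1;
             J: Positive;
          begin
             while I <= Last loop
                if Is_Word_Char(Line(I)) then
                   J := I;                  -- find the end of the word
                   while J < Last and then Is_Word_Char(Line(J + 1)) loop
                      J := J + 1;
                   end loop;
                   if Is_Reserved(Line(I .. J)) then
                      Put("<B>" & Line(I .. J) & "</B>");
                   else
                      Put(Line(I .. J));
                   end if;
                   I := J + 1;
                else
                   Put(Line(I));
                   I := I + 1;
                end if;
             end loop;
          end;
          New_Line;
       end loop;
    end Boldify;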

>Since I'm writing for beginners, that's the lexical style
>I choose. It's not worth fighting over. The important thing is
>_consistent_ style (even AQ&S says this, and mine is consistent)
>and _clear_ code (which I strive for, though this is not quite
>so measurable). 

The sad thing is that there is a substantial example of yours _in_ the
AQ&S guidelines, contradicting many of the layout and spelling rules 
in the same document.  (This is the October 95 version I'm talking about.)

The trouble is that your code may be SELF-consistent, but it is not the
only code that students see.  The effect on students of showing them
code formatted according to AQ&S rules *plus* code in Professor X's
self-consistent-but-different way *plus* code formatted in your
self-consistent-but-different way is to show them a *corpus* that is
*not* consistently laid out.

It is particularly important to note that the Ada 95 LRM itself (which
we have on-line here for students to look at) is consistent with the
AQ&S guidelines.

We have a choice:
 - stop trying to teach the students that consistent layout is good
 - use *only* your books (including banning the LRM)
 - use *none* of your books.
Despite the many and major merits of your books,
the third choice has the highest payoff.

>Underscores are a different story. Purely a matter of taste. YMMV.:-)

I strongly disagree.  I, and the authors of the AQ&S guidelines, may be
completely wrong in our belief that Clearly_Separated_Words are easier
to read than RunTogetherWords.  But this is obviously a matter that is
susceptible to experimental determination one way or the other.  (In
fact I was under the impression that the matter _had_ been settled by
psychological experiment, with a clear victory for underscores.)  This
may in part depend on the native language of the reader; someone familiar
with German or Inuit might find runtogetherwords easier to follow than
someone more familiar with English.

I strongly suspect a Pascal influence on the people who like RunTogetherWords,
but note that
 - Pascal was first implemented on the CDC machines, and my recollection
   is that their native character set didn't _have_ an underscore, so it
   wasn't an option
 - the current standard _does_ have the underscore in identifiers
(Similarly,  when Smalltalk was being designed, the character that now
displays as underscore displayed as left arrow, and having been used for
assignment was not available as a separator, hence the SmalltalkStyle.)

I have already given an instance in which I was willing to change my
own style in order to be consistent with the AQ&S and thereby to be
more consistent about what we show students.  Here I will say that if
I am shown good experimental evidence that RunOnWords are easier to
read than Separated_Words (an unlikely outcome given the history of
natural writing systems; ever tried reading old Greek or Latin
inscriptions? the language is intelligible but the
runonwordsarehardtoread) then I promise to change my style in the direction of
greater readability.

>To summarize - we need more Ada 95 books for different "markets".

True, but for the wrong reason.  This is the Age of the Computer (or
was it the Age of the Train?).  Is there any reason why there cannot 
be two *automatically* produced views of the same book?  There certainly
*can* be two automatically produced views of the same _code_ (proof: my
program that moved Feldman's code to a more readable AQ&S-ish style).
(This kind of idea is just one of many reasons why I hate What-You-See-
Is-All-You-Get document processors.)

We need Ada-for-TAFE, Ada-for-fun-and-profit, Ada-colouring-book,
Ada-OOP, Ada-for-Windows, Ada-for-Linux, Ada-in-21-days, and other
such books.  The TAFE market is different from the HE market which
is different from the "industry training course" market which is
different from ...

>Who's going to write the one Richard needs?
>(Basically, a software components book like the original Booch
>components book.)

Well, again, I was strongly disappointed in the original Booch components
book because it did not *completely* describe the implementations.  If you
wanted to use the components, you had to buy a licence which we didn't
have, so the book was of little practical use to me.

It _is_ true that students will need to revise data structures in real
life.  Unfortunately, many of the changes they will need to make are
not prefigured in a lot of books, and cannot easily be done by plugging
in different separate subunits.  (Such as the change from BSTs to Splay
trees.)  They also need practice in selecting appropriate components for
re-use and in _using_ those components.  There are lots of questions
like "separate key and value types?  single Element type?  Element type
with Key_Of function?" for dictionaries which need discussing.  (My own
opinion is that the third is by far the clumsiest of the three, but the
really important thing is to explain the choice.)
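
The three shapes show up directly in the generic formal part.  A sketch
of the design space only (not anybody's published spec):

    -- 1. separate key and value types
    generic
	type Key_Type is private;
	type Val_Type is private;
	with function "<"(Left, Right: Key_Type) return Boolean is <>;
    package Dictionary_1 is

    -- 2. a single Element type carrying both key and satellite data;
    --    the ordering looks at as much of the Element as it pleases
    generic
	type Element is private;
	with function "<"(Left, Right: Element) return Boolean is <>;
    package Dictionary_2 is

    -- 3. an Element type plus a Key_Of projection
    generic
	type Element is private;
	type Key_Type is private;
	with function Key_Of(Item: Element) return Key_Type;
	with function "<"(Left, Right: Key_Type) return Boolean is <>;
    package Dictionary_3 is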

Then of course there is the question of how much of a book to devote to
mathematical analysis of the data structures.  There is certainly a
difference in market there.

Up until now I haven't had the cheek to produce my own data structures
book.  Perhaps I should give it serious consideration.

-- 
Australian citizen since 14 August 1996.  *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.



