From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: AVL Tree Implementation
Date: 1996/10/07
Message-ID: <53ak1c$ar4$1@goanna.cs.rmit.edu.au>
References: <52alk3$j0k@goanna.cs.rmit.edu.au> <52epbl$lht@felix.seas.gwu.edu> <52pvit$gae@goanna.cs.rmit.edu.au> <531btt$rab@felix.seas.gwu.edu>
Organization: Comp Sci, RMIT, Melbourne, Australia
NNTP-Posting-User: ok
Newsgroups: comp.lang.ada

mfeldman@seas.gwu.edu (Michael Feldman) writes:

>>The assumption here is that a data structures textbook is of value
>>*only* to students, and more specifically only to students of data
>>structures, and then only while they are studying data structures.
>>Leaving out operations, or providing fragments that require nontrivial
>>stitching, is a good way to ensure that this assumption will be *true*
>>of a particular book.

>Which assumption? That occasionally a user will curse the author?
>Yeah, I guess I'll have to live with that.:-)

No, the assumption that a data structures textbook is of value *only* to students, and more specifically only to students of data structures, and then only while they are studying data structures.

>I admire your very high standards for books; you and I have discussed
>this publicly and privately before. I don't think it's too realistic
>for you to expect _perfection_, though.

Who is asking for perfection? Since when has "complete working code for a major data structure" been _perfection_ in a data structures textbook?

>Nothing would _ever_ get out
>to where people could use it if the standard were _that_ high...

We would be much better off if fewer books _were_ out to where people could be abused by them. (This does not apply to your books.) I must admit that I would be very dismayed indeed if the revised editions of The Art of Computer Programming were held up much longer...

>>The reader is always left with the suspicion that the author didn't
>>provide complete code because he _couldn't_ and that the data structure
>>is of no real use.

>How do I respond to this one? ("Trust me! I know how to do it!")?

>>One of the things I was proudest of in the Pascal code I provided to the
>>students in the CS241 course when I taught it was that it was complete
>>working generic (thanks to M4, sigh, and thank goodness for Ada) code
>>that they could take away and _use_ in applications

>Yep, we all write code like this. I'm reasonably proud of mine too, even
>if some of it was stubbed out so students could fill it back in.

The specific case at issue is variants of binary search trees. We are talking about implementing AVL trees by ripping out a number of mostly small operations into separate procedures that plug into a parent unit that does not exist (there is no AVL_Trees_Generic for them to plug into). In the absence of the book (which of course the *intended* users of the code would certainly have, so this is quite unfair) I surmise that the intent is to modify a copy of Binary_Search_Trees_Generic.
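For concreteness, here is a minimal sketch of the "plug into a parent unit" mechanism being talked about: an Ada parent body stubs an operation with `is separate', and the plug-in is compiled as a subunit of that parent. The unit and member names below are my own invention, not Feldman's; the point is only to show what the students' plug-ins would need to exist.

    generic
       type Element is private;
       with function "<" (Left, Right : Element) return Boolean is <>;
    package AVL_Trees_Generic is
       type Tree is limited private;
       procedure Insert (T : in out Tree; Item : in Element);
       -- ... the other operations (Delete, lookups, ...) would go here
    private
       type Node;
       type Link is access Node;
       type Node is record
          Left, Right : Link;
          Item        : Element;
          Balance     : Integer range -1 .. 1 := 0;  -- AVL balance factor
       end record;
       type Tree is record
          Root : Link;
       end record;
    end AVL_Trees_Generic;

    package body AVL_Trees_Generic is
       -- the parent body defers Insert to a separately compiled subunit
       procedure Insert (T : in out Tree; Item : in Element) is separate;
    end AVL_Trees_Generic;

    separate (AVL_Trees_Generic)
    procedure Insert (T : in out Tree; Item : in Element) is
    begin
       null;  -- the plug-in: the actual insertion and rebalancing logic
    end Insert;

The complaint is precisely that the students are given the small plug-ins but not the parent: without a spec and body like the first two units above, there is nothing for `separate (AVL_Trees_Generic)' to refer to.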
-- Digression
I find it in practice a *much* better interface to define

    insert(Collection, Element) = Collection := Collection U {Element}
    delete(Collection, Element) = Collection := Collection \ {Element}

Why do I find this a better interface than the one with the apparently arbitrary "raise Not_Found if delete from empty tree" version? Because reasoning about it is *much* simpler thanks to the absence of a not-practically-motivated case analysis.
-- End digression.

Now doing this (modifying a copy of Binary_Search_Trees_Generic) is surprisingly tricky, because while there _is_ a replacement for the Insert operation, there _isn't_ any replacement for the Delete operation, and that's needed. Never mind the "get the sods to do some work plugging things together" issue: the *most difficult* operation in the data structure is not provided.

It is this kind of thing which has had me throwing books across the room in rage (my disposable income is _very_ small; the last 10 technical books I bought were on sale, and without that I wouldn't have been able to afford even one of them). It is this kind of thing which has had a colleague speaking harshly of a book, and advising his students not to buy it. And it is when authors preferentially omit _hard_ operations that suspicions are aroused. (Like when Sedgewick left out deletion from Red-Black trees in one of his books.)

I also note that binary_search_trees_generic.ads (hence presumably avl_trees_generic.ads) provides only a small number of useful operations. Here is a sketch of the "dictionary" interface I used in my Pascal class.

    required instantiation parameters
        TAG        (instance name)
        KeyType    (type of keys)
        ValType    (type of satellite data)
    optional instantiation parameters
        MaxSize    (integer; size limit)
        ChunkSize  (integer; used for "chunked" allocation)
        keyRead    (procedure for reading KeyType)
        valRead    (procedure for reading ValType)
        keyWrite   (procedure for writing KeyType)
        valWrite   (procedure for writing ValType)
        keyCompare (x, y -> -1, 0, +1 if x<y, x=y, x>y)
        keyHash    (key -> natural for hashing)

In Ada, this would be something like

    generic
        type Element is private;
        with function "<" (Left, Right : Element) return Boolean is <>;
        with function Hash (Item : Element) return Natural;
        Max_Size   : in Positive := 1;
        Chunk_Size : in Positive := 1;
    package Dictionary_Generic_PKG is

keyRead, valRead, and the derived TAG$read procedure would be in a child package. (There wasn't any really satisfactory way to do this in Ada 83; I _do_ like Ada 95.) The same for keyWrite, valWrite, and the derived TAG$write procedure.

    general "container" properties from the package
        TAG$type   (the abstract data type)
        TAG$limit  (0 for unbounded collections)
        TAG$make   (constructor routine)
        TAG$free   (destructor routine)
        TAG$reset  (= free + make, may be cheaper)
        TAG$size   (returns number of elements)
        TAG$empty  (TAG$size = 0, may be _much_ faster)
        TAG$full   (false => safe to insert new element)
        TAG$copy   (assignment)
        TAG$read   (only if keyRead and valRead provided)
        TAG$write  (only if keyWrite and valWrite provided)

I was considerably startled not to see an equivalent of TAG$empty or TAG$size in binary_search_trees.ads, because these are extremely important operations of the abstract type which cannot be synthesised from the available operations (at least, not without using exceptions, and it's a clumsy design that _requires_ you to use exception handling to tell if a container is empty).
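To make that concrete, here is a rough Ada rendering of the "container" half of the sketch above. The package and operation names are mine, it is only an outline of a spec (not a worked implementation), and the representation shown is just one plausible choice:

    generic
       type Key_Type is private;
       type Value_Type is private;
       with function "<" (Left, Right : Key_Type) return Boolean is <>;
       Max_Size : in Natural := 0;  -- 0 would mean "unbounded"
    package Dictionary_Generic_PKG is
       type Dictionary is limited private;

       -- general "container" properties
       procedure Make  (D : in out Dictionary);            -- constructor
       procedure Free  (D : in out Dictionary);            -- destructor
       procedure Reset (D : in out Dictionary);            -- = Free + Make, may be cheaper
       function  Size     (D : Dictionary) return Natural;
       function  Is_Empty (D : Dictionary) return Boolean; -- may be much faster than Size = 0
       function  Is_Full  (D : Dictionary) return Boolean; -- False => safe to insert
       procedure Copy (From : in Dictionary; To : in out Dictionary);
    private
       type Node;
       type Link is access Node;
       type Node is record
          Left, Right : Link;
          Key         : Key_Type;
          Value       : Value_Type;
       end record;
       type Dictionary is record
          Root  : Link;
          Count : Natural := 0;
       end record;
    end Dictionary_Generic_PKG;

With something like this, a client writes "if Is_Empty(D) then ..." and needs no exception handling just to find out whether the container is empty.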
    dictionary-specific properties
        TAG$lookup (dictionary x key -> val/bool)
        TAG$has    (dictionary x key -> bool)
        TAG$enter  (dictionary, key, val)
        TAG$forget (dictionary, key)

My distinction between `lookup' and `has' is related to but different from Feldman's distinction between `Search' and `Retrieve'; the reason is simply that I have a strong dislike for letting clients get their hands on pointers to internal parts of a data structure. Feldman's interface

    function Search (T : Tree; K : Key_Type) return Tree;
    function Retrieve (T : Tree) return Element;
       -- or raise Not_Found if T is null

exposes too much about the implementation; it would not work very well with

    type Node;
    type Link is access Node;
    type Node is record
       Left, Right : Link;
       Item        : Element;
    end record;
    type Tree is record
       Root : Link;
       Size : Natural;
    end record;

What's this got to do with providing complete working code? Simply that this flaw in the interface's information hiding doesn't really show up until you try to provide a practically complete set of operations. (It also helps if you try to make _all_ your dictionary packages use the _same_ interface, or as closely the same as possible.)

    TAG$min        (dictionary -> key, if keyCompare provided)
    TAG$max        (dictionary -> key, if keyCompare provided)
    TAG$extractMin (dictionary -> key, val)
    TAG$extractMax (dictionary -> key, val)

These are important, because they cannot be implemented _outside_ the interface.

    "bulk" operations
        TAG$walk2   traversal, may update vals
        TAG$count2  counts key,val pairs satisfying predicate
        TAG$select2 keeps key,val pairs satisfying predicate
        TAG$reject2 discards key,val pairs satisfying predicate

Here again, there is an interesting point that comes up when you do not try to confine yourself to a minimal set of operations, but try to produce a _practical_ tool: the most efficient way I know to implement the `select' and `reject' operations is to put the retained nodes into a linear list and then rebuild a balanced tree when you're done. The pleasant fact that binary search trees let you get good element-at-a-time operations and _also_, by flattening, processing as ordered lists, and rebuilding, very efficient whole-set operations, is not something that one learns from a text which focuses _solely_ on insert/delete/lookup.

With respect to plugging in, I am not aware of any useful way in which one could derive Splay trees by plugging small changes into BSTs.

>But AQ&S is only a guide, not a law. We've been around this loop
>before, publicly and privately. I am still a member of the (still large)
>group of intro-course teachers (especially in Pascal circles) who believe
>that beginners learn code patterns better if the reserved words are
>emphasized. We do that with upper case.

(a) There you part company with the Pascal textbook we used to use, "Foundations of Computer Science" by Aho & Ullman (if there isn't an Ada edition yet, I hope there is soon). They use lower case for keywords.

(b) We are not talking about beginners. We are talking about data structures texts for CS2 and later. If CS2 students still need to be reminded that 'procedure' is a keyword, heaven help us.

(c) Is there published _evidence_ that SHOUTING the KEYWORDS helps, or is it _only_ a belief? Our CS1 students have problems aplenty, but telling which are the reserved words and which are not has never been one of them. Or at least, not one that they have ever _asked_ me about, and they have asked me about a great many things.
(d) Given that we have Emacs, which can highlight Ada text and even colourise it, and other such editors are available (Alpha for the Macintosh is one, there are others for PCs), it is not clear to me why shouting in CAPITALS is thought better than a subtler typographic style such as underlining or the traditional bold face. Given the existence of HTML and Mosaic (free), and the ease with which one can write a program to automatically add HTML markup such as

    with Token_PKG, Expression_PKG, Statement_PKG;
    use  Token_PKG, Expression_PKG, Statement_PKG;
    package Block_PKG is

(a real example from real code I am showing to students for an assignment, that really was HTML-marked-up by a small program), it is again unclear why SHOUTING in CAPITALS is BETTER than bold.

(e) Having introduced a rule "all capitals means keyword" you are then stuck for a way to write an acronym. If you write "AVL", as the normal rule for acronyms would have it, you are stuck, because now you will confuse your students into thinking it is a keyword. (This applies to other things like CRC as well, and I once had occasion to read a lot of code in a language where CRC _was_ a keyword, so it's not _that_ far-fetched.)

(f) Many of the most important code patterns are not distinguished by their keywords. In fact, the more a student thinks of code patterns in terms of keywords, the worse off that student is, because his or her knowledge is thereby encoded in a form which does not transfer to other languages.

>Since I'm writing for beginners, that's the lexical style
>I choose. It's not worth fighting over. The important thing is
>_consistent_ style (even AQ&S says this, and mine is consistent)
>and _clear_ code (which I strive for, though this is not quite
>so measurable).

The sad thing is that there is a substantial example of yours _in_ the AQ&S guidelines, contradicting many of the layout and spelling rules in the same document. (This is the October 95 version I'm talking about.) The trouble is that your code may be SELF-consistent, but it is not the only code that students see. The effect on students of showing them code formatted according to AQ&S rules *plus* code in Professor X's self-consistent-but-different way *plus* code formatted in your self-consistent-but-different way is to show them a *corpus* that is *not* consistently laid out.

It is particularly important to note that the Ada 95 LRM itself (which we have on-line here for students to look at) is consistent with the AQ&S guidelines. We have a choice:
 - stop trying to teach the students that consistent layout is good
 - use *only* your books (including banning the LRM)
 - use *none* of your books.
Despite the many and major merits of your books, the third choice has the highest payoff.

>Underscores are a different story. Purely a matter of taste. YMMV.:-)

I strongly disagree. I, and the authors of the AQ&S guidelines, may be completely wrong in our belief that Clearly_Separated_Words are easier to read than RunTogetherWords. But this is obviously a matter that is susceptible to experimental determination one way or the other. (In fact I was under the impression that the matter _had_ been settled by psychological experiment, with a clear victory for underscores.) This may in part depend on the native language of the reader; someone familiar with German or Inuit might find runtogetherwords easier to follow than someone more familiar with English.
I strongly suspect a Pascal influence on the people who like RunTogetherWords, but note that
 - Pascal was first implemented on the CDC machines, and my recollection is that their native character set didn't _have_ an underscore, so it wasn't an option
 - the current standard _does_ have the underscore in identifiers
(Similarly, when Smalltalk was being designed, the character that now displays as underscore displayed as left arrow, and having been used for assignment was not available as a separator, hence the SmalltalkStyle.)

I have already given an instance in which I was willing to change my own style in order to be consistent with the AQ&S and thereby to be more consistent about what we show students. Here I will say that if I am shown good experimental evidence that RunOnWords are easier to read than Separated_Words (an unlikely outcome given the history of natural writing systems; ever tried reading old Greek or Latin inscriptions? the language is intelligible but therunonwordsarehardtoread) then I promise to change my style in the direction of greater readability.

>To summarize - we need more Ada 95 books for different "markets".

True, but for the wrong reason. This is the Age of the Computer (or was it the Age of the Train?). Is there any reason why there cannot be two *automatically* produced views of the same book? There certainly *can* be two automatically produced views of the same _code_ (proof: my program that moved Feldman's code to a more readable AQ&S-ish style). (This kind of idea is just one of many reasons why I hate What-You-See-Is-All-You-Get document processors.)

We need Ada-for-TAPI, Ada-for-fun-and-profit, Ada-colouring-book, Ada-OOP, Ada-for-Windows, Ada-for-Linux, Ada-in-21-days, and other such books. The TAFE market is different from the HE market which is different from the "industry training course" market which is different from ...

>Who's going to write the one Richard needs?
>(Basically, a software components book like the original Booch
>components book.)

Well, again, I was strongly disappointed in the original Booch components book because it did not *completely* describe the implementations. If you wanted to use the components, you had to buy a licence which we didn't have, so the book was of little practical use to me.

It _is_ true that students will need to revise data structures in real life. Unfortunately, many of the changes they will need to make are not prefigured in a lot of books, and cannot easily be done by plugging in different separate subunits. (Such as the change from BSTs to Splay trees.) They also need practice in selecting appropriate components for re-use and in _using_ those components. There are lots of questions like "separate key and value types? single Element type? Element type with Key_Of function?" for dictionaries which need discussing. (My own opinion is that the third is by far the clumsiest of the three, but the really important thing is to explain the choice.)

Then of course there is the question of how much of a book to devote to mathematical analysis of the data structures. There is certainly a difference in market there.

Up until now I haven't had the cheek to produce my own data structures book. Perhaps I should give it serious consideration.

-- 
Australian citizen since 14 August 1996. *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.