From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.5-pre1 (2020-06-20) on ip-172-31-74-118.ec2.internal X-Spam-Level: X-Spam-Status: No, score=-0.0 required=3.0 tests=BAYES_20 autolearn=ham autolearn_force=no version=3.4.5-pre1 Date: 1 Sep 93 22:32:42 GMT From: alice!peju@ucbvax.Berkeley.EDU (Peter Juhl) Subject: Re: Hoare's gripes about Ada (should be so what) Message-ID: <26475@alice.att.com> List-Id: I apologize if this appears twice in different forms, I posted a preliminary version. Doug McIlroy had numerous suggestions for improvements, I hereby send a more readable article (the first one should have been canceled, but one never knows.) I think that it has some interest. Colin James's remarks about quicksort vs distribution sort (hereafter called radix sort) must be tempered according to circumstances. It is not clear exactly which big-O estimates he referred to, but it is worth warning that the occasionally made claim that quicksort takes time N log N while radix sort only takes time N makes sense only if you give entirely different interpretations to N in the two cases. If you have N records comprising a total volume of V "digits" of data, then performance scales like quicksort, expected: N log N comparisons quicksort, worst: N^2 comparisons big-end radix, expected: N log N digit inspections big-end radix, worst: V digit inspections little-end radix, always: V digit inspections V is usually bigger - often much bigger - than N log N. If you are comparing strings, comparisons take expected time at least log N, not time 1. Thus for strings quicksort has expected time N log^2 N. The worst case is VN. So it all depends on what you are sorting. If keys can be represented as strings, and keys must be compared as strings, then big-end radix sort is a good thing. If it should happen that V is not much bigger than N log N and keys are essentially strings, then little-end radix is a good thing. Radix sorting has considerable bookkeeping overhead. Under the conditions set forth above, it is definitely a win, but not without substantial programming effort. So quicksort is not dead: it is easy to get right with reasonable efficiency, and it is the only game in town when keys cannot be compared as strings. A little-known hybrid of radix sort and quicksort gets around the problem of string comparison taking expected time log N, at the cost of a larger scaling constant, without the bookkeeping hassle of radix sorting. Partition on first digits into three parts (less,equal,greater). In the less and greater parts partition recursively on first digits. In the equal part partition sort recursively on second digits and so on. Both research Unix and BSD Unix currently use radix sort in their sort utilities. All this and more can be found in the article, "Engineering radix sort", by McIlroy, Bostic, and McIlroy in "Computing Systems" 6 (1993) 5-27. Another recent article is Davis, "A fast radix sort", in "Computer Journal" 35 (1992) 636-642. Both articles contain effective programs written in C. For a fast C qsort, based on quicksort, see Bentley and D McIlroy to appear next November in "Software Practice and Experience". For the speed-champion qsort (which isn't based on quicksort) see P McIlroy in the proceedings of the Symposium on Discrete Algorithms, Austin, 1993. Incidentally, Hoare is an excellent software engineer. --- peter (peju@research.att.com)