From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.5-pre1 (2020-06-20) on
	ip-172-31-74-118.ec2.internal
X-Spam-Level: 
X-Spam-Status: No, score=-0.0 required=3.0 tests=BAYES_20 autolearn=ham
	autolearn_force=no version=3.4.5-pre1
Date: 1 Sep 93 22:32:42 GMT
From: alice!peju@ucbvax.Berkeley.EDU  (Peter Juhl)
Subject: Re: Hoare's gripes about Ada (should be so what)
Message-ID: <26475@alice.att.com>
List-Id: <comp.lang.ada>

I apologize if this appears twice in different forms, I posted
a preliminary version. Doug McIlroy had numerous suggestions
for improvements, I hereby send a more readable article (the first
one should have been canceled, but one never knows.) 

I think that it has some interest.

Colin James's remarks about quicksort vs distribution sort (hereafter
called radix sort) must be tempered according to circumstances.  It is
not clear exactly which big-O estimates he referred to, but it is
worth warning that the occasionally made claim that quicksort takes
time N log N while radix sort only takes time N makes sense only if
you give entirely different interpretations to N in the two cases.

If you have N records comprising a total volume of V "digits" of data,
then performance scales like

        quicksort, expected:  N log N comparisons
        quicksort, worst:  N^2 comparisons
        big-end radix, expected:  N log N digit inspections
        big-end radix, worst:  V digit inspections
        little-end radix, always:  V digit inspections

V is usually bigger - often much bigger - than N log N.

If you are comparing strings, comparisons take expected time at least
log N, not time 1.  Thus for strings quicksort has expected time N
log^2 N. The worst case is VN.

So it all depends on what you are sorting.  If keys can be represented
as strings, and keys must be compared as strings, then big-end radix
sort is a good thing.

If it should happen that V is not much bigger than N log N and keys
are essentially strings, then little-end radix is a good thing.

Radix sorting has considerable bookkeeping overhead.  Under the
conditions set forth above, it is definitely a win, but not without
substantial programming effort.  So quicksort is not dead:  it is easy
to get right with reasonable efficiency, and it is the only game in
town when keys cannot be compared as strings.

A little-known hybrid of radix sort and quicksort gets around the
problem of string comparison taking expected time log N, at the cost
of a larger scaling constant, without the bookkeeping hassle of radix
sorting.  Partition on first digits into three parts
(less,equal,greater).  In the less and greater parts partition
recursively on first digits.  In the equal part partition sort
recursively on second digits and so on.

Both research Unix and BSD Unix currently use radix sort in their sort
utilities.

All this and more can be found in the article, "Engineering radix
sort", by McIlroy, Bostic, and McIlroy in "Computing Systems" 6 (1993)
5-27.  Another recent article is Davis, "A fast radix sort", in
"Computer Journal" 35 (1992) 636-642.  Both articles contain effective
programs written in C. For a fast C qsort, based on quicksort, see
Bentley and D McIlroy to appear next November in "Software Practice
and Experience".  For the speed-champion qsort (which isn't based on
quicksort) see P McIlroy in the proceedings of the Symposium on
Discrete Algorithms, Austin, 1993.

Incidentally, Hoare is an excellent software engineer.

--- peter (peju@research.att.com)