From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: Should I learn C or Pascal?
Date: 1996/08/22
Message-ID: <4vg8v8$9i2@goanna.cs.rmit.edu.au>
References: <4u7hi6$s2b@nntp.seflin.lib.fl.us> <4uo74j$95p@ns.broadvision.com> <4v1r2a$gh6@falcon.le.ac.uk> <840224235snz@genesis.demon.co.uk> <32159698.39D9@xenon.chromatic.com> <3215cc56.142847293@nntp.ix.netcom.com> <4v9a47$gm3@goanna.cs.rmit.edu.au> <32199bba.19972158@nntp.ix.netcom.com>
Organization: Comp Sci, RMIT, Melbourne, Australia
Newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada,comp.os.msdos.programmer

miker3@ix.netcom.com (Mike Rubenstein) writes:
>No.  The assumption that O(N^2) performance is unlikely assumes that
>quicksort is properly implemented.  The distribution of the data has
>nothing to do with it.

The point I was trying to make is that the allegedly "unlikely"
behaviour is already *known* to have occurred much more often than the
theorems you see in data structures books would have you believe.

Given any non-randomising version of quicksort, I can construct a
pessimal input for that version.  If the distribution of the inputs I
supply is such that pessimal inputs occur with probability 1, I will
get the so-called "unlikely" behaviour with probability 1.  So *in the
absence of built-in randomisation* the distribution has a lot to do
with it.

Again, some of the proofs of the unlikelihood of bad behaviour tacitly
assume that all elements are distinct.  Even randomisation won't help
unless you use a fat pivot or some other technique that handles
repeated elements well.

So if Mike Rubenstein means by "properly implemented" a quicksort
using randomisation and a fat pivot, then I am forced to agree with
him.  On the other hand, the quicksort I use (the "Engineered" one by
Bentley & McIlroy) uses a fat pivot but not randomisation.  Here is a
recent quicksort, designed and tested by experts, published in a
refereed journal, with impressive performance in practice (it is the
first quicksort I've ever come across which can routinely compete with
a well implemented merge sort), and yet it is possible to find an
input distribution for which this "Engineered" qsort will behave badly
with probability 1.  So either Mike Rubenstein is wrong, or Bentley &
McIlroy's qsort is not "properly implemented".

>> The likelihood of quicksort doing badly on *your* data depends on the
>> actual probability distribution of *your* data.

>Again, not if quicksort is implemented properly.

Which either means "using randomisation and a fat pivot" or is false.

>There is no need to permute the array.  Just follow Hoare's suggestion
>for avoiding O(N^2) performance.  Hoare suggested choosing the pivot
>randomly.  This has the same effect on overall performance as
>permuting the data.

Permuting the whole array first is simpler and faster.  There is a
phase where the random number generator is in the cache and the
sorting code is not, then there is a phase when the sorting code is in
the cache and the random number generator is not.  What's more,
depending on the random number generator, it may be possible to
vectorise the random number generation and fuse that loop with the
permutation loop.  The separated approach *looks* as though it ought
to be easier to make fast.  (For example, if using AS183, the three
integers it maintains can be kept in registers during the permutation
phase instead of being repeatedly stored back into memory and
reloaded.)

There is also the practical point that if you want to use someone
else's quicksort and don't fully trust it, you can permute the array
yourself before calling qsort, even though you cannot modify the qsort
code itself.
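To make the permute-first option concrete, here is a minimal sketch in
C.  The names are mine, and rand() is only a stand-in: its modulo bias
and general weakness are part of why a better generator such as AS183
would be substituted in earnest use.

    #include <stdlib.h>

    /* Order two ints for qsort(). */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Fisher-Yates shuffle: every permutation equally likely,
       provided the generator is sound.  Seed with srand() first. */
    static void shuffle(int *a, size_t n)
    {
        size_t i, j;
        int t;
        for (i = n - 1; i > 0; i--) {
            j = (size_t)rand() % (i + 1);       /* 0 <= j <= i */
            t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    /* Defensive sort: randomise, then call the untrusted qsort. */
    static void sort_defensively(int *a, size_t n)
    {
        if (n > 1) {
            shuffle(a, n);
            qsort(a, n, sizeof a[0], cmp_int);
        }
    }

The shuffle is O(N), so it cannot hurt the expected O(N log N) cost;
what it buys is that no fixed input distribution can force bad pivot
placement with probability 1.  As noted above, repeated elements still
want a fat pivot as well.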
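And since the fat pivot keeps coming up, the simplest form of it is
the three-way ("Dutch national flag") partition sketched below.  This
is the idea only, not Bentley & McIlroy's code; their engineered qsort
gets the same effect with a cleverer split-end scheme.

    /* Quicksort with a fat pivot: partition a[lo..hi] into
       < pivot | == pivot | > pivot, then recurse only on the two
       outer regions.  Pivot selection is left naive here. */
    static void quicksort3(int a[], int lo, int hi)
    {
        int lt, gt, i, pivot, t;
        if (lo >= hi) return;
        pivot = a[lo];
        lt = lo; i = lo + 1; gt = hi;
        while (i <= gt) {
            if (a[i] < pivot) {             /* grow the < region */
                t = a[lt]; a[lt] = a[i]; a[i] = t;
                lt++; i++;
            } else if (a[i] > pivot) {      /* grow the > region */
                t = a[gt]; a[gt] = a[i]; a[i] = t;
                gt--;
            } else {                        /* equal to pivot: leave it */
                i++;
            }
        }
        quicksort3(a, lo, lt - 1);  /* a[lt..gt] all equal the pivot */
        quicksort3(a, gt + 1, hi);
    }

On an array that is all one key, this makes a single pass and stops,
where a careless two-way partition with the same pivot rule would go
quadratic.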
>Choosing the median of the first, middle, and last elements for the
>pivot also works well in practice.

Ahem.  This is the approach that has yielded bad behaviour IN PRACTICE.

>In particular, it will prevent O(N^2) performance if the data is
>nearly in order or reverse order.

Only if (a) repeated elements are rare or (b) you use a fat pivot.

The only ways to make bad runtime cost unlikely for any algorithm are
(a) use an algorithm for which bad cost can *never* happen, or
(b) randomise, so that you *know* what the distribution is.

--
Australian citizen since 14 August 1996.  *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.