From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,INVALID_MSGID
	autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 10f6aa,e1e578817780dac2
X-Google-Attributes: gid10f6aa,public
X-Google-Thread: 109fba,baaf5f793d03d420
X-Google-Attributes: gid109fba,public
X-Google-Thread: fc89c,97188312486d4578
X-Google-Attributes: gidfc89c,public
X-Google-Thread: 1014db,6154de2e240de72a
X-Google-Attributes: gid1014db,public
X-Google-Thread: 103376,97188312486d4578
X-Google-Attributes: gid103376,public
From: miker3@ix.netcom.com (Mike Rubenstein)
Subject: Re: Should I learn C or Pascal?
Date: 1996/08/20
Message-ID: <32199bba.19972158@nntp.ix.netcom.com>#1/1
X-Deja-AN: 175298137
references: <4u7hi6$s2b@nntp.seflin.lib.fl.us> <4uo74j$95p@ns.broadvision.com>
 <qqk9v3ji06.fsf@tartarus.ucsd.edu> <4v1r2a$gh6@falcon.le.ac.uk>
 <840224235snz@genesis.demon.co.uk> <32159698.39D9@xenon.chromatic.com>
 <3215cc56.142847293@nntp.ix.netcom.com> <4v9a47$gm3@goanna.cs.rmit.edu.au>
organization: Netcom
x-netcom-date: Tue Aug 20  4:21:10 AM PDT 1996
newsgroups: 
 comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada,comp.os.msdos.programmer
Date: 1996-08-20T04:21:10-07:00
List-Id: <comp.lang.ada>


ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe) wrote:

> miker3@ix.netcom.com (Mike Rubenstein) writes:
> >You should not have cut class the day they covered the techniques for
> >making O(N^2) performance very unlikely.  In fact, properly
> >implemented, quicksort will give O( N log N) performance on both a
> >reversed list and a shifted list.
> 
> Yes, well I suppose all of us missed _something_ in our CS educations,
> such as the fact that the "proof" that O(N**2) is unlikely assumes that
> every possible permutation of the input is equally likely, which is not
> true in the real world.  Check the paper by Bentley and someone (McIlroy?)
> in Software Practice and Experience on "Engineering a Sort":  the UNIX
> qsort() routine *did* yield O(N**2) behaviour in actual use by people who
> were trying to solve a real problem, not break the sorting routine.

No.  The assumption that O(N^2) performance is unlikely assumes that
quicksort is properly implemented.  The distribution of the data has
nothing to do with it.

The fact that some versions of UNIX qsort() were badly implemented
doesn't change the fact that Hoare gave a simple method in his
original paper for making O(N^2) performance very unlikely for any
data distribution.

> 
> The likelihood of quicksort doing badly on *your* data depends on the
> actual probability distribution of *your* data.

Again, not if quicksort is implemented properly.
 
> This leads to the odd observation that if you don't know what the
> distribution of your data is, and you would like to be confident that
> bad things are genuinely unlikely, a good first step is to *scramble*
> your array before sorting it, so that the assumption of this proof is
> known to be true!  (Permuting an array randomly is only O(N).)

There is no need to permute the array.  Just follow Hoare's suggestion
for avoiding O(N^2) performance.

Hoare suggested choosing the pivot randomly.  This has the same effect
on overall performance as permuting the data.

Choosing the median of the first, middle, and last elements for the
pivot also works well in practice.  In particular, it will prevent
O(N^2) performance if the data is nearly in order or reverse order.


Michael M Rubenstein