From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: Should I learn C or Pascal?
Date: 1996/08/22
Message-ID: <4vg8v8$9i2@goanna.cs.rmit.edu.au>
References: <4u7hi6$s2b@nntp.seflin.lib.fl.us> <4uo74j$95p@ns.broadvision.com> <4v1r2a$gh6@falcon.le.ac.uk> <840224235snz@genesis.demon.co.uk> <32159698.39D9@xenon.chromatic.com> <3215cc56.142847293@nntp.ix.netcom.com> <4v9a47$gm3@goanna.cs.rmit.edu.au> <32199bba.19972158@nntp.ix.netcom.com>
Organization: Comp Sci, RMIT, Melbourne, Australia
Newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada,comp.os.msdos.programmer

miker3@ix.netcom.com (Mike Rubenstein) writes:
>No.  The assumption that O(N^2) performance is unlikely assumes that
>quicksort is properly implemented.  The distribution of the data has
>nothing to do with it.

The point I was trying to make is that the allegedly "unlikely"
behaviour is already *known* to have occurred much more often than the
theorems you see in data structures books would have you believe.

Given any non-randomising version of quicksort, I can construct a
pessimal input for that version.  If the distribution of the inputs I
supply is such that pessimal inputs occur with probability 1, I will
get the so-called "unlikely" behaviour with probability 1.  So *in the
absence of built-in randomisation* the distribution has a lot to do
with it.

Again, some of the proofs of the unlikelihood of bad behaviour tacitly
assume that all elements are distinct.  Even randomisation won't help
unless you use a fat pivot or some other technique that handles
repeated elements well.

So if Mike Rubenstein means by "properly implemented" a quicksort
using randomisation and a fat pivot, then I am forced to agree with
him.  On the other hand, the quicksort I use (the "Engineered" one by
Bentley & McIlroy) uses a fat pivot but not randomisation.  Here is a
recent quicksort, designed and tested by experts, published in a
refereed journal, with impressive performance in practice (it is the
first quicksort I've ever come across which can routinely compete with
a well implemented merge sort), and yet it is possible to find an
input distribution for which this "Engineered" qsort will behave badly
with probability 1.  So either Mike Rubenstein is wrong, or Bentley &
McIlroy's qsort is not "properly implemented".

>> The likelihood of quicksort doing badly on *your* data depends on the
>> actual probability distribution of *your* data.

>Again, not if quicksort is implemented properly.

Which either means "using randomisation and a fat pivot" or is false.

>There is no need to permute the array.  Just follow Hoare's suggestion
>for avoiding O(N^2) performance.  Hoare suggested choosing the pivot
>randomly.  This has the same effect on overall performance as
>permuting the data.

Permuting the whole array first is simpler and faster.  There is a
phase where the random number generator is in the cache and the
sorting code is not, then there is a phase when the sorting code is in
the cache and the random number generator is not.  What's more,
depending on the random number generator, it may be possible to
vectorise the random number generation and fuse that loop with the
permutation loop.  The separated approach *looks* as though it ought
to be easier to make fast.  (For example, if using AS183, the three
integers it maintains can be kept in registers during the permutation
phase instead of being repeatedly stored back into memory and
reloaded.)

There is also the practical point that if you want to use someone
else's quicksort and don't fully trust it, you can permute the array
yourself before calling qsort, even though you cannot modify the qsort
code itself.
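To make the permute-first option concrete, here is a minimal sketch in
C.  The names are mine, and rand() is only a stand-in: its modulo bias
and general weakness are part of why a better generator such as AS183
would be substituted in earnest use.

    #include <stdlib.h>

    /* Order two ints for qsort(). */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Fisher-Yates shuffle: every permutation equally likely,
       provided the generator is sound.  Seed with srand() first. */
    static void shuffle(int *a, size_t n)
    {
        size_t i, j;
        int t;
        for (i = n - 1; i > 0; i--) {
            j = (size_t)rand() % (i + 1);       /* 0 <= j <= i */
            t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    /* Defensive sort: randomise, then call the untrusted qsort. */
    static void sort_defensively(int *a, size_t n)
    {
        if (n > 1) {
            shuffle(a, n);
            qsort(a, n, sizeof a[0], cmp_int);
        }
    }

The shuffle is O(N), so it cannot hurt the expected O(N log N) cost;
what it buys is that no fixed input distribution can force bad pivot
placement with probability 1.  As noted above, repeated elements still
want a fat pivot as well.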
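And since the fat pivot keeps coming up, the simplest form of it is
the three-way ("Dutch national flag") partition sketched below.  This
is the idea only, not Bentley & McIlroy's code; their engineered qsort
gets the same effect with a cleverer split-end scheme.

    /* Quicksort with a fat pivot: partition a[lo..hi] into
       < pivot | == pivot | > pivot, then recurse only on the two
       outer regions.  Pivot selection is left naive here. */
    static void quicksort3(int a[], int lo, int hi)
    {
        int lt, gt, i, pivot, t;
        if (lo >= hi) return;
        pivot = a[lo];
        lt = lo; i = lo + 1; gt = hi;
        while (i <= gt) {
            if (a[i] < pivot) {             /* grow the < region */
                t = a[lt]; a[lt] = a[i]; a[i] = t;
                lt++; i++;
            } else if (a[i] > pivot) {      /* grow the > region */
                t = a[gt]; a[gt] = a[i]; a[i] = t;
                gt--;
            } else {                        /* equal to pivot: leave it */
                i++;
            }
        }
        quicksort3(a, lo, lt - 1);  /* a[lt..gt] all equal the pivot */
        quicksort3(a, gt + 1, hi);
    }

On an array that is all one key, this makes a single pass and stops,
where a careless two-way partition with the same pivot rule would go
quadratic.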
>Choosing the median of the first, middle, and last elements for the
>pivot also works well in practice.

Ahem.  This is the approach that has yielded bad behaviour IN PRACTICE.

>In particular, it will prevent O(N^2) performance if the data is
>nearly in order or reverse order.

Only if (a) repeated elements are rare or (b) you use a fat pivot.

The only ways to make bad runtime cost unlikely for any algorithm are
(a) use an algorithm for which bad cost can *never* happen, or
(b) randomise, so that you *know* what the distribution is.

--
Australian citizen since 14 August 1996.  *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.