comp.lang.ada
 help / color / mirror / Atom feed
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Subject: Re: WE NEED A GOOD Ada SORT PACKAGE!
Date: 9 Nov 1994 18:30:15 +1100
Date: 1994-11-09T18:30:15+11:00	[thread overview]
Message-ID: <39ptq7$ris@goanna.cs.rmit.oz.au> (raw)
In-Reply-To: 39bvj7$543@goanna.cs.rmit.oz.au

> ferguson@TRON.BWI.WEC.COM wrote:

> We are looking for a generic Ada sort package that will be efficient for
> extremely large numbers of records residing on disk (~700,000), occupying about
> 20 MB of memory, which are locally out of order (i.e. the data are only
> unordered in a small region of the whole).  The application will reside on a
> UNIX workstation.  If there are commercially available sort packages,
> we'd sure like to know where they are!

It would be most illuminating if you could tell us what is wrong with the
UNIX sort(1) utility.  If your records (28.5 bytes average, from your figures)
are normal lines of readable text (and if they aren't, you have some nasty
portability problems waiting to bite you) it should be able to do the job.

I have a little utility that I posted to the net a couple of years ago,
written in C.  It's called nsort, and it operates by
    - reading the entire file into memory as a singly-linked list
    - performing a bottom up "natural merge" sort.
      This is one of the best internal sorts around anyway (yes, I do mean
      that it usually beats "quick"sort) and it is especially good with
      mostly-ordered data
    - then it writes the lines out.
I get rather large speedups compared with sort(1).  For example, I just
created a test file:
g% wc /tmp/TEXT
  259533  940803 7206497 /tmp/TEXT
7.2 Mbytes, ~260 000 lines, ~27.8 bytes/record including line terminator
so the average record size is not too far off your figure.  On the SPARC
Solaris 2.3 system that I use, 
    Command                     User System (time in seconds)
    sort  </tmp/TEXT >/dev/null   38+1	-- to my surprise, -y didn't help
    gsort </tmp/TEXT >/dev/null   16+4	-- GNU sort (in textutils.tar.gz)
    nsort </tmp/TEXT >/dev/null   11+1

That's a speedup ratio of 39/12 > 3, which is rather gratifying, because
sort(1) is supposed to be fairly well tuned.  The guts of nsort is about
60 lines of C.  It would be about the same in Ada.

-- 
"The complex-type shall be a simple-type."  ISO 10206:1991 (Extended Pascal)
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.



  parent reply	other threads:[~1994-11-09  7:30 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1994-11-02 21:19 WE NEED A GOOD Ada SORT PACKAGE! Bennett, Chip (KTR) ~U
1994-11-03 17:42 ` Jahn Rentmeister
1994-11-05  5:43   ` Robert Dewar
1994-11-06 15:28     ` Larry Kahn
     [not found] ` <39bvj7$543@goanna.cs.rmit.oz.au>
1994-11-09  7:30   ` Richard A. O'Keefe [this message]
  -- strict thread matches above, loose matches on Subject: below --
1994-11-01 22:10 ferguson
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox