From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=BAYES_00,INVALID_DATE autolearn=no autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,e896ad9558f88d18 X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 1994-11-08 23:53:20 PST Path: nntp.gmd.de!xlink.net!howland.reston.ans.net!math.ohio-state.edu!caen!msuinfo!harbinger.cc.monash.edu.au!aggedor.rmit.EDU.AU!goanna.cs.rmit.oz.au!not-for-mail From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) Newsgroups: comp.lang.ada Subject: Re: WE NEED A GOOD Ada SORT PACKAGE! Date: 9 Nov 1994 18:30:15 +1100 Organization: Comp Sci, RMIT, Melbourne, Australia Message-ID: <39ptq7$ris@goanna.cs.rmit.oz.au> References: <2EB8042E@SMTPGATE2.STRATCOM.AF.MIL> <39bvj7$543@goanna.cs.rmit.oz.au> NNTP-Posting-Host: goanna.cs.rmit.oz.au NNTP-Posting-User: ok Date: 1994-11-09T18:30:15+11:00 List-Id: > ferguson@TRON.BWI.WEC.COM wrote: > We are looking for a generic Ada sort package that will be efficient for > extremely large numbers of records residing on disk (~700,000), occupying about > 20 MB of memory, which are locally out of order (i.e. the data are only > unordered in a small region of the whole). The application will reside on a > UNIX workstation. If there are commercially available sort packages, > we'd sure like to know where they are! It would be most illuminating if you could tell us what is wrong with the UNIX sort(1) utility. If your records (28.5 bytes average, from your figures) are normal lines of readable text (and if they aren't, you have some nasty portability problems waiting to bite you) it should be able to do the job. I have a little utility that I posted to the net a couple of years ago, written in C. It's called nsort, and it operates by - reading the entire file into memory as a singly-linked list - performing a bottom up "natural merge" sort. This is one of the best internal sorts around anyway (yes, I do mean that it usually beats "quick"sort) and it is especially good with mostly-ordered data - then it writes the lines out. I get rather large speedups compared with sort(1). For example, I just created a test file: g% wc /tmp/TEXT 259533 940803 7206497 /tmp/TEXT 7.2 Mbytes, ~260 000 lines, ~27.8 bytes/record including line terminator so the average record size is not too far off your figure. On the SPARC Solaris 2.3 system that I use, Command User System (time in seconds) sort /dev/null 38+1 -- to my surprise, -y didn't help gsort /dev/null 16+4 -- GNU sort (in textutils.tar.gz) nsort /dev/null 11+1 That's a speedup ratio of 39/12 > 3, which is rather gratifying, because sort(1) is supposed to be fairly well tuned. The guts of nsort is about 60 lines of C. It would be about the same in Ada. -- "The complex-type shall be a simple-type." ISO 10206:1991 (Extended Pascal) Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.