From: Keith Thompson
Newsgroups: comp.lang.ada
Subject: Re: sorting large numbers of large records
Date: Wed, 30 Jul 2003 00:32:16 GMT

"Brien L. Christesen" writes:
> That is a good point, and I looked into the unix sort command. The only
> problem is that as far as I can tell, that only sorts text files. I have
> a file of binary records, so I have no idea how I could use a system sort
> command to do it. Is there any way that would work?

Translate your binary records into a sortable text format, one record
per line, sort the resulting text file, and translate back into your
binary format.

As long as a line doesn't contain NUL or linefeed characters, you
should be ok. If you're using GNU sort, the length of each line can be
unlimited; for a non-GNU Unix sort program, there may be some limit,
but it's probably at least 1024 characters.
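
A minimal sketch of the round trip, in Python for brevity. The record layout (a 4-byte unsigned key followed by an 8-byte double) and the function names are assumptions made up for illustration, and the in-process `lines.sort()` merely stands in for the external sort command. The key is written as zero-padded hexadecimal so that a plain string comparison orders it correctly; the payload is dumped as the hexadecimal of its raw bytes, so it comes back bit-for-bit.

```python
import struct

# Hypothetical record layout (an assumption, not from the post):
# 4-byte unsigned big-endian key, then an 8-byte IEEE double payload.
RECORD = struct.Struct(">Id")


def record_to_line(raw: bytes) -> str:
    """Turn one binary record into one sortable text line (no newline)."""
    key, _ = RECORD.unpack(raw)
    # Fixed-width, zero-padded hex key: string order equals numeric order.
    # Payload: hex of the raw bytes (the representation, not the value).
    return f"{key:08x} {raw[4:].hex()}"


def line_to_record(line: str) -> bytes:
    """Recover the original binary record from a text line."""
    key_hex, payload_hex = line.split(" ")
    return struct.pack(">I", int(key_hex, 16)) + bytes.fromhex(payload_hex)


def demo():
    records = [RECORD.pack(k, v) for k, v in [(300, 1.5), (7, -2.25), (65536, 0.1)]]
    lines = [record_to_line(r) for r in records]
    lines.sort()  # stand-in for running the text file through sort
    return [RECORD.unpack(line_to_record(line)) for line in lines]
```

Each line here is 25 characters long and contains only hexadecimal digits and a space, so it stays well inside any plausible line-length limit and never contains NUL or linefeed.
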
If you don't have GNU sort, consider installing it; it's part of the
GNU coreutils package.

Put the sort key in a fixed-width field at the beginning of the line,
in a form that can be sorted by a simple string comparison (Unix sort
can do numeric comparisons, but they're slower). You can probably make
the other fields fixed-width or variable-width, whichever is more
convenient.

Don't worry too much about making the text format human-readable.
Consider using hexadecimal rather than decimal for integer fields; it
sorts just as well (when treated as strings) and may be cheaper to
convert. You can even use raw hexadecimal for floating-point and other
binary fields (convert the representation, not the value), as long as
you're not using them as part of the sort key; the only requirement is
that you're able to recover the binary values from the strings.

Carefully read the man page for the sort command on your system. If
your program needs to be portable to multiple Unix-like systems, read
the man page on all of them. Pay attention to any limitations on line
length. Use whatever options are needed to turn off any
locale-specific behavior; you need raw bytewise ASCII collation, not
something that knows about accented letters. For GNU sort, set the
environment variable $LC_ALL to "C".

Finally, I'm not sure what GNU sort (or any other Unix-like sort) does
with input too big to fit into memory; read the documentation and/or
experiment to make sure it fits your needs. I know that GNU sort has
an option to tell it where to put temporary files, so apparently it
uses temporary files somehow.

-- 
Keith Thompson (The_Other_Keith) kst@cts.com
San Diego Supercomputer Center <*>
Schroedinger does Shakespeare: "To be *and* not to be"
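
For what it's worth, the sort invocation described above (C locale, explicit temporary directory) could be driven from a script roughly as follows. This is a sketch in Python under stated assumptions: the helper names and any file names passed to them are hypothetical, `-T` is the GNU sort option for the temporary-file directory, and other sort implementations may differ, so check the man page first.

```python
import os
import subprocess


def build_sort_command(infile, outfile, tmpdir=None):
    """Return (argv, env) for a bytewise sort of infile into outfile."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # raw bytewise collation, no locale-specific ordering
    argv = ["sort"]
    if tmpdir is not None:
        argv += ["-T", tmpdir]  # GNU sort: where to put temporary files
    argv += ["-o", outfile, infile]
    return argv, env


def run_sort(infile, outfile, tmpdir=None):
    """Run the external sort; raises CalledProcessError if it fails."""
    argv, env = build_sort_command(infile, outfile, tmpdir)
    subprocess.run(argv, env=env, check=True)
```

Keeping the command construction separate from the call makes it easy to print and inspect the exact invocation before pointing it at a large file.
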