From: "Brian R. Hanson"
Subject: Efficient io of arbitrary binary data.
Date: 1996/09/13
Message-ID: <3239B3B2.1AE4@cray.com>#1/1
Organization: Cray Research a division of Silicon Graphics, Inc.
Newsgroups: comp.lang.ada

I recently had to replace a merge/sort program written in terrible Fortran with a new implementation written in C. The data being sorted is variable-length binary records. The sort reads as many records as fit into some large buffer, sorts them, and writes the sorted data to a file in large blocks. The blocks are generated so that no record spans a block boundary, and the block size is chosen so that the OS can read and write it efficiently. Once the initial sort pass is complete, the file has some number of sorted regions (built of these blocks), which the program merges in log2(n) passes.

Using C, I was able to read the blocks (asynchronously) and merge the data from the input buffers directly into the output buffer. The buffer management routines returned references directly to the strings in the buffers, which could be compared and the appropriate one moved.

In considering how this program could be written in Ada (part of an attempt to become Ada literate in an Ada-hostile environment), I am puzzled. The approaches that Ada seems to allow all require much more copying of data, since I am not allowed to return a reference to a slice of an array; I can only return the slice itself.
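To make the "compare by reference, move once" idea concrete, here is a rough C sketch of a two-way merge over record references. It is not the original program. The names rec_ref, rec_cmp and merge_runs are made up for illustration, the bytewise ordering is an assumption (the post does not say what the sort key is), and the asynchronous block reads are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference to a record that lives inside an input buffer. */
struct rec_ref {
    const unsigned char *data;  /* points into the buffer, no copy made */
    size_t len;                 /* record length in bytes */
};

/* Assumed ordering: bytewise, with a shorter record sorting before a
   longer one that shares its prefix. The real key is not given. */
static int rec_cmp(struct rec_ref a, struct rec_ref b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    int c = memcmp(a.data, b.data, n);
    if (c != 0)
        return c;
    return (a.len > b.len) - (a.len < b.len);
}

/* Merge two sorted runs of references into out, copying each record's
   bytes exactly once. Returns the number of bytes written. */
static size_t merge_runs(const struct rec_ref *a, size_t na,
                         const struct rec_ref *b, size_t nb,
                         unsigned char *out, size_t out_cap)
{
    size_t i = 0, j = 0, used = 0;

    while (i < na || j < nb) {
        struct rec_ref r;

        /* Take from run a when b is exhausted or a's head sorts first. */
        if (j == nb || (i < na && rec_cmp(a[i], b[j]) <= 0))
            r = a[i++];
        else
            r = b[j++];

        assert(used + r.len <= out_cap);  /* caller sizes the output */
        memcpy(out + used, r.data, r.len);
        used += r.len;
    }
    return used;
}
```

The only copy of record data is the memcpy into the output buffer; the comparisons work on pointers into the input buffers.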
In C, the approach to building the block is to treat the block as both an array of char and an array of int. The record data is written from the beginning of the block, and the record lengths are written from the end of the block. Records are stored until one arrives that is long enough to make the lengths and the data overlap, at which point the block is written and that record is stored in the new block instead. (Having the length information and the data grow toward the middle allows both to be used naturally without worrying about alignment.) When the blocks are being read, the routine that gets the next record keeps returning the location and size of the next record in the block until the block is exhausted, at which point it starts on the next block.

Writing the block is not hard in Ada with a little help from Unchecked_Conversion. However, reading the records seems to require that the data itself be returned from the block reader rather than a reference to the data. Is this true, or is there a much better approach to solving this problem of efficient I/O of arbitrary binary records?

-- Brian Hanson
-- brh@cray.com
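For readers who want to see the block layout spelled out, the following is a minimal C sketch of it, not the actual code. Several details are assumptions that the description above does not settle: the block size of 4096 bytes, the use of a union to view the block as both char and int, and the convention that the last int slot holds the record count so the reader knows when the block is exhausted. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK_BYTES = 4096, BLOCK_INTS = BLOCK_BYTES / sizeof(int) };

/* One block viewed both ways: data grows up from bytes[0], lengths
   grow down from the end. The union keeps the int view aligned. */
union block {
    unsigned char bytes[BLOCK_BYTES];
    int lens[BLOCK_INTS];
};

struct block_writer { union block *blk; size_t data_used; int count; };
struct block_reader { const union block *blk; size_t offset; int next; };

static void writer_init(struct block_writer *w, union block *blk)
{
    w->blk = blk;
    w->data_used = 0;
    w->count = 0;
    blk->lens[BLOCK_INTS - 1] = 0;  /* assumed: last slot = record count */
}

/* Returns 1 if the record was stored, 0 if data and lengths would
   overlap (the caller then writes this block and starts a new one). */
static int writer_add(struct block_writer *w, const void *rec, size_t len)
{
    /* Tail space: the count slot, existing lengths, and one new length. */
    size_t need = w->data_used + ((size_t)w->count + 2) * sizeof(int);

    if (need > BLOCK_BYTES || len > BLOCK_BYTES - need)
        return 0;
    memcpy(w->blk->bytes + w->data_used, rec, len);
    w->data_used += len;
    w->blk->lens[BLOCK_INTS - 2 - w->count] = (int)len;
    w->count++;
    w->blk->lens[BLOCK_INTS - 1] = w->count;
    return 1;
}

static void reader_init(struct block_reader *r, const union block *blk)
{
    r->blk = blk;
    r->offset = 0;
    r->next = 0;
}

/* Returns a pointer to the next record inside the block and sets *len,
   or NULL once the block is exhausted. No record data is copied. */
static const unsigned char *reader_next(struct block_reader *r, size_t *len)
{
    const unsigned char *p;

    if (r->next >= r->blk->lens[BLOCK_INTS - 1])
        return NULL;
    *len = (size_t)r->blk->lens[BLOCK_INTS - 2 - r->next];
    assert(r->offset + *len <= BLOCK_BYTES);  /* sanity check on the block */
    p = r->blk->bytes + r->offset;
    r->offset += *len;
    r->next++;
    return p;
}
```

The point of interest for the Ada question is reader_next: it hands back a location and a size inside the block, which is the operation that a slice-returning function would turn into a copy.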