comp.lang.ada
* Efficient io of arbitrary binary data.
From: Brian R. Hanson @ 1996-09-13  0:00 UTC



I recently had to replace a merge/sort program written in terrible
Fortran with a new implementation written in C.

The data being sorted consists of variable-length binary records.  The
sort reads as many records as fit into some large buffer, sorts them,
and writes the sorted data to a file in large blocks.  The blocks
are generated so that no record spans a block boundary, and the block
size is chosen so that the OS can read and write it efficiently.

Once the initial sort pass is complete, the file has some number, n,
of sorted regions (built of these blocks), which the program merges in
log2(n) passes.
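
As a rough check on the log2(n) figure, here is a minimal sketch that
counts the passes, assuming a plain two-way merge in which each pass
pairs up adjacent sorted regions.  The function name merge_passes is
hypothetical and is not taken from the original program.

```c
#include <stddef.h>

/* Number of two-way merge passes needed to reduce `runs` sorted
 * regions to a single one.  Assumes each pass merges adjacent pairs,
 * with an odd region left over carried into the next pass unchanged. */
static unsigned merge_passes(size_t runs)
{
    unsigned passes = 0;
    while (runs > 1) {
        runs = (runs + 1) / 2;   /* pairs merged, plus any leftover */
        passes++;
    }
    return passes;
}
```

For region counts that are not a power of two, the result is log2(n)
rounded up.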

Using C I was able to read the blocks (asynchronously) and merge
the data from the input buffers directly into the output buffer.
The buffer management routines returned references directly to the
strings in the buffers, which could be compared and the appropriate
one moved.
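
The merge step described above might look roughly like the following
sketch.  It is an illustration under assumptions, not the original
code: the names rec_view, rec_cmp and merge_views are hypothetical,
the ordering is assumed to be a bytewise comparison with the shorter
record first on a common prefix, and the asynchronous reads are left
out.  Each record is touched only through a pointer into its input
buffer and is copied exactly once, into the output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A reference to a record that still lives in an input buffer. */
typedef struct {
    const unsigned char *ptr;
    size_t               len;
} rec_view;

/* Assumed ordering: bytewise, shorter record first on a common prefix. */
static int rec_cmp(rec_view a, rec_view b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    int c = n ? memcmp(a.ptr, b.ptr, n) : 0;
    if (c != 0)
        return c;
    return (a.len > b.len) - (a.len < b.len);
}

/* Merge two sorted runs of views into `out`; returns bytes written.
 * The caller must supply enough capacity for all records. */
static size_t merge_views(const rec_view *a, size_t na,
                          const rec_view *b, size_t nb,
                          unsigned char *out, size_t cap)
{
    size_t i = 0, j = 0, used = 0;
    while (i < na || j < nb) {
        /* Take from a when b is exhausted or a's head sorts first
         * (ties go to a, which keeps the merge stable). */
        int take_a = (j == nb) || (i < na && rec_cmp(a[i], b[j]) <= 0);
        rec_view r = take_a ? a[i++] : b[j++];
        assert(used + r.len <= cap);
        memcpy(out + used, r.ptr, r.len);   /* the only copy made */
        used += r.len;
    }
    return used;
}
```

For brevity the sketch writes only the record bytes to the output; a
real version would also have to record each length in the output block.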

In considering how this program could be written in Ada (part of
an attempt to become Ada literate in an Ada hostile environment),
I am puzzled.  The approaches which Ada seems to allow all require
much more copying of data, as I am not allowed to return a reference
to a slice of an array; I can only return the slice itself.

In C, the approach to building the block is to treat the block as
both an array of char and an array of int.  The record data is written
from the beginning of the block and the record lengths are written
from the end of the block.  Records are stored until one arrives that
is long enough to make the lengths and the data overlap, at
which point the block is written out and that record is stored in the
new block instead.  (Having the length information and data grow
toward the middle allows both to be used naturally without worrying
about alignment.)  When the blocks are being read, the routine to
get the next record keeps returning the location and size of the
next record in the block until the block is exhausted, at which point
it starts on the next block.
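
A minimal sketch of such a block, under stated assumptions, follows.
The post does not say how the reader knows how many records a block
holds, so this sketch assumes the last word of the block stores the
record count, with the lengths growing downward from the word before
it.  The 4096-byte block size and all names (block_t, block_put,
block_next and so on) are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* assumed; choose a size the OS handles well */
enum { NWORDS = BLOCK_SIZE / sizeof(uint32_t) };

/* One block seen two ways: bytes from the front, words from the back. */
typedef union {
    unsigned char bytes[BLOCK_SIZE];
    uint32_t      words[NWORDS];
} block_t;

typedef struct { block_t *blk; size_t data_end; size_t count; } block_writer;
typedef struct { const block_t *blk; size_t offset; size_t next; } block_reader;

static void block_init(block_writer *w, block_t *blk)
{
    w->blk = blk;
    w->data_end = 0;
    w->count = 0;
    blk->words[NWORDS - 1] = 0;          /* assumed: last word = count */
}

/* Store one record.  Returns 0 when data and lengths would overlap, in
 * which case the caller writes the block out and starts a new one. */
static int block_put(block_writer *w, const void *rec, uint32_t len)
{
    /* Tail space after this put: the count word plus count+1 lengths. */
    size_t tail = (w->count + 2) * sizeof(uint32_t);
    if (len > BLOCK_SIZE || w->data_end + len + tail > BLOCK_SIZE)
        return 0;
    memcpy(w->blk->bytes + w->data_end, rec, len);
    w->blk->words[NWORDS - 2 - w->count] = len;
    w->data_end += len;
    w->count++;
    w->blk->words[NWORDS - 1] = (uint32_t)w->count;
    return 1;
}

static void reader_init(block_reader *r, const block_t *blk)
{
    r->blk = blk;
    r->offset = 0;
    r->next = 0;
}

/* Return the location of the next record and set *len, or NULL when
 * the block is exhausted.  The pointer refers into the block itself,
 * so no record data is copied. */
static const unsigned char *block_next(block_reader *r, uint32_t *len)
{
    if (r->next >= r->blk->words[NWORDS - 1])
        return NULL;
    *len = r->blk->words[NWORDS - 2 - r->next];
    const unsigned char *p = r->blk->bytes + r->offset;
    r->offset += *len;
    r->next++;
    return p;
}
```

A caller would loop on block_put until it returns 0, write the block,
call block_init on a fresh block, and store the rejected record there.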

Writing the block is not hard in Ada with a little help from
Unchecked_Conversion.  However, reading the records seems to require
that the block reader return the data itself rather than a
reference to the data.

Is this true, or is there a much better approach to solving this problem
of efficient I/O of arbitrary binary records?

-- Brian Hanson
-- brh@cray.com





Thread overview: 10+ messages
1996-09-13  0:00 Efficient io of arbitrary binary data Brian R. Hanson
1996-09-14  0:00 ` Larry Kilgallen
1996-09-16  0:00   ` Brian Hanson
1996-09-16  0:00     ` Stephen Leake
1996-09-16  0:00     ` Robert A Duff
1996-09-16  0:00     ` Larry Kilgallen
1996-09-14  0:00 ` Larry Kilgallen
1996-09-14  0:00   ` Robert Dewar
1996-09-16  0:00   ` Brian Hanson
1996-09-17  0:00   ` Ted Dennison
