* Efficient io of arbitrary binary data.
@ 1996-09-13 0:00 Brian R. Hanson
1996-09-14 0:00 ` Larry Kilgallen
1996-09-14 0:00 ` Larry Kilgallen
0 siblings, 2 replies; 10+ messages in thread
From: Brian R. Hanson @ 1996-09-13 0:00 UTC (permalink / raw)
I recently had to replace a merge/sort program written in terible
fortran with a new implimentation written in c.
The data being sorted is variable length binary records. The
sort reads as many records as fit into some large buffer sorts them
and writes the sorted data to a file in large blocks. THe blocks
are generated so that no record spans a block and the size of the block
is chosen to be efficiently read and written by the os.
Once the initial sort pass is complete, the file now has some number
of sorted regions (built of these blocks which it merges in log2(n)
passes.
Using c I was able to read the blocks (asynchronously) and merge
the data from the input buffers directly to the output buffer.
The buffer management routines returned references directly the the
strings in the buffers which could be compared and the appropriate
one moved.
I considering how this program could be written in Ada (part of
an attempt to become Ada literate in an Ada hostile environment)
I a puzzled. The approaches which Ada seems to allow all require
much more copying of data as I am not allowed to return a reference
to a slice of an array I can only return the slice itself.
In c, the approach to building the block is to treat the block as
an array of char and an array of int. The record data is written
from the begining of the block and the record lengths are written
from the end of the block. Records are stored until a record long
enough to cause the length and data to overlap is encountered at
which time the block is written and the record is stored in the
new block instead. (having the length information and data grow
toward the middle allows both to be used naturally without worrying
about alignment). When the blocks are being read, the routine to
get the next record keeps returning the location and size of the
next record in the block until the block is exhausted when it starts
on the next.
Writing the block is not hard in Ada with a little help from
unchecked_conversion. However, reading the records seems to require
that the data be returned from the block reader rather than a
reference to the data.
Is this true or there a much better approach to solving this problem
of efficient io of arbitrary binary records?
-- Brian Hanson
-- brh@cray.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-13 0:00 Efficient io of arbitrary binary data Brian R. Hanson
@ 1996-09-14 0:00 ` Larry Kilgallen
1996-09-16 0:00 ` Brian Hanson
1996-09-14 0:00 ` Larry Kilgallen
1 sibling, 1 reply; 10+ messages in thread
From: Larry Kilgallen @ 1996-09-14 0:00 UTC (permalink / raw)
In article <3239B3B2.1AE4@cray.com>, "Brian R. Hanson" <brh@cray.com> writes:
> I a puzzled. The approaches which Ada seems to allow all require
> much more copying of data as I am not allowed to return a reference
> to a slice of an array I can only return the slice itself.
Presuming the concern is runtime efficiency rather than code inspection
efficiency :-), whether you return a slice or a reference to a slice
may be immaterial. A compiler with aggressive optimization may generate
inline code for your whole "read" function, making the object code
for the two methods equivalent. When programming in Ada you have
given the compiler more information than programming in C, so it has
more opportunity to optimize. Based on your return address, I would
presume folks in your shop are big on optimizing compilers.
Larry Kilgallen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-14 0:00 ` Larry Kilgallen
@ 1996-09-16 0:00 ` Brian Hanson
1996-09-16 0:00 ` Stephen Leake
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Brian Hanson @ 1996-09-16 0:00 UTC (permalink / raw)
Larry Kilgallen) writes:
In article <3239B3B2.1AE4@cray.com>, "Brian R. Hanson" <brh@cray.com> writes:
> I a puzzled. The approaches which Ada seems to allow all require
> much more copying of data as I am not allowed to return a reference
> to a slice of an array I can only return the slice itself.
Presuming the concern is runtime efficiency rather than code inspection
efficiency :-), whether you return a slice or a reference to a slice
may be immaterial. A compiler with aggressive optimization may generate
inline code for your whole "read" function, making the object code
for the two methods equivalent. When programming in Ada you have
given the compiler more information than programming in C, so it has
more opportunity to optimize. Based on your return address, I would
presume folks in your shop are big on optimizing compilers.
Actually, folks in our shop are big on optimizing c, c++ and fortran compilers.
My target platforms - sgi (our new parent) and sun (our workstation of
choice prior to acquiring a parent) - use gnat.
is it really likely that
case compare_keys(current_string(buf1), current_string(buf2)) is
when smaller, the_same =>
store_string(buf3, current_string(buf1));
advance_string(buf1);
when larger =>
store_string(buf3, current_string(buf2));
advance_string(buf2);
end case;
would really optimize the copying of the string slices away.
package buffers is
type buffer_type is private;
procedure advance_string(buf: in out buffer_type);
function current_string(buf: in buffer_type) return string;
procedure store_string(buf: in out buffer_type, data: in string);
private
type buffer_type is record
data: string(65536);
start: positive;
length: positive;
-- other details omitted.
end record;
end buffers;
package body buffers is
-- details omitted.
function current_string(buf: in buffer_type) return string is
begin
return buf.data(buf.start, buf.length);
end;
end buffers;
Brian Hanson
--
-- Brian Hanson
-- brh@cray.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-16 0:00 ` Brian Hanson
@ 1996-09-16 0:00 ` Stephen Leake
1996-09-16 0:00 ` Larry Kilgallen
1996-09-16 0:00 ` Robert A Duff
2 siblings, 0 replies; 10+ messages in thread
From: Stephen Leake @ 1996-09-16 0:00 UTC (permalink / raw)
Brian Hanson wrote:
> [discussion of optimizers snipped]
> is it really likely that
>
> case compare_keys(current_string(buf1), current_string(buf2)) is
> when smaller, the_same =>
> store_string(buf3, current_string(buf1));
> advance_string(buf1);
> when larger =>
> store_string(buf3, current_string(buf2));
> advance_string(buf2);
> end case;
>
>
> package buffers is
> type buffer_type is private;
> procedure advance_string(buf: in out buffer_type);
> function current_string(buf: in buffer_type) return string;
> procedure store_string(buf: in out buffer_type, data: in string);
> private
> type buffer_type is record
> data: string(65536);
> start: positive;
> length: positive;
> -- other details omitted.
> end record;
> end buffers;
>
> package body buffers is
> -- details omitted.
>
> function current_string(buf: in buffer_type) return string is
> begin
> return buf.data(buf.start, buf.length);
> end;
> end buffers;
>
Since you are trying to keep only one copy of each string, but need to
pass access values for it, I would suggest you define your own string
type:
type String_Access is access String;
type Fast_String is record
Buffer : String_Access;
First : NATURAL;
Length : NATURAL; -- or maybe last?
end record;
then define all your functions with this type:
function current_string(buf: in buffer_type) return Fast_String is
begin
return (buf.data'access, (buf.start, buf.length);
end;
procedure store_string (To : in buffer_type; from : in fast_string) is
begin
To.data (1 .. from.length) :=
from.buffer (from.first .. from.first + from.length - 1);
end;
This way, the string is never copied until you need to move it.
warning: I have NOT run this code thru a compiler, so typos are
probably present.
> Brian Hanson
> --
> -- Brian Hanson
> -- brh@cray.com
--
- Stephe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-16 0:00 ` Brian Hanson
1996-09-16 0:00 ` Stephen Leake
@ 1996-09-16 0:00 ` Larry Kilgallen
1996-09-16 0:00 ` Robert A Duff
2 siblings, 0 replies; 10+ messages in thread
From: Larry Kilgallen @ 1996-09-16 0:00 UTC (permalink / raw)
In article <BRH.96Sep16002020@poplar111.cray.com>, brh@poplar111.cray.com (Brian Hanson) writes:
> Actually, folks in our shop are big on optimizing c, c++ and fortran compilers.
> My target platforms - sgi (our new parent) and sun (our workstation of
> choice prior to acquiring a parent) - use gnat.
>
> is it really likely that
>
>
> case compare_keys(current_string(buf1), current_string(buf2)) is
> when smaller, the_same =>
> store_string(buf3, current_string(buf1));
> advance_string(buf1);
> when larger =>
> store_string(buf3, current_string(buf2));
> advance_string(buf2);
> end case;
>
> would really optimize the copying of the string slices away.
As has been pointed out on this list, one of the advantages of using
GNAT is that if you feel it is inadequate in some regard, you can
extend it. Optimization seems one of the best areas for extension,
since you are not doing something which will conflict with whatever
the next standard might be.
Larry Kilgallen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-16 0:00 ` Brian Hanson
1996-09-16 0:00 ` Stephen Leake
1996-09-16 0:00 ` Larry Kilgallen
@ 1996-09-16 0:00 ` Robert A Duff
2 siblings, 0 replies; 10+ messages in thread
From: Robert A Duff @ 1996-09-16 0:00 UTC (permalink / raw)
In article <BRH.96Sep16002020@poplar111.cray.com>,
Brian Hanson <brh@poplar111.cray.com> wrote:
[stuff about sorting slices of strings]
I suggest you define a private type to represent your string-slice
pointers. Each string pointer is represented as a low and high bound,
or low bound and length, or whatever, relative to the buffer. You can
even use an access type pointing to the first character, plus a length,
if you make the buffer components aliased.
Then write your sorting routine in terms of this private data type.
Make sure the buffer is immutable (that is, don't allow modifications
while string-slice-pointers are in use).
- Bob
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-13 0:00 Efficient io of arbitrary binary data Brian R. Hanson
1996-09-14 0:00 ` Larry Kilgallen
@ 1996-09-14 0:00 ` Larry Kilgallen
1996-09-14 0:00 ` Robert Dewar
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Larry Kilgallen @ 1996-09-14 0:00 UTC (permalink / raw)
In article <3239B3B2.1AE4@cray.com>, "Brian R. Hanson" <brh@cray.com> writes:
> I recently had to replace a merge/sort program written in terible
> fortran with a new implimentation written in c.
>
> The data being sorted is variable length binary records. The
> sort reads as many records as fit into some large buffer sorts them
> and writes the sorted data to a file in large blocks. THe blocks
> are generated so that no record spans a block and the size of the block
> is chosen to be efficiently read and written by the os.
Given that many constraints, you are bound to do some individualized
fiddling, which is the basic operational mode for C. Such activities
are possible in Ada, but not using the higher level constructs which
make the language popular with its adherents.
> I considering how this program could be written in Ada (part of
> an attempt to become Ada literate in an Ada hostile environment)
> I a puzzled. The approaches which Ada seems to allow all require
> much more copying of data as I am not allowed to return a reference
> to a slice of an array I can only return the slice itself.
Given that your input is performed in some os-specific fashion anyway
(your approach seems to indicate you would not want extra os buffering),
you could return an access to the input buffer (array) and indices for the
start and ending bytes of a particular record. Copy the data when you
are ready to ship the record out. Given a high-quality optimizing
compiler, a tight loop to move the bytes should not be any less efficient
than having it done by a run-time-library.
Larry Kilgallen
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-14 0:00 ` Larry Kilgallen
@ 1996-09-14 0:00 ` Robert Dewar
1996-09-16 0:00 ` Brian Hanson
1996-09-17 0:00 ` Ted Dennison
2 siblings, 0 replies; 10+ messages in thread
From: Robert Dewar @ 1996-09-14 0:00 UTC (permalink / raw)
Brian said
"> I considering how this program could be written in Ada (part of
> an attempt to become Ada literate in an Ada hostile environment)
> I a puzzled. The approaches which Ada seems to allow all require
> much more copying of data as I am not allowed to return a reference
> to a slice of an array I can only return the slice itself."
Well seeing as you have neither slices nor references to slices in C,
it is hardly possible that this limitation is a significant one! DOn't
assume that because you are writing in Ada, you *have* to use all its
features!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-14 0:00 ` Larry Kilgallen
1996-09-14 0:00 ` Robert Dewar
@ 1996-09-16 0:00 ` Brian Hanson
1996-09-17 0:00 ` Ted Dennison
2 siblings, 0 replies; 10+ messages in thread
From: Brian Hanson @ 1996-09-16 0:00 UTC (permalink / raw)
Brian said
"> I considering how this program could be written in Ada (part of
> an attempt to become Ada literate in an Ada hostile environment)
> I a puzzled. The approaches which Ada seems to allow all require
> much more copying of data as I am not allowed to return a reference
> to a slice of an array I can only return the slice itself."
Robert Dewar writes:
Well seeing as you have neither slices nor references to slices in C,
it is hardly possible that this limitation is a significant one! DOn't
assume that because you are writing in Ada, you *have* to use all its
features!
C may not have slices but since a pointer to a char can be considered an
a array, I have the equivalent to a slice by carrying around the length
seperately. I can use memcmp and memcpy to compare and move this block
of data. I would expect that most compilers turn these library calls
into inline code.
I had hopes of packaging up the buffer management code so that the sort
code just dealt with string references. A language that I used for many
years (cybil - a descendent of pascal used as a system language on the
Control Data 180 NOS/ve systems) provided two usefull abstractions - the
heap and sequence. The heap was used to provide a capability vaguely
similar to Ada's storage pools. Sequences provided a way to access a
block of memory in a way that did not violate type checking. References
to objects could be "next"ed from the heap stack like. In, cybil,
writing this sort program would be very straight forward with the full
assistance of the compiler. It is my experience with cybil that
attracts me to Ada and makes writing c painful.
So the question is not whether I can translate the code line for line
from c to Ada generating a c program in Ada but what are the Ada
idioms for dealing with this sort of program?
Brian Hanson
--
-- Brian Hanson
-- brh@cray.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Efficient io of arbitrary binary data.
1996-09-14 0:00 ` Larry Kilgallen
1996-09-14 0:00 ` Robert Dewar
1996-09-16 0:00 ` Brian Hanson
@ 1996-09-17 0:00 ` Ted Dennison
2 siblings, 0 replies; 10+ messages in thread
From: Ted Dennison @ 1996-09-17 0:00 UTC (permalink / raw)
Brian Hanson wrote:
>
> Brian said
>
> "> I considering how this program could be written in Ada (part of
> > an attempt to become Ada literate in an Ada hostile environment)
> > I a puzzled. The approaches which Ada seems to allow all require
> > much more copying of data as I am not allowed to return a reference
> > to a slice of an array I can only return the slice itself."
>
> Robert Dewar writes:
> Well seeing as you have neither slices nor references to slices in C,
> it is hardly possible that this limitation is a significant one! DOn't
> assume that because you are writing in Ada, you *have* to use all its
> features!
>
> C may not have slices but since a pointer to a char can be considered an
> a array, I have the equivalent to a slice by carrying around the length
> seperately. I can use memcmp and memcpy to compare and move this block
> of data. I would expect that most compilers turn these library calls
> into inline code.
The Ada equivalent to THAT would be passing around access types, and using
UNCHECKED_CONVERSION to hose the source object types into other access types
(assuming a typical implementation of access types - check your appendix F).
This is, of course, to be avioded.
A slightly better method is to declare a "buffer" type which is an array
of byte (or integer, whatever's convienent)-sized objects. Eventually you
will probably copy data out of your reciept buffer for client use. At THAT
point, do the UNCHECKED_CONVERSION. It won't be any extra code, if you are
already copying anyway.
--
T.E.D.
| Work - mailto:dennison@escmail.orl.mmc.com |
| Home - mailto:dennison@iag.net |
| URL - http://www.iag.net/~dennison |
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~1996-09-17 0:00 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-09-13 0:00 Efficient io of arbitrary binary data Brian R. Hanson
1996-09-14 0:00 ` Larry Kilgallen
1996-09-16 0:00 ` Brian Hanson
1996-09-16 0:00 ` Stephen Leake
1996-09-16 0:00 ` Larry Kilgallen
1996-09-16 0:00 ` Robert A Duff
1996-09-14 0:00 ` Larry Kilgallen
1996-09-14 0:00 ` Robert Dewar
1996-09-16 0:00 ` Brian Hanson
1996-09-17 0:00 ` Ted Dennison
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox