comp.lang.ada
* Optimization Question
@ 2001-01-22  0:05 dvdeug
  2001-01-22  1:57 ` Robert Dewar
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: dvdeug @ 2001-01-22  0:05 UTC (permalink / raw)


I'm trying to write a program similar to the Unix utility strings, as my
copy of strings refuses to run on a 17GB file. Mine seems to work, but
it's about 10x slower than strings, and rough calculations put the
running time on that 17GB file at 10 hours. I'm running the woody Debian
version of GNAT (3.13) on i686-linux-gnu, and I compiled the program
with gnatmake -g -gnatwa -gnatpn -Wall -W -O3 strings.adb. Is there
anything I've missed that would speed this program up a lot? (It's been
run through gcov, so the numbers up front are execution counts.)


		with Ada.Characters.Handling; use Ada.Characters.Handling;
		with Ada.Characters.Latin_1; use Ada.Characters.Latin_1;
		with Ada.Sequential_IO;
		with Ada.Command_Line; use Ada.Command_Line;
		with Ada.Text_IO;

           2    procedure Strings is

           1       type Byte is mod 2 ** 8;
		   package Byte_IO is new Ada.Sequential_IO (Byte);
		   use Byte_IO;

       56710       function String_Charp (A : Character) return Boolean is
		   begin
       56710          return Is_ISO_646 (A) and then
		        (Is_Graphic (A) or else A = HT or else A = LF or else A = CR);
		   end String_Charp;
		   pragma Inline (String_Charp);

           1       Binary_File : File_Type;
           1       Letter_Buffer : String (1 .. 4);
		   subtype Buffer_Size is Integer range 0 .. 4;
           1       Letters_Found : Buffer_Size := 0;
           1       Current_Char : Byte;

           1       Separating_String : constant String := (LF, NUL);

		begin
           1       if Argument_Count /= 1 then
      ######          Set_Exit_Status (1);
      ######          Ada.Text_IO.Put ("One file name only!");
      ######          return;
		   end if;

           1       Open (Binary_File, In_File, Argument(1));
       56711       loop
       56711          Read (Binary_File, Current_Char);

       56710          if String_Charp (Character'Val (Current_Char)) then
       29610             if Letters_Found < 4 then
        8453                Letters_Found := Letters_Found + 1;
        8453                Letter_Buffer (Letters_Found) := Character'Val (Current_Char);
        8453                if Letters_Found = 4 then
         916                   Ada.Text_IO.Put (Letter_Buffer);
		            end if;
		         else
       21157                Ada.Text_IO.Put (Character'Val (Current_Char));
		         end if;
		      else
       27100             if Letters_Found = 4 then
         916                Ada.Text_IO.Put (Separating_String);
		         end if;
       27100             Letters_Found := 0;
		      end if;
		   end loop;
      ######       Ada.Text_IO.Put ("Invalid end!");
      ######       Set_Exit_Status (2);
		exception
           1       when End_Error =>
           1          Close (Binary_File);
      ######       when others =>
      ######          raise;
		end Strings;



--
David Starner - dstarner98@aasaa.ofe.org





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  0:05 Optimization Question dvdeug
@ 2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
                     ` (2 more replies)
  2001-01-22 22:01 ` Keith Thompson
       [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
  2 siblings, 3 replies; 21+ messages in thread
From: Robert Dewar @ 2001-01-22  1:57 UTC (permalink / raw)


In article <94ftfu$b59$1@nnrp1.deja.com>,
  dvdeug@my-deja.com wrote:

<<questions about speeding up code snipped>>

I am *amazed* that this is only ten times slower when the
I/O is done in such a perfectly gruesome manner (sequential
I/O instantiated on bytes).

It is elementary that you want to read big chunks of a file
at a time. What GNAT does is to read the entire source of
a program in one read statement.





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
@ 2001-01-22  3:22   ` dvdeug
  2001-01-22  4:05     ` Robert Dewar
                       ` (2 more replies)
  2001-01-22 15:24   ` Ted Dennison
  2001-01-22 15:26   ` Ted Dennison
  2 siblings, 3 replies; 21+ messages in thread
From: dvdeug @ 2001-01-22  3:22 UTC (permalink / raw)


In article <94g431$ge3$1@nnrp1.deja.com>,
  Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94ftfu$b59$1@nnrp1.deja.com>,
>   dvdeug@my-deja.com wrote:
>
> <<questions about speeding up code snipped>>
>
> I am *amazed* that this is only ten times slower when the
> I/O is done in such a perfectly gruesome manner (sequential
> I/O instantiated on bytes).
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of
> a program in one read statement.

Actually, it's only about 3-5 times slower, after doing more careful
measurements. I think GNU strings might be too wrapped up in the details
of object code to be a good general purpose strings program.

Unfortunately, that "perfectly gruesome manner" is the only one that
jumps out at me. I don't want to use Ada.Text_IO, because I want control
of CR's and LF's. I don't see how to use Sequential I/O on larger
chunks, as I may end up with a piece of file that doesn't fill a chunk.
Direct I/O can't be faster than sequential I/O. And reading a 17GB file
into memory doesn't seem like a good idea.

--
David Starner - dstarner98@aasaa.ofe.org





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  3:22   ` dvdeug
@ 2001-01-22  4:05     ` Robert Dewar
  2001-01-22  4:06     ` Robert Dewar
  2001-01-22 19:04     ` M. Kotiaho
  2 siblings, 0 replies; 21+ messages in thread
From: Robert Dewar @ 2001-01-22  4:05 UTC (permalink / raw)


In article <94g920$k64$1@nnrp1.deja.com>,
  dvdeug@my-deja.com wrote:
> Unfortunately, that "perfectly gruesome manner" is the only
> one that jumps out at me. I don't want to use Ada.Text_IO,
> because I want control of CR's and LF's.

Using sequential I/O is horribly inefficient AND non-portable.
You have no way of knowing the format of such a file, and to
assume that individual characters are read as records is an
assumption not warranted by the RM (though it may well work).

The proper approach is to use stream_io here, and read in giant
chunks of the file.
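
For illustration, a minimal sketch of that approach (buffer size and
file name are placeholders, not from the thread):

with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Chunked_Read_Sketch is
   File   : File_Type;
   Buffer : Stream_Element_Array (1 .. 65_536);
   Last   : Stream_Element_Offset;
begin
   Open (File, In_File, "big_file.dat");
   loop
      Read (File, Buffer, Last);        --  one OS-level read per chunk
      exit when Last < Buffer'First;    --  end of file, nothing read
      for I in Buffer'First .. Last loop
         null;  --  scan Buffer (I) here, entirely in memory
      end loop;
   end loop;
   Close (File);
end Chunked_Read_Sketch;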





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  3:22   ` dvdeug
  2001-01-22  4:05     ` Robert Dewar
@ 2001-01-22  4:06     ` Robert Dewar
  2001-01-22 19:04     ` M. Kotiaho
  2 siblings, 0 replies; 21+ messages in thread
From: Robert Dewar @ 2001-01-22  4:06 UTC (permalink / raw)


In article <94g920$k64$1@nnrp1.deja.com>,
  dvdeug@my-deja.com wrote:
> I don't want to use Ada.Text_IO, because I want control
> of CR's and LF's.

You could use Get_Immediate, or you could read from the
stream associated with the text file, but certainly using
stream_io is the best way to go.
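
As a rough sketch of the second suggestion (file name illustrative):
Ada.Text_IO.Text_Streams gives access to the stream underlying an open
text file, bypassing Text_IO's line and page handling so that CR and LF
arrive as ordinary characters:

with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;

procedure Text_Stream_Sketch is
   File : File_Type;
   Ch   : Character;
begin
   Open (File, In_File, "big_file.dat");
   loop
      --  Raw characters from the underlying stream; End_Error is
      --  raised when the file is exhausted.
      Character'Read (Stream (File), Ch);
   end loop;
exception
   when End_Error =>
      Close (File);
end Text_Stream_Sketch;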





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
@ 2001-01-22 15:24   ` Ted Dennison
  2001-01-22 16:12     ` Robert Dewar
  2001-01-22 16:15     ` Robert Dewar
  2001-01-22 15:26   ` Ted Dennison
  2 siblings, 2 replies; 21+ messages in thread
From: Ted Dennison @ 2001-01-22 15:24 UTC (permalink / raw)


In article <94g431$ge3$1@nnrp1.deja.com>,
  Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94ftfu$b59$1@nnrp1.deja.com>,
>   dvdeug@my-deja.com wrote:
>
> <<questions about speeding up code snipped>>
>
> I am *amazed* that this is only ten times slower when the
> I/O is done in such a perfectly gruesome manner (sequential
> I/O instantiated on bytes).
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of

As a point of reference, I'm in the process of writing a little app on
Windows NT to split files for the purpose of distributing large files on
floppies. My first iteration took the dumb approach and used Direct_IO
instantiated on bytes, copying each byte from one disk to the other.
Filling a whole floppy in this manner took me about 4.5 minutes.
However, I noticed that copying the same file to the floppy using
Windows Explorer takes about 45 seconds.

For my next trick, I changed to using Ada.Streams.Stream_IO. First I
tried to copy the whole disk's worth of data into a buffer using 'Read,
then copy it to the floppy using 'Write. It still took 4.5 minutes.
That's not surprising, since 'Read on an array is defined as individual
'Reads for each element.

So next I changed the code to instead call Ada.Streams.Read and
Ada.Streams.Write directly for the entire amount of data that is going
on the disk (one operation each per disk). When I compiled and ran this,
a disk's worth of data copied in... (drumroll please)... 45 seconds,
just like for Windows.

Of course 45 seconds is a bit long to wait with no feedback, so I
changed it to only write portions of the disk's worth of data, and
output a '*' character to the screen between them. Unsurprisingly, the
number of chunks used has a serious impact on the amount of time the
operation takes. So one has to strike a balance. I found a relatively
happy medium at 10 copies per floppy. That only increases the copy time
by about 10 seconds.

Anyway, if you want to perform large file operations, it looks like
Stream_IO and Ada.Streams.Read and Write are the way to go. The only
other way I could think to do it would be to dynamically instantiate
Sequential or Direct IO with the data size you want to use for each I/O
operation. That would be a pain.
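
A rough sketch of that direct Ada.Streams.Read/Write copy (names and
the buffer size are made up for illustration; the whole-floppy buffer
described above would likely be better heap-allocated):

with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Copy_Chunk_Sketch is
   Source, Target : File_Type;
   Buffer : Stream_Element_Array (1 .. 64 * 1_024);
   Last   : Stream_Element_Offset;
begin
   Open   (Source, In_File,  "big_file.dat");
   Create (Target, Out_File, "chunk1.dat");
   --  One dispatching Read and one Write for the whole chunk,
   --  instead of one call per element.
   Ada.Streams.Read  (Stream (Source).all, Buffer, Last);
   Ada.Streams.Write (Stream (Target).all, Buffer (1 .. Last));
   Close (Source);
   Close (Target);
end Copy_Chunk_Sketch;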

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
  2001-01-22 15:24   ` Ted Dennison
@ 2001-01-22 15:26   ` Ted Dennison
  2001-01-22 16:17     ` Robert Dewar
  2 siblings, 1 reply; 21+ messages in thread
From: Ted Dennison @ 2001-01-22 15:26 UTC (permalink / raw)


In article <94g431$ge3$1@nnrp1.deja.com>,
  Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94ftfu$b59$1@nnrp1.deja.com>,
>   dvdeug@my-deja.com wrote:
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of
> a program in one read statement.

Does GNAT's implementation of the *_IO packages do no buffering at all?

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 15:24   ` Ted Dennison
@ 2001-01-22 16:12     ` Robert Dewar
  2001-01-22 16:48       ` Ted Dennison
  2001-01-22 16:15     ` Robert Dewar
  1 sibling, 1 reply; 21+ messages in thread
From: Robert Dewar @ 2001-01-22 16:12 UTC (permalink / raw)


In article <94hjbp$ks6$1@nnrp1.deja.com>,
  Ted Dennison <dennison@telepath.com> wrote:

> My first iteration took the dumb approach and used Direct_IO
> instantiated on bytes.

This of course is even *WORSE* than using sequential_io, since
there is extra positioning overhead. I can't imagine why you
would choose Direct_IO for what is obviously a sequential
problem. Just shows that if there is a way to abuse things
someone will take advantage of it :-) :-)

Yes, of course, using Stream_IO in big chunks is the only
sensible implementation approach.





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 15:24   ` Ted Dennison
  2001-01-22 16:12     ` Robert Dewar
@ 2001-01-22 16:15     ` Robert Dewar
  1 sibling, 0 replies; 21+ messages in thread
From: Robert Dewar @ 2001-01-22 16:15 UTC (permalink / raw)


In article <94hjbp$ks6$1@nnrp1.deja.com>,
  Ted Dennison <dennison@telepath.com> wrote:

> My first iteration took the dumb approach and used Direct_IO
> instantiated on bytes.

By the way, this is even LESS portable than using Sequential_IO
since it is even more likely that an implementation will add
some control information to direct_io records.






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 15:26   ` Ted Dennison
@ 2001-01-22 16:17     ` Robert Dewar
  2001-01-22 16:59       ` Ted Dennison
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Dewar @ 2001-01-22 16:17 UTC (permalink / raw)


In article <94hje0$kui$1@nnrp1.deja.com>,
  Ted Dennison <dennison@telepath.com> wrote:

> Does Gnat's implementation of the *_IO packages do no
> buffering at all?

Of course it does buffering (RTFM). That's not the issue!
The issue is that if you make a separate call for each byte, there
is a lot of computational overhead. For example, this means
you have to check for each byte written whether the file is
currently open in the appropriate mode.

If you think "oh well, surely it will be buffered, so I won't
find it that inefficient", you are operating in the dark :-)





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 16:12     ` Robert Dewar
@ 2001-01-22 16:48       ` Ted Dennison
  0 siblings, 0 replies; 21+ messages in thread
From: Ted Dennison @ 2001-01-22 16:48 UTC (permalink / raw)


In article <94hm5q$nmc$1@nnrp1.deja.com>,
  Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94hjbp$ks6$1@nnrp1.deja.com>,
>   Ted Dennison <dennison@telepath.com> wrote:
>
> > My first iteration took the dumb approach and used Direct_IO
> > instantiated on bytes.
>
> This of course is even *WORSE* than using sequential_io, since
> there is extra positioning overhead. I can't imagine why you
> would choose Direct_IO for what is obviously a sequential
> problem. Just shows that if there is a way to abuse things
> someone will take advantage of it :-) :-)

:-)

I used Direct_IO because for my purpose (splitting the file into chunks
that will fit on the floppy), I needed to know how big the file actually
is. Direct_IO has a length-of-file routine (Size), while Sequential_IO
does not.

I suppose I could have used Direct_IO just for that purpose and
Sequential_IO for the actual IO operations. But that seemed a little
silly when Direct_IO also had all the operations I needed.

So are you saying that the versions of Read and Write in Direct_IO that
don't have the "Positive_Count" typed parameters still take longer than
the routines with the exact same parameter profile in Sequential_IO? Not
that it really matters that much. We've already ascertained that it's
going to be slow either way.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 16:17     ` Robert Dewar
@ 2001-01-22 16:59       ` Ted Dennison
  0 siblings, 0 replies; 21+ messages in thread
From: Ted Dennison @ 2001-01-22 16:59 UTC (permalink / raw)


In article <94hmdt$nqd$1@nnrp1.deja.com>,
  Robert Dewar <robert_dewar@my-deja.com> wrote:
> The issue is that if you make a separate call for each byte, there
> is a lot of computational overhead. For example, this means
> you have to check for each byte written whether the file is
> currently open in the appropriate mode.

...transforming a small one-time overhead into 1,450,000 small one-time
overheads. Ahhhh....OK. You're right; I should have seen that.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  3:22   ` dvdeug
  2001-01-22  4:05     ` Robert Dewar
  2001-01-22  4:06     ` Robert Dewar
@ 2001-01-22 19:04     ` M. Kotiaho
  2001-01-22 20:22       ` dvdeug
  2 siblings, 1 reply; 21+ messages in thread
From: M. Kotiaho @ 2001-01-22 19:04 UTC (permalink / raw)


dvdeug@my-deja.com wrote:

> In article <94g431$ge3$1@nnrp1.deja.com>,
>   Robert Dewar <robert_dewar@my-deja.com> wrote:
> > In article <94ftfu$b59$1@nnrp1.deja.com>,
> >   dvdeug@my-deja.com wrote:
> >
> > <<questions about speeding up code snipped>>
> >
> > I am *amazed* that this is only ten times slower when the
> > I/O is done in such a perfectly gruesome manner (sequential
> > I/O instantiated on bytes).
> >
> > It is elementary that you want to read big chunks of a file
> > at a time. What GNAT does is to read the entire source of
> > a program in one read statement.
>
> Actually, it's only about 3-5 times slower, after doing more careful
> measurements. I think GNU strings might be too wrapped up in the details
> of object code to be a good general purpose strings program.
>
> Unfortunately, that "perfectly gruesome manner" is the only one that
> jumps out at me. I don't want to use Ada.Text_IO, because I want control
> of CR's and LF's. I don't see how to use Sequential I/O on larger
> chunks, as I may end up with a piece of file that doesn't fill a chunk.
> Direct I/O can't be faster than sequential I/O. And reading a 17GB file
> into memory doesn't seem like a good idea.

As has been pointed out elsewhere, you can use Streams to read "big
chunks" at a time.  As you pointed out, you don't want to read 17 GB
into memory.  You can still speed things up by using, say, 10K chunks
... the Read procedure has a "Last" parameter that will tell you if you
ran out of file before you filled your chunk.  You also might want to
buffer your output, rather than outputting the strings as soon as they
are located.
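
A small sketch of that output-buffering idea (all names and the 10K
size are illustrative, not from the thread):

with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Buffered_Output_Sketch is
   Output : File_Type;
   Buffer : Stream_Element_Array (1 .. 10_240);
   Fill   : Stream_Element_Offset := 0;

   --  Queue one byte; flush with a single Write when the buffer fills.
   procedure Put_Byte (E : Stream_Element) is
   begin
      Fill := Fill + 1;
      Buffer (Fill) := E;
      if Fill = Buffer'Last then
         Write (Stream (Output).all, Buffer);
         Fill := 0;
      end if;
   end Put_Byte;

begin
   Create (Output, Out_File, "strings.out");
   Put_Byte (65);                      --  queue a byte ('A')
   if Fill > 0 then                    --  flush the partial tail
      Write (Stream (Output).all, Buffer (1 .. Fill));
   end if;
   Close (Output);
end Buffered_Output_Sketch;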

HTH,

M. Kotiaho





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 19:04     ` M. Kotiaho
@ 2001-01-22 20:22       ` dvdeug
  0 siblings, 0 replies; 21+ messages in thread
From: dvdeug @ 2001-01-22 20:22 UTC (permalink / raw)


In article <3A6C8432.665A730C@mail.com>,
  "M. Kotiaho" <mylastname@mail.com> wrote:
> As has been pointed out elsewhere, you can use Streams to read "big
> chunks" at a time.  As you pointed out, you don't want to read 17 GB
> into memory.

That's actually a moot point, as the GNAT runtime on Linux/i386 seems to
have as many problems handling the file as most C programs do. It
appears I'm going to have to run it through a pipe . . .

> You also might want to buffer your output, rather than outputting the
> strings as soon as they are located.

My biggest problem with this is that it was a nice simple program until
buffering started coming in. I guess if I can get it significantly
faster than GNU strings, I may as well finish up the details and release
it . . .

--
David Starner - dstarner98@aasaa.ofe.org






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22  0:05 Optimization Question dvdeug
  2001-01-22  1:57 ` Robert Dewar
@ 2001-01-22 22:01 ` Keith Thompson
  2001-01-22 22:52   ` dvdeug
       [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
  2 siblings, 1 reply; 21+ messages in thread
From: Keith Thompson @ 2001-01-22 22:01 UTC (permalink / raw)


dvdeug@my-deja.com writes:
> I'm trying to write a program similar to the Unix utility strings, as my
> copy of strings refuses to run on a 17GB file. Mine seems to work, but
> it's about 10x slower than strings, and rough calculations put the
> running time on that 17GB file at 10 hours.

I was going to suggest using GNU strings, until I realized you were
already using it.  If it has problems with a 17GB file, it's probably
because it uses 32 bit file offsets, limiting the size to 2GB or 4GB.

Depending on your OS and compiler, it may be possible to recompile, or
possibly modify, the GNU strings program to handle huge files
properly.  If this is possible, it's likely to be easier than
re-implementing it from scratch.  (GNU strings is part of the binutils
package; the latest sources are at
<ftp://ftp.gnu.org/gnu/binutils/binutils-2.10.1.tar.gz>.)

Try the gcc and binutils documentation; if that's not illuminating,
try a GNU or Linux newsgroup.

-- 
Keith Thompson (The_Other_Keith) kst@cts.com  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center           <*>  <http://www.sdsc.edu/~kst>
MAKE MONEY FAST!!  DON'T FEED IT!!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 22:01 ` Keith Thompson
@ 2001-01-22 22:52   ` dvdeug
  2001-01-23  6:46     ` Keith Thompson
  0 siblings, 1 reply; 21+ messages in thread
From: dvdeug @ 2001-01-22 22:52 UTC (permalink / raw)


In article <yecd7dfjlgf.fsf@king.cts.com>,
  Keith Thompson <kst@cts.com> wrote:
> Depending on your OS and compiler, it may be possible to recompile, or
> possibly modify, the GNU strings program to handle huge files
> properly.  If this is possible, it's likely to be easier than
> re-implementing it from scratch.

Why? If the program which I included in my first post was faster and
worked on 17GB files (GNAT's runtime on my platform seems to have the
same problem strings does), it would do what I need it to do. It, in
fact, produces better output than strings does.

--
David Starner - dstarner98@aasaa.ofe.org





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question
  2001-01-22 22:52   ` dvdeug
@ 2001-01-23  6:46     ` Keith Thompson
  0 siblings, 0 replies; 21+ messages in thread
From: Keith Thompson @ 2001-01-23  6:46 UTC (permalink / raw)


dvdeug@my-deja.com writes:
> In article <yecd7dfjlgf.fsf@king.cts.com>,
>   Keith Thompson <kst@cts.com> wrote:
> > Depending on your OS and compiler, it may be possible to recompile, or
> > possibly modify, the GNU strings program to handle huge files
> > properly.  If this is possible, it's likely to be easier than
> > re-implementing it from scratch.
> 
> Why? If the program which I included in my first post was faster and
> worked on 17GB files (GNAT's runtime on my platform seems to have the
> same problem strings does), it would do what I need it to do. It, in
> fact, produces better output than strings does.

Ok, append "would have been" to my last sentence.  If you've already
re-implemented it, I suppose it doesn't make much difference.

<OFFTOPIC>
Hmm.  How does the GNU strings program fail?  I suspect that
     strings < hugefile
might work better than
    strings hugefile
If you feed the file to the program's standard input stream, it
won't try to open the whole thing.
</OFFTOPIC>

-- 
Keith Thompson (The_Other_Keith) kst@cts.com  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center           <*>  <http://www.sdsc.edu/~kst>
MAKE MONEY FAST!!  DON'T FEED IT!!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
       [not found]     ` <3A6F663E.C84B94D8@acm.org>
@ 2001-01-26 16:30       ` Jeff Creem
  2001-01-26 21:46         ` Florian Weimer
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Creem @ 2001-01-26 16:30 UTC (permalink / raw)


A few days ago there was a discussion about using the direct read
procedures which are part of the stream packages, rather than the
'Read/'Write attributes, to read in large chunks of data.

This got me thinking about what the best way might be to do this in a
general sense, so I thought I'd put this approach out as a strawman
before really making a package out of it with a little more robust
error checking on the validity of the types the user uses.

In any case, at first glance this approach (when it can be used)
appears to be up to 30 times faster than 'Read (for all of the reasons
discussed a few days ago) on a 350 MHz Pentium II Windows 95 machine.

My least favorite part of the approach is the unchecked_conversion of
access types.

So in any case, here is a partial package and test program..... Any
comments?



with Ada.Streams;

generic

  type Element_Type is private;
  type Array_Index_Type is range <>;
  type Block_Array_Type is array (Array_Index_Type) of Element_Type;


package Stream_Utilities is

  --
  -- This procedure attempts to read Item'length elements from
  -- the stream. If fewer than Item'length elements are read, then
  -- Last will return < Item'last. This procedure assumes that
  -- the elements being read are present in the stream in a format
  -- identical to the layout of the elements in the Block_Array_Type.
  --
  --
  procedure Block_Read
    (Stream : access Ada.Streams.Root_Stream_Type'Class;
     Item   : access Block_Array_Type;
     Last   : out Array_Index_Type);

end Stream_Utilities;




with Ada.Streams;
with Unchecked_Conversion;
use Ada.Streams;

package body Stream_Utilities is


   --
   -- There are essentially two approaches that immediately
   -- come to mind to read a block of data efficiently.
   -- The first is to use the standard stream read utilities
   -- to read the data into an array of Stream_Elements, followed
   -- by an unchecked_conversion to our actual output data type.
   --
   -- The second, potentially more evil approach is to use
   -- aliased types to essentially overlay what appears to be
   -- a stream element array on top of the memory where our
   -- actual data is. The first approach makes me feel better
   -- than the second; however, even though we know that not
   -- every instance of an unchecked_conversion really always
   -- causes a true copy, I found that in this instance,
   -- at least with GNAT 3.12 under Solaris, we did end up
   -- copying the data, so the overlay approach is used below.
   --
   procedure Block_Read (
         Stream : access Ada.Streams.Root_Stream_Type'Class;
         Item   : access Block_Array_Type;
         Last   :    out Array_Index_Type                    ) is

      --
      -- Create a named access type to the block array type
      -- so we can create an unchecked conversion.
      --
      type Access_Block_Data is access all Block_Array_Type;

      --
      -- Create a constrained subtype of stream elements to
      -- "hold" the data returned from the stream
      --
      subtype Proper_Sized_Stream_Element_Type is
         Stream_Element_Array(1 .. Item.all'Size / Stream_Element'Size);

      --
      -- And finally, create a named access type to this
      -- constrained array.
      --
      type Access_Proper_Sized_Stream_Element_Type is access
         Proper_Sized_Stream_Element_Type;


      --
      -- Create a conversion between the block data and the stream data
      --
      function To_Access is
      new Unchecked_Conversion
         (
         Source => Access_Block_Data,
         Target => Access_Proper_Sized_Stream_Element_Type);


      --
      -- Declare an access variable  to the stream data, note that
      -- at this very point we are now creating an overlayed version
      -- of the now aliased input parameter.
      --
      Proper_Sized_Stream_Data : Access_Proper_Sized_Stream_Element_Type :=
        To_Access (Access_Block_Data (Item));

      Local_Last : Stream_Element_Offset;

   begin

      --
      -- Read the data from the stream into our access to
      -- stream element array. Note at this moment we are now also
      -- filling in the Item parameter as well.
      --
      Read(Stream => Stream.all,
         Item   => Proper_Sized_Stream_Data.all,
         Last   => Local_Last);


      --
      -- Calculate the "Last" parameter we are sending back to the
      -- user so they can see it. Note that this must use
      -- Element_Type'Size (the size of one element), not Item'Size,
      -- which is the size of the access value itself.
      --
      Last := (Item'First - 1) + Array_Index_Type'Base
        (Local_Last / (Element_Type'Size / Stream_Element'Size));

   end Block_Read;


end Stream_Utilities;





with Stream_Utilities;
with Ada.Streams.Stream_Io;
with Text_Io;
with Ada.Calendar;
use type Ada.Calendar.Time;


procedure Si is

   subtype Range_Type is Integer range 1 .. 8192;

   type My_Stream_Data_Type is mod 2**32;
   for My_Stream_Data_Type'Size use 32;



   type My_Array_Type is array (Range_Type) of My_Stream_Data_Type;
   pragma Pack (My_Array_Type);
   for My_Array_Type'Size use 8192*32;


   package Sii is new Stream_Utilities(My_Stream_Data_Type, Range_Type,
      My_Array_Type);


   My_Data : aliased My_Array_Type;
   Last    : Integer;
   File    : Ada.Streams.Stream_Io.File_Type;

   Match : Boolean := True;  -- cleared if the verification loop fails

   Repeat_Count : constant := 500;

   Start : Ada.Calendar.Time;
   Stop  : Ada.Calendar.Time;

begin

   --
   -- First, create some stream data
   --
   for I in My_Data'range loop
      My_Data(I) := My_Stream_Data_Type(I);
   end loop;

   Ada.Streams.Stream_Io.Create
      (
      File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.Out_File);

   for Repeat in 1 .. Repeat_Count loop

      for I in My_Data'range loop
         My_Data(I) := My_Data(I) + 1;
      end loop;


      My_Array_Type'Write(Ada.Streams.Stream_Io.Stream(File), My_Data);


   end loop;

   Ada.Streams.Stream_Io.Close(File);

   My_Data := (others => 0);


   --
   -- Now, just double check that our block read appears to work
   -- at all.
   --
   Ada.Streams.Stream_Io.Open(
      File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);
   Repeat_Loop :
      for Repeat in 1 .. Repeat_Count loop


      Sii.Block_Read(Ada.Streams.Stream_Io.Stream(File), My_Data'access,
         Last);


      for I in My_Data'range loop
         if My_Data(I) /= My_Stream_Data_Type(I) + My_Stream_Data_Type(
               Repeat) then
            Match := False;
            exit Repeat_Loop;
         end if;
      end loop;
   end loop Repeat_Loop;



   Ada.Streams.Stream_Io.Close(File);

   Text_Io.Put("Last was :");
   Text_Io.Put_Line(Integer'Image(Last));

   if Match then
      Text_Io.Put_Line("Match");
   else
      Text_Io.Put_Line("No Match");
   end if;


   --
   -- Now, a little benchmarking.
   --
   Ada.Streams.Stream_Io.Open(
      File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);

   Start := Ada.Calendar.Clock;

   for Repeat in 1 .. Repeat_Count loop


      Sii.Block_Read(Ada.Streams.Stream_Io.Stream(File), My_Data'access,
         Last);

   end loop;

   Stop := Ada.Calendar.Clock;

   Text_Io.Put_Line("Block read in " & Duration'Image(Stop - Start));


   Ada.Streams.Stream_Io.Close(File);


   Ada.Streams.Stream_Io.Open(
      File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);

   Start := Ada.Calendar.Clock;

   for Repeat in 1 .. Repeat_Count loop

      My_Array_Type'Read(Ada.Streams.Stream_Io.Stream(File), My_Data);

   end loop;
   Stop := Ada.Calendar.Clock;

   Text_Io.Put_Line("Normal Stream Read  in " & Duration'Image(Stop -
         Start));



   Ada.Streams.Stream_Io.Close(File);


end Si;






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-26 16:30       ` Optimization Question -- Follow up on using the stream read (and write) procedures directly Jeff Creem
@ 2001-01-26 21:46         ` Florian Weimer
  2001-01-27 19:14           ` Jeff Creem
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2001-01-26 21:46 UTC (permalink / raw)


"Jeff Creem" <jeff@thecreems.com> writes:

> My least favorite part of the approach is the unchecked_conversion
> of access types.

You shouldn't use Unchecked_Conversion in this case because the
in-memory and stream representations of an object are usually
different.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-26 21:46         ` Florian Weimer
@ 2001-01-27 19:14           ` Jeff Creem
  2001-01-28  0:26             ` Robert Dewar
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Creem @ 2001-01-27 19:14 UTC (permalink / raw)



"Florian Weimer" <fw@deneb.enyo.de> wrote in message
news:87elxq0ywd.fsf@deneb.enyo.de...
> "Jeff Creem" <jeff@thecreems.com> writes:
>
> > My least favorite part of the approach is the unchecked_conversion
> > of access types.
>
> You shouldn't use Unchecked_Conversion in this case because the
> in-memory and stream representations of an object are usually
> different.

I basically agree, but I was probably not clear about what I was trying
to accomplish. This set of routines is for when you want the stream
representation of an object to match the in-memory representation, and
further you want to avoid the huge overhead of the individual 'Read
operations that would happen from the Array_Type'Read attribute.

Even the alternate approach of using unchecked_conversion on the data
(vs. the access types) would suffer from the same concern about memory
vs. default stream representations (a valid concern, but not exactly
applicable to the intended use).

Note that this package could clearly suffer in cases where
Element_Type'Size is smaller than Stream_Element'Size (and in a few
other size cases as well), but this could be caught easily with a
little more error checking.
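
As a self-contained sketch of that extra error checking (the
Size_Checked name and the elaboration-time approach are mine, not from
the post), a generic can reject bad instantiations when its body is
elaborated:

with Ada.Streams;

generic
   type Element_Type is private;
package Size_Checked is
   pragma Elaborate_Body;
end Size_Checked;

package body Size_Checked is
begin
   --  Fail as soon as the instantiation is elaborated if an element
   --  does not occupy a whole, positive number of stream elements.
   if Element_Type'Size = 0
     or else Element_Type'Size mod Ada.Streams.Stream_Element'Size /= 0
   then
      raise Program_Error;
   end if;
end Size_Checked;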







^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-27 19:14           ` Jeff Creem
@ 2001-01-28  0:26             ` Robert Dewar
  0 siblings, 0 replies; 21+ messages in thread
From: Robert Dewar @ 2001-01-28  0:26 UTC (permalink / raw)


In article <t767e86uif6d5e@corp.supernews.com>,
  "Jeff Creem" <jeff@thecreems.com> wrote:
> I basically agree, but I was probably not clear about what I
> was trying to accomplish. This set of routines is for when
> you want the stream representation of an object to match the
> in-memory representation

Perfectly reasonable, and this is EXACTLY the sort of low
level target dependent, type breaking operation that
Unchecked_Conversion is there for.

In some ways I think one mark of an experienced Ada programmer
is that they feel VERY uncomfortable when UC is used in an
inappropriate manner, and VERY comfortable when it is used
in an appropriate manner.

The trouble is that a lot of people have trouble distinguishing
these cases, so they play it safe and feel uncomfortable all
the time. The danger of this attitude, sometimes enshrined in
foolish coding standards that forbid the use of UC completely,
is that you miss the fact that often the *cleanest* approach
is to use UC :-)





^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread

Thread overview: 21+ messages
2001-01-22  0:05 Optimization Question dvdeug
2001-01-22  1:57 ` Robert Dewar
2001-01-22  3:22   ` dvdeug
2001-01-22  4:05     ` Robert Dewar
2001-01-22  4:06     ` Robert Dewar
2001-01-22 19:04     ` M. Kotiaho
2001-01-22 20:22       ` dvdeug
2001-01-22 15:24   ` Ted Dennison
2001-01-22 16:12     ` Robert Dewar
2001-01-22 16:48       ` Ted Dennison
2001-01-22 16:15     ` Robert Dewar
2001-01-22 15:26   ` Ted Dennison
2001-01-22 16:17     ` Robert Dewar
2001-01-22 16:59       ` Ted Dennison
2001-01-22 22:01 ` Keith Thompson
2001-01-22 22:52   ` dvdeug
2001-01-23  6:46     ` Keith Thompson
     [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
     [not found]   ` <864ryodb1q.fsf@acm.org>
     [not found]     ` <3A6F663E.C84B94D8@acm.org>
2001-01-26 16:30       ` Optimization Question -- Follow up on using the stream read (and write) procedures directly Jeff Creem
2001-01-26 21:46         ` Florian Weimer
2001-01-27 19:14           ` Jeff Creem
2001-01-28  0:26             ` Robert Dewar
