* Optimization Question
@ 2001-01-22  0:05 dvdeug
  2001-01-22  1:57 ` Robert Dewar
  ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread

From: dvdeug @ 2001-01-22 0:05 UTC (permalink / raw)

I'm trying to write a program similar to the Unix utility strings, as
my copy of strings refuses to run on a 17GB file. It seems to work, but
it's about 10x slower than strings, and a rough calculation puts the
running time on that 17GB file at 10 hours. I'm running the woody
Debian version of GNAT (3.13) on i686-linux-gnu, and I compiled the
program with gnatmake -g -gnatwa -gnatpn -Wall -W -O3 strings.adb. Is
there anything I've missed that would speed this program up a lot?
(It's been run through gcov, so the numbers up front are execution
counts.)

             with Ada.Characters.Handling; use Ada.Characters.Handling;
             with Ada.Characters.Latin_1;  use Ada.Characters.Latin_1;
             with Ada.Sequential_IO;
             with Ada.Command_Line;        use Ada.Command_Line;
             with Ada.Text_IO;
          2  procedure Strings is
          1     type Byte is mod 2 ** 8;
                package Byte_IO is new Ada.Sequential_IO (Byte);
                use Byte_IO;
      56710     function String_Charp (A : Character) return Boolean is
                begin
      56710        return Is_ISO_646 (A) and then
                      (Is_Graphic (A) or else A = HT or else A = LF or else A = CR);
                end String_Charp;
                pragma Inline (String_Charp);
          1     Binary_File : File_Type;
          1     Letter_Buffer : String (1 .. 4);
                subtype Buffer_Size is Integer range 0 .. 4;
          1     Letters_Found : Buffer_Size := 0;
          1     Current_Char : Byte;
          1     Separating_String : constant String := (LF, NUL);
             begin
          1     if Argument_Count /= 1 then
     ######        Set_Exit_Status (1);
     ######        Ada.Text_IO.Put ("One file name only!");
     ######        return;
                end if;
          1     Open (Binary_File, In_File, Argument (1));
      56711     loop
      56711        Read (Binary_File, Current_Char);
      56710        if String_Charp (Character'Val (Current_Char)) then
      29610           if Letters_Found < 4 then
       8453              Letters_Found := Letters_Found + 1;
       8453              Letter_Buffer (Letters_Found) :=
                            Character'Val (Current_Char);
       8453              if Letters_Found = 4 then
        916                 Ada.Text_IO.Put (Letter_Buffer);
                         end if;
                      else
      21157              Ada.Text_IO.Put (Character'Val (Current_Char));
                      end if;
                   else
      27100           if Letters_Found = 4 then
        916              Ada.Text_IO.Put (Separating_String);
                      end if;
      27100           Letters_Found := 0;
                   end if;
                end loop;
     ######     Ada.Text_IO.Put ("Invalid end!");
     ######     Set_Exit_Status (2);
             exception
          1     when End_Error =>
          1        Close (Binary_File);
     ######     when others =>
     ######        raise;
             end Strings;

--
David Starner - dstarner98@aasaa.ofe.org

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  0:05 Optimization Question dvdeug
@ 2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
  ` (2 more replies)
  2001-01-22 22:01 ` Keith Thompson
  [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
  2 siblings, 3 replies; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 1:57 UTC (permalink / raw)

In article <94ftfu$b59$1@nnrp1.deja.com>,
dvdeug@my-deja.com wrote:

<<questions about speeding up code snipped>>

I am *amazed* that this is only ten times slower when the I/O is done
in such a perfectly gruesome manner (sequential I/O instantiated on
bytes).

It is elementary that you want to read big chunks of a file at a time.
What GNAT does is to read the entire source of a program in one read
statement.

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
@ 2001-01-22  3:22   ` dvdeug
  2001-01-22  4:05     ` Robert Dewar
  ` (2 more replies)
  2001-01-22 15:24   ` Ted Dennison
  2001-01-22 15:26   ` Ted Dennison
  2 siblings, 3 replies; 21+ messages in thread

From: dvdeug @ 2001-01-22 3:22 UTC (permalink / raw)

In article <94g431$ge3$1@nnrp1.deja.com>,
Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94ftfu$b59$1@nnrp1.deja.com>,
> dvdeug@my-deja.com wrote:
>
> <<questions about speeding up code snipped>>
>
> I am *amazed* that this is only ten times slower when the
> I/O is done in such a perfectly gruesome manner (sequential
> I/O instantiated on bytes).
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of
> a program in one read statement.

Actually, it's only about 3-5 times slower, after doing more careful
measurements. I think GNU strings might be too wrapped up in the
details of object code to be a good general-purpose strings program.

Unfortunately, that "perfectly gruesome manner" is the only one that
jumps out at me. I don't want to use Ada.Text_IO, because I want
control of CR's and LF's. I don't see how to use Sequential I/O on
larger chunks, as I may end up with a piece of file that doesn't fill
a chunk. Direct I/O can't be faster than sequential I/O. And reading a
17GB file into memory doesn't seem like a good idea.

--
David Starner - dstarner98@aasaa.ofe.org

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  3:22 ` dvdeug
@ 2001-01-22  4:05   ` Robert Dewar
  2001-01-22  4:06   ` Robert Dewar
  2001-01-22 19:04   ` M. Kotiaho
  2 siblings, 0 replies; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 4:05 UTC (permalink / raw)

In article <94g920$k64$1@nnrp1.deja.com>,
dvdeug@my-deja.com wrote:
> Unfortunately, that "perfectly gruesome manner" is the only
> one that jumps out at me. I don't want to use Ada.Text_IO,
> because I want control of CR's and LF's.

Using sequential I/O is horribly inefficient AND non-portable. You
have no way of knowing the format of such a file, and to assume that
individual characters are read as records is an assumption not
warranted by the RM (though it likely may work).

The proper approach is to use stream_io here, and read in giant chunks
of the file.

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  3:22 ` dvdeug
  2001-01-22  4:05   ` Robert Dewar
@ 2001-01-22  4:06   ` Robert Dewar
  2001-01-22 19:04   ` M. Kotiaho
  2 siblings, 0 replies; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 4:06 UTC (permalink / raw)

In article <94g920$k64$1@nnrp1.deja.com>,
dvdeug@my-deja.com wrote:
> I don't want to use Ada.Text_IO, because I want control
> of CR's and LF's.

You could use Get_Immediate, or you could read from the stream
associated with the text file, but certainly using stream_io is the
best way to go.

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  3:22 ` dvdeug
  2001-01-22  4:05   ` Robert Dewar
  2001-01-22  4:06   ` Robert Dewar
@ 2001-01-22 19:04   ` M. Kotiaho
  2001-01-22 20:22     ` dvdeug
  2 siblings, 1 reply; 21+ messages in thread

From: M. Kotiaho @ 2001-01-22 19:04 UTC (permalink / raw)

dvdeug@my-deja.com wrote:
> In article <94g431$ge3$1@nnrp1.deja.com>,
> Robert Dewar <robert_dewar@my-deja.com> wrote:
> > I am *amazed* that this is only ten times slower when the
> > I/O is done in such a perfectly gruesome manner (sequential
> > I/O instantiated on bytes).
> >
> > It is elementary that you want to read big chunks of a file
> > at a time. What GNAT does is to read the entire source of
> > a program in one read statement.
>
> Actually, it's only about 3-5 times slower, after doing more careful
> measurements. I think GNU strings might be too wrapped up in the details
> of object code to be a good general purpose strings program.
>
> Unfortunately, that "perfectly gruesome manner" is the only one that
> jumps out at me. I don't want to use Ada.Text_IO, because I want control
> of CR's and LF's. I don't see how to use Sequential I/O on larger
> chunks, as I may end up with a piece of file that doesn't fill a chunk.
> Direct I/O can't be faster than sequential I/O. And reading a 17GB file
> into memory doesn't seem like a good idea.

As has been pointed out elsewhere, you can use Streams to read "big
chunks" at a time. As you pointed out, you don't want to read 17 GB
into memory. You can still speed things up by using, say, 10K
chunks ... the Read procedure has a "Last" parameter that will tell
you if you ran out of file before you filled your chunk.

You also might want to buffer your output, rather than outputting the
strings as soon as they are located.

HTH,
M. Kotiaho

^ permalink raw reply	[flat|nested] 21+ messages in thread
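[Editor's note: a minimal sketch of the chunked-read loop described
above, assuming Ada.Streams.Stream_IO. The file name, the 10K buffer
size, and the simplified printable-character test (' ' .. '~' instead of
the original String_Charp) are illustrative placeholders, not from the
thread. Keeping Letters_Found outside the loop lets a string span two
chunks; a Last below Buffer'First means the file is exhausted.]

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Chunked_Scan is
   File          : File_Type;
   Buffer        : Stream_Element_Array (1 .. 10_000);
   Last          : Stream_Element_Offset;
   Letters_Found : Natural := 0;  --  survives across chunk boundaries
begin
   Open (File, In_File, "big_file");  --  placeholder name
   loop
      Read (File, Buffer, Last);
      exit when Last < Buffer'First;  --  no data left at all
      for I in Buffer'First .. Last loop
         --  Same per-byte test as the original program, but the byte
         --  now comes from memory instead of a Sequential_IO call.
         if Character'Val (Buffer (I)) in ' ' .. '~' then
            Letters_Found := Letters_Found + 1;
         else
            Letters_Found := 0;
         end if;
      end loop;
      exit when Last < Buffer'Last;   --  short read: end of file
   end loop;
   Close (File);
end Chunked_Scan;
```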
* Re: Optimization Question
  2001-01-22 19:04 ` M. Kotiaho
@ 2001-01-22 20:22   ` dvdeug
  0 siblings, 0 replies; 21+ messages in thread

From: dvdeug @ 2001-01-22 20:22 UTC (permalink / raw)

In article <3A6C8432.665A730C@mail.com>,
"M. Kotiaho" <mylastname@mail.com> wrote:
> As has been pointed out elsewhere, you can use Streams to read "big
> chunks" at a time. As you pointed out, you don't want to read 17 GB
> into memory.

That's actually a moot point, as the GNAT runtime on Linux/i386 seems
to have as many problems handling the file as most C programs. It
appears I'm going to have to run it through a pipe . . .

> You also might want to buffer your output, rather than outputting
> the strings as soon as they are located.

My biggest problem with this is that it was a nice simple program
until buffering started coming in. I guess if I can get it
significantly faster than GNU strings, I may as well finish up the
details and release it . . .

--
David Starner - dstarner98@aasaa.ofe.org

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
@ 2001-01-22 15:24   ` Ted Dennison
  2001-01-22 16:12     ` Robert Dewar
  2001-01-22 16:15     ` Robert Dewar
  2001-01-22 15:26   ` Ted Dennison
  2 siblings, 2 replies; 21+ messages in thread

From: Ted Dennison @ 2001-01-22 15:24 UTC (permalink / raw)

In article <94g431$ge3$1@nnrp1.deja.com>,
Robert Dewar <robert_dewar@my-deja.com> wrote:
> I am *amazed* that this is only ten times slower when the
> I/O is done in such a perfectly gruesome manner (sequential
> I/O instantiated on bytes).
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of

As a point of reference, I'm in the process of writing a little app in
Windows NT to split files for the purpose of distributing large files
on floppies. My first iteration took the dumb approach and used
Direct_IO instantiated on bytes, copying each byte from one disk to
the other. Filling a whole floppy in this manner took me about 4.5
minutes. However, I noticed that copying the same file to the floppy
using Windows Explorer takes about 45 seconds.

For my next trick, I changed to using Ada.Streams.Stream_IO. First I
tried to copy the whole disk's worth of data into a buffer using
'Read, then copy it to the floppy using 'Write. It still took 4.5
minutes. That's not surprising, since 'Read on an array is defined as
individual 'Reads for each element. So next I changed the code to
instead call Ada.Streams.Read and Ada.Streams.Write directly for the
entire amount of data that is going on the disk (one operation each
per disk). When I compiled and ran this, a disk's worth of data copied
in ... (drumroll please) ... 45 seconds, just like for Windows.

Of course 45 seconds is a bit long to wait with no feedback, so I
changed it to only write portions of the disk's worth of data, and
output a '*' character to the screen between them. Unsurprisingly, the
number of chunks used has a serious impact on the amount of time the
operation takes. So one has to strike a balance. I found a relatively
happy medium at 10 copies per floppy. That only increases the copy
time by about 10 seconds.

Anyway, if you want to perform large file operations, it looks like
Stream_IO and Ada.Streams.Read and Write are the way to go. The only
other way I could think to do it would be to dynamically instantiate
Sequential or Direct IO with the data size you want to use for each
I/O operation. That would be a pain.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
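[Editor's note: a minimal sketch of the copy loop described above,
assuming Ada.Streams.Stream_IO; the buffer size and file names are
illustrative placeholders. The point is that each Read/Write call moves
a whole buffer, whereas 'Read/'Write on the array type is defined as
one stream operation per element.]

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Copy_In_Chunks is
   Source, Dest : File_Type;
   Buffer       : Stream_Element_Array (1 .. 2**16);
   Last         : Stream_Element_Offset;
begin
   Open (Source, In_File, "input.bin");    --  placeholder names
   Create (Dest, Out_File, "output.bin");
   loop
      --  One call per buffer-full, instead of one stream operation
      --  per element as with My_Array_Type'Read.
      Read (Source, Buffer, Last);
      exit when Last < Buffer'First;       --  ran out of file
      Write (Dest, Buffer (Buffer'First .. Last));
   end loop;
   Close (Source);
   Close (Dest);
end Copy_In_Chunks;
```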
* Re: Optimization Question
  2001-01-22 15:24 ` Ted Dennison
@ 2001-01-22 16:12   ` Robert Dewar
  2001-01-22 16:48     ` Ted Dennison
  2001-01-22 16:15   ` Robert Dewar
  1 sibling, 1 reply; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 16:12 UTC (permalink / raw)

In article <94hjbp$ks6$1@nnrp1.deja.com>,
Ted Dennison <dennison@telepath.com> wrote:
> My first iteration took the dumb approach and used Direct_IO
> instantiated on bytes.

This of course is even *WORSE* than using sequential_io, since there
is extra positioning overhead. I can't imagine why you would choose
Direct_IO for what is obviously a sequential problem. Just shows that
if there is a way to abuse things someone will take advantage of it
:-) :-)

Yes, of course, using Stream_IO in big chunks is the only sensible
implementation approach.

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22 16:12 ` Robert Dewar
@ 2001-01-22 16:48   ` Ted Dennison
  0 siblings, 0 replies; 21+ messages in thread

From: Ted Dennison @ 2001-01-22 16:48 UTC (permalink / raw)

In article <94hm5q$nmc$1@nnrp1.deja.com>,
Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94hjbp$ks6$1@nnrp1.deja.com>,
> Ted Dennison <dennison@telepath.com> wrote:
>
> > My first iteration took the dumb approach and used Direct_IO
> > instantiated on bytes.
>
> This of course is even *WORSE* than using sequential_io, since
> there is extra positioning overhead. I can't imagine why you
> would choose Direct_IO for what is obviously a sequential
> problem. Just shows that if there is a way to abuse things
> someone will take advantage of it :-) :-)

:-)

I used Direct_IO because for my purpose (splitting the file into
chunks that will fit on the floppy), I needed to know how big the file
actually is. Direct_IO has a length-of-file routine (Size), while
Sequential_IO does not. I suppose I could have used Direct_IO just for
that purpose and Sequential_IO for the actual I/O operations. But that
seemed a little silly when Direct_IO also had all the operations I
needed.

So are you saying that the versions of Read and Write in Direct_IO
that don't have the "Positive_Count" typed parameters still take
longer than the routines with the exact same parameter profile in
Sequential_IO? Not that it really matters that much. We've already
ascertained that it's going to be slow either way.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
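[Editor's note: for what it's worth, Ada.Streams.Stream_IO also exports
a Size function, so the file length is available without pulling in
Direct_IO at all. A sketch, with a placeholder file name:]

```ada
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Show_Size is
   File : File_Type;
begin
   Open (File, In_File, "some_file.bin");  --  placeholder name
   --  Size returns the file length in stream elements.
   Ada.Text_IO.Put_Line
     ("Size in stream elements:" & Count'Image (Size (File)));
   Close (File);
end Show_Size;
```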
* Re: Optimization Question
  2001-01-22 15:24 ` Ted Dennison
  2001-01-22 16:12   ` Robert Dewar
@ 2001-01-22 16:15   ` Robert Dewar
  1 sibling, 0 replies; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 16:15 UTC (permalink / raw)

In article <94hjbp$ks6$1@nnrp1.deja.com>,
Ted Dennison <dennison@telepath.com> wrote:
> My first iteration took the dumb approach and used Direct_IO
> instantiated on bytes.

By the way, this is even LESS portable than using Sequential_IO, since
it is even more likely that an implementation will add some control
information to direct_io records.

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  1:57 ` Robert Dewar
  2001-01-22  3:22   ` dvdeug
  2001-01-22 15:24   ` Ted Dennison
@ 2001-01-22 15:26   ` Ted Dennison
  2001-01-22 16:17     ` Robert Dewar
  2 siblings, 1 reply; 21+ messages in thread

From: Ted Dennison @ 2001-01-22 15:26 UTC (permalink / raw)

In article <94g431$ge3$1@nnrp1.deja.com>,
Robert Dewar <robert_dewar@my-deja.com> wrote:
> In article <94ftfu$b59$1@nnrp1.deja.com>,
> dvdeug@my-deja.com wrote:
>
> It is elementary that you want to read big chunks of a file
> at a time. What GNAT does is to read the entire source of
> a program in one read statement.

Does Gnat's implementation of the *_IO packages do no buffering at
all?

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22 15:26 ` Ted Dennison
@ 2001-01-22 16:17   ` Robert Dewar
  2001-01-22 16:59     ` Ted Dennison
  0 siblings, 1 reply; 21+ messages in thread

From: Robert Dewar @ 2001-01-22 16:17 UTC (permalink / raw)

In article <94hje0$kui$1@nnrp1.deja.com>,
Ted Dennison <dennison@telepath.com> wrote:
> Does Gnat's implementation of the *_IO packages do no
> buffering at all?

Of course it does buffering (RTFM). That's not the issue!

The issue is that if you do a call byte by byte, there is a lot of
computational overhead. For example, this means you have to check for
each byte written whether the file is currently open in the
appropriate mode.

If you think "oh well, surely it will be buffered, so I won't find it
that inefficient", you are operating in the dark :-)

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22 16:17 ` Robert Dewar
@ 2001-01-22 16:59   ` Ted Dennison
  0 siblings, 0 replies; 21+ messages in thread

From: Ted Dennison @ 2001-01-22 16:59 UTC (permalink / raw)

In article <94hmdt$nqd$1@nnrp1.deja.com>,
Robert Dewar <robert_dewar@my-deja.com> wrote:
> The issue is that if you do a call for byte by byte, there
> is a lot of computational overhead. For example, this means
> you have to check for each byte written whether the file is
> currently open in the appropriate mode.

...transforming a small one-time overhead into 1,450,000 small
one-time overheads. Ahhhh ... OK. You're right; I should have seen
that.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22  0:05 Optimization Question dvdeug
  2001-01-22  1:57 ` Robert Dewar
@ 2001-01-22 22:01 ` Keith Thompson
  2001-01-22 22:52   ` dvdeug
  [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
  2 siblings, 1 reply; 21+ messages in thread

From: Keith Thompson @ 2001-01-22 22:01 UTC (permalink / raw)

dvdeug@my-deja.com writes:
> I'm trying to write a program similar to the Unix utility strings, as my
> copy of strings refuses to run a 17GB file. It seems to work, but it's
> about 10x slower than strings, and rough calculations puts running time
> on that 17GB file at 10 hours.

I was going to suggest using GNU strings, until I realized you were
already using it. If it has problems with a 17GB file, it's probably
because it uses 32-bit file offsets, limiting the size to 2GB or 4GB.

Depending on your OS and compiler, it may be possible to recompile, or
possibly modify, the GNU strings program to handle huge files
properly. If this is possible, it's likely to be easier than
re-implementing it from scratch. (GNU strings is part of the binutils
package; the latest sources are at
<ftp://ftp.gnu.org/gnu/binutils/binutils-2.10.1.tar.gz>.) Try the gcc
and binutils documentation; if that's not illuminating, try a GNU or
Linux newsgroup.

--
Keith Thompson (The_Other_Keith) kst@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
MAKE MONEY FAST!! DON'T FEED IT!!

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22 22:01 ` Keith Thompson
@ 2001-01-22 22:52   ` dvdeug
  2001-01-23  6:46     ` Keith Thompson
  0 siblings, 1 reply; 21+ messages in thread

From: dvdeug @ 2001-01-22 22:52 UTC (permalink / raw)

In article <yecd7dfjlgf.fsf@king.cts.com>,
Keith Thompson <kst@cts.com> wrote:
> Depending on your OS and compiler, it may be possible to recompile, or
> possibly modify, the GNU strings program to handle huge files
> properly. If this is possible, it's likely to be easier than
> re-implementing it from scratch.

Why? If the program which I included in my first post was faster and
worked on 17GB files (GNAT's runtime on my platform seems to have the
same problem strings does), it would do what I need it to do. It, in
fact, produces better output than strings does.

--
David Starner - dstarner98@aasaa.ofe.org

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question
  2001-01-22 22:52 ` dvdeug
@ 2001-01-23  6:46   ` Keith Thompson
  0 siblings, 0 replies; 21+ messages in thread

From: Keith Thompson @ 2001-01-23 6:46 UTC (permalink / raw)

dvdeug@my-deja.com writes:
> Why? If the program which I included in my first post was faster and
> worked on 17GB files (GNAT's runtime on my platform seems to have the
> same problem strings does), it would do what I need it to do. It, in
> fact, produces better output than strings does.

Ok, append "would have been" to my last sentence. If you've already
re-implemented it, I suppose it doesn't make much difference.

<OFFTOPIC>
Hmm. How does the GNU strings program fail? I suspect that

    strings < hugefile

might work better than

    strings hugefile

If you feed the file to the program's standard input stream, it won't
try to open the whole thing.
</OFFTOPIC>

--
Keith Thompson (The_Other_Keith) kst@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
MAKE MONEY FAST!! DON'T FEED IT!!

^ permalink raw reply	[flat|nested] 21+ messages in thread
[parent not found: <94ld65$1hs$1@nnrp1.deja.com>]
[parent not found: <864ryodb1q.fsf@acm.org>]
[parent not found: <3A6F663E.C84B94D8@acm.org>]
* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  [not found] ` <3A6F663E.C84B94D8@acm.org>
@ 2001-01-26 16:30 ` Jeff Creem
  2001-01-26 21:46   ` Florian Weimer
  0 siblings, 1 reply; 21+ messages in thread

From: Jeff Creem @ 2001-01-26 16:30 UTC (permalink / raw)

A few days ago there was a discussion about using the direct read
procedures which are part of the stream packages, rather than the
'Read/'Write attributes, to read in large chunks of data. This got me
thinking about what the best way might be to do this in a general
sense, so I thought I'd put this approach out as a strawman before
really making a package out of it with a little more robust error
checking on the validity of the types the user uses.

In any case, at first glance this approach (when it can be used)
appears to be up to 30 times faster than 'Read (for all of the reasons
discussed a few days ago) on a 350 MHz Pentium II Win 95 machine. My
least favorite part of the approach is the unchecked_conversion of
access types.

So in any case, here is a partial package and test program ... Any
comments?

with Ada.Streams;

generic
   type Element_Type is private;
   type Array_Index_Type is range <>;
   type Block_Array_Type is array (Array_Index_Type) of Element_Type;
package Stream_Utilities is

   --
   --  This procedure attempts to read Item'Length elements from
   --  the stream. If fewer than Item'Length elements are read, then
   --  Last will return < Item'Last. This procedure assumes that
   --  the elements being read are present in the stream in a format
   --  identical to the layout of the elements in the Block_Array_Type.
   --
   procedure Block_Read
     (Stream : access Ada.Streams.Root_Stream_Type'Class;
      Item   : access Block_Array_Type;
      Last   : out Array_Index_Type);

end Stream_Utilities;

with Ada.Streams;
with Unchecked_Conversion;

use Ada.Streams;

package body Stream_Utilities is

   --
   --  There are essentially two approaches that immediately come to
   --  mind to read a block of data efficiently.
   --
   --  The first is to use the standard stream read utilities to read
   --  the data into an array of Stream_Elements, followed by an
   --  unchecked_conversion to our actual output data type.
   --
   --  The second, potentially more evil approach is to use aliased
   --  types to essentially overlay what appears to be a stream
   --  element array on top of the memory where our actual data is.
   --  The first approach makes me feel better than the second;
   --  however, even though we know that not every instance of an
   --  unchecked_conversion really always causes a true copy, I found
   --  that in this instance, at least with GNAT 3.12 under Solaris,
   --  we did end up copying the data.
   --
   procedure Block_Read
     (Stream : access Ada.Streams.Root_Stream_Type'Class;
      Item   : access Block_Array_Type;
      Last   : out Array_Index_Type)
   is
      --
      --  Create a named access type to the block array type
      --  so we can create an unchecked conversion.
      --
      type Access_Block_Data is access all Block_Array_Type;

      --
      --  Create a constrained subtype of stream elements to
      --  "hold" the data returned from the stream.
      --
      subtype Proper_Sized_Stream_Element_Type is
        Stream_Element_Array (1 .. Item.all'Size / Stream_Element'Size);

      --
      --  And finally, create a named access type to this
      --  constrained array.
      --
      type Access_Proper_Sized_Stream_Element_Type is
        access Proper_Sized_Stream_Element_Type;

      --
      --  Create a conversion between the block data and the stream
      --  data.
      --
      function To_Access is new Unchecked_Conversion
        (Source => Access_Block_Data,
         Target => Access_Proper_Sized_Stream_Element_Type);

      --
      --  Declare an access variable to the stream data. Note that at
      --  this very point we are now creating an overlayed version of
      --  the now aliased input parameter.
      --
      Proper_Sized_Stream_Data : Access_Proper_Sized_Stream_Element_Type :=
        To_Access (Access_Block_Data (Item));

      Local_Last : Stream_Element_Offset;

   begin
      --
      --  Read the data from the stream into our access to stream
      --  element array. Note at this moment we are now also filling
      --  in the Item parameter as well.
      --
      Read (Stream => Stream.all,
            Item   => Proper_Sized_Stream_Data.all,
            Last   => Local_Last);

      --
      --  Calculate the "Last" parameter we are sending back to the
      --  user so they can see it.
      --
      Last := (Item'First - 1) + Array_Index_Type'Base
        (Local_Last / (Item'Size / Stream_Element'Size));

   end Block_Read;

end Stream_Utilities;

with Stream_Utilities;
with Ada.Streams.Stream_Io;
with Text_Io;
with Ada.Calendar;

use type Ada.Calendar.Time;

procedure Si is

   subtype Range_Type is Integer range 1 .. 8192;

   type My_Stream_Data_Type is mod 2**32;
   for My_Stream_Data_Type'Size use 32;

   type My_Array_Type is array (Range_Type) of My_Stream_Data_Type;
   pragma Pack (My_Array_Type);
   for My_Array_Type'Size use 8192 * 32;

   package Sii is new Stream_Utilities
     (My_Stream_Data_Type, Range_Type, My_Array_Type);

   My_Data      : aliased My_Array_Type;
   Last         : Integer;
   File         : Ada.Streams.Stream_Io.File_Type;
   Match        : Boolean := True;  --  initialized before the check loop
   Repeat_Count : constant := 500;
   Start        : Ada.Calendar.Time;
   Stop         : Ada.Calendar.Time;

begin
   --
   --  First, create some stream data.
   --
   for I in My_Data'Range loop
      My_Data (I) := My_Stream_Data_Type (I);
   end loop;

   Ada.Streams.Stream_Io.Create
     (File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.Out_File);

   for Repeat in 1 .. Repeat_Count loop
      for I in My_Data'Range loop
         My_Data (I) := My_Data (I) + 1;
      end loop;
      My_Array_Type'Write (Ada.Streams.Stream_Io.Stream (File), My_Data);
   end loop;

   Ada.Streams.Stream_Io.Close (File);

   My_Data := (others => 0);

   --
   --  Now, just double check that our block read appears to work
   --  at all.
   --
   Ada.Streams.Stream_Io.Open
     (File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);

   Repeat_Loop :
   for Repeat in 1 .. Repeat_Count loop
      Sii.Block_Read
        (Ada.Streams.Stream_Io.Stream (File), My_Data'Access, Last);
      for I in My_Data'Range loop
         if My_Data (I) /=
           My_Stream_Data_Type (I) + My_Stream_Data_Type (Repeat)
         then
            Match := False;
            exit Repeat_Loop;
         end if;
      end loop;
   end loop Repeat_Loop;

   Ada.Streams.Stream_Io.Close (File);

   Text_Io.Put ("Last was :");
   Text_Io.Put_Line (Integer'Image (Last));

   if Match then
      Text_Io.Put_Line ("Match");
   else
      Text_Io.Put_Line ("No Match");
   end if;

   --
   --  Now, a little benchmarking.
   --
   Ada.Streams.Stream_Io.Open
     (File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);

   Start := Ada.Calendar.Clock;
   for Repeat in 1 .. Repeat_Count loop
      Sii.Block_Read
        (Ada.Streams.Stream_Io.Stream (File), My_Data'Access, Last);
   end loop;
   Stop := Ada.Calendar.Clock;

   Text_Io.Put_Line ("Block read in " & Duration'Image (Stop - Start));
   Ada.Streams.Stream_Io.Close (File);

   Ada.Streams.Stream_Io.Open
     (File => File,
      Name => "test_data.bin",
      Mode => Ada.Streams.Stream_Io.In_File);

   Start := Ada.Calendar.Clock;
   for Repeat in 1 .. Repeat_Count loop
      My_Array_Type'Read (Ada.Streams.Stream_Io.Stream (File), My_Data);
   end loop;
   Stop := Ada.Calendar.Clock;

   Text_Io.Put_Line ("Normal Stream Read in " &
                     Duration'Image (Stop - Start));
   Ada.Streams.Stream_Io.Close (File);

end Si;

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-26 16:30 ` Optimization Question -- Follow up on using the stream read (and write) procedures directly Jeff Creem
@ 2001-01-26 21:46   ` Florian Weimer
  2001-01-27 19:14     ` Jeff Creem
  0 siblings, 1 reply; 21+ messages in thread

From: Florian Weimer @ 2001-01-26 21:46 UTC (permalink / raw)

"Jeff Creem" <jeff@thecreems.com> writes:
> My least favorite part of the approach is the unchecked_conversion
> of access types.

You shouldn't use Unchecked_Conversion in this case because the
in-memory and stream representations of an object are usually
different.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-26 21:46 ` Florian Weimer
@ 2001-01-27 19:14   ` Jeff Creem
  2001-01-28  0:26     ` Robert Dewar
  0 siblings, 1 reply; 21+ messages in thread

From: Jeff Creem @ 2001-01-27 19:14 UTC (permalink / raw)

"Florian Weimer" <fw@deneb.enyo.de> wrote in message
news:87elxq0ywd.fsf@deneb.enyo.de...
> "Jeff Creem" <jeff@thecreems.com> writes:
>
> > My least favorite part of the approach is the unchecked_conversion
> > of access types.
>
> You shouldn't use Unchecked_Conversion in this case because the
> in-memory and stream representations of an object are usually
> different.

I basically agree, but I was probably not clear about what I was
trying to accomplish. This set of routines is for when you want the
stream representation of an object to match the in-memory
representation, and further you want to avoid the huge overhead of the
individual 'Read operations that would happen from the
array_type'Read attribute.

Even the alternate approach of using unchecked_conversion on the data
(vs. the access types) would suffer from the different memory vs.
default stream representation concern (a valid concern, but not
exactly appropriate for the intended use).

Note that this package could clearly suffer in cases where
Element_Type'Size is less than Stream_Element'Size (and a few other
size cases as well), but this could be caught easily with a little
more error checking.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: Optimization Question -- Follow up on using the stream read (and write) procedures directly
  2001-01-27 19:14 ` Jeff Creem
@ 2001-01-28  0:26   ` Robert Dewar
  0 siblings, 0 replies; 21+ messages in thread

From: Robert Dewar @ 2001-01-28 0:26 UTC (permalink / raw)

In article <t767e86uif6d5e@corp.supernews.com>,
"Jeff Creem" <jeff@thecreems.com> wrote:
> I basically agree but I was probably not clear about what I
> was trying to accomplish. This set of routines is for when
> you want the stream representation of an object to match
> the in memory representation

Perfectly reasonable, and this is EXACTLY the sort of low-level,
target-dependent, type-breaking operation that Unchecked_Conversion is
there for.

In some ways I think one mark of an experienced Ada programmer is that
they feel VERY uncomfortable when UC is used in an inappropriate
manner, and VERY comfortable when it is used in an appropriate manner.
The trouble is that a lot of people have trouble distinguishing these
cases, so they play it safe and feel uncomfortable all the time. The
danger of this attitude, sometimes enshrined in foolish coding
standards that forbid the use of UC completely, is that you miss the
fact that often the *cleanest* approach is to use UC :-)

Sent via Deja.com
http://www.deja.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2001-01-28  0:26 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-01-22  0:05 Optimization Question dvdeug
2001-01-22  1:57 ` Robert Dewar
2001-01-22  3:22   ` dvdeug
2001-01-22  4:05     ` Robert Dewar
2001-01-22  4:06     ` Robert Dewar
2001-01-22 19:04     ` M. Kotiaho
2001-01-22 20:22       ` dvdeug
2001-01-22 15:24   ` Ted Dennison
2001-01-22 16:12     ` Robert Dewar
2001-01-22 16:48       ` Ted Dennison
2001-01-22 16:15     ` Robert Dewar
2001-01-22 15:26   ` Ted Dennison
2001-01-22 16:17     ` Robert Dewar
2001-01-22 16:59       ` Ted Dennison
2001-01-22 22:01 ` Keith Thompson
2001-01-22 22:52   ` dvdeug
2001-01-23  6:46     ` Keith Thompson
     [not found] ` <94ld65$1hs$1@nnrp1.deja.com>
     [not found]   ` <864ryodb1q.fsf@acm.org>
     [not found]     ` <3A6F663E.C84B94D8@acm.org>
2001-01-26 16:30       ` Optimization Question -- Follow up on using the stream read (and write) procedures directly Jeff Creem
2001-01-26 21:46         ` Florian Weimer
2001-01-27 19:14           ` Jeff Creem
2001-01-28  0:26             ` Robert Dewar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox