From: Quarc
Newsgroups: comp.lang.ada
Subject: Re: Reading and writing a big file in Ada (GNAT) on Windows XP
Date: Sun, 06 May 2007 23:55:33 +0200
Organization: Glocalnet AB
References: <0hj5339mjmond132qhbn2o01unurs61lbj@4ax.com>

Fionn Mac Cumhaill wrote:
> On Sat, 28 Apr 2007 19:12:33 GMT, "Jeffrey R. Carter"
> wrote:
>
>> Fionn Mac Cumhaill wrote:
>>> All it does is read lines in a loop from a text file with
>>> Ada.Text_IO.Get_Line, does minor modifications on about 80% of the
>>> lines that it reads, and writes the lines to an output file with
>>> Put_Line.
>>>
>>> The modifications consist of replacing a slice of text at the end of a
>>> line with another bit of text. The biggest slice is 10 characters, and
>>> the replacement slice is always smaller than the original slice. An
>>> occasional line of text is about 6000 characters long, but most are
>>> about 700 characters.
>>> Get_Line reads them into a String variable that is 10,000
>>> characters long.
>>>
>>> The problem is that the input file has more than 10 million lines of
>>> text in it. The program works perfectly, but takes about 5 hours to
>>> run. The Cygwin version of wc can count the lines in the input file in
>>> less than one minute.
>>>
>>> Why is this so slow?
>>> Do I have an Ada problem, a GNAT problem, or a MinGW problem?

Sorry, but I don't have your full question here, so I am trying to figure
out your problem from the quote above.

I would assume the big difference between what you are doing and what wc
does is that wc only reads the file, and can therefore do it in one go.

I am speculating now, but I think one explanation would be that you are
reading and then writing each line by itself. Depending on the OS you are
running on, and also on the binding to the OS, this could potentially
create a lot of seeks on the disk. That could explain the long run time,
I think.

One of two things happens, I think (or potentially both).

1) You read 700 bytes from disk, then you write 700 bytes to disk. With
10 million lines and an average seek time of 5 ms, that is
10,000,000 * 5 ms = 50,000 s for the reads and the same again for the
writes, so you get 2 * 50,000 s = 100,000 s, or about 28 hours!

2) Instead of you creating the seek times on the disk between reads, there
are other processes waiting for disk access in the OS as well. So quite
often some other process gets to the top of the waiting queue and can
access the disk, causing the seek times.

If one of the above is causing your problem, the only solution is to read
in larger chunks at once, and then write larger chunks of data as well. I
don't remember how to set that up, and it of course depends on the
compiler and OS, but hopefully you should be able to make sure you don't
read line by line from disk.

regards
Peter Atterfjäll
(haven't been working professionally with Ada for 14 years now :-)
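
As an illustration of the "larger chunks" idea, here is a minimal sketch that copies a file in 64 KiB blocks with Ada.Streams.Stream_IO instead of going line by line through Ada.Text_IO. The file names and the block size are assumptions made up for the example, the per-line edits are left out, and whether this actually removes the slowdown described above has not been measured.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;

procedure Chunked_Copy is
   package SIO renames Ada.Streams.Stream_IO;

   --  Hypothetical file names and block size, for illustration only.
   Source_Name : constant String := "input.txt";
   Target_Name : constant String := "output.txt";
   Block_Size  : constant := 64 * 1024;

   Source : SIO.File_Type;
   Target : SIO.File_Type;
   Buffer : Stream_Element_Array (1 .. Block_Size);
   Last   : Stream_Element_Offset;
begin
   SIO.Open   (Source, SIO.In_File,  Source_Name);
   SIO.Create (Target, SIO.Out_File, Target_Name);

   while not SIO.End_Of_File (Source) loop
      --  One call fetches up to Block_Size bytes; Last is the index
      --  of the last element actually read into Buffer.
      SIO.Read (Source, Buffer, Last);

      --  The line edits would be applied to Buffer (1 .. Last) here,
      --  taking care of lines that straddle two blocks.

      --  Write only the part of the buffer that was filled.
      SIO.Write (Target, Buffer (1 .. Last));
   end loop;

   SIO.Close (Target);
   SIO.Close (Source);
end Chunked_Copy;
```

With GNAT this should build with `gnatmake chunked_copy.adb`. A real version would also need to split the buffer into lines and carry a partial line over from one block to the next.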