From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,4fbd260da735f6f4
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: g2news1.google.com!news4.google.com!out01b.usenetserver.com!news.usenetserver.com!in02.usenetserver.com!news.usenetserver.com!cycny01.gnilink.net!spamkiller.gnilink.net!gnilink.net!trnddc04.POSTED!72fcb693!not-for-mail
From: Fionn Mac Cumhaill
Newsgroups: comp.lang.ada
Subject: Re: Reading and writing a big file in Ada (GNAT) on Windows XP
Message-ID:
References: <0hj5339mjmond132qhbn2o01unurs61lbj@4ax.com> <1178091967.392381.282510@o5g2000hsb.googlegroups.com>
X-Newsreader: Forte Agent 4.2/32.1117
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Thu, 03 May 2007 06:31:49 GMT
NNTP-Posting-Host: 71.170.31.60
X-Complaints-To: abuse@verizon.net
X-Trace: trnddc04 1178173909 71.170.31.60 (Thu, 03 May 2007 02:31:49 EDT)
NNTP-Posting-Date: Thu, 03 May 2007 02:31:49 EDT
Xref: g2news1.google.com comp.lang.ada:15472
Date: 2007-05-03T06:31:49+00:00
List-Id:

On 2 May 2007 00:46:07 -0700, george@gentoo.org wrote:

>Fionn Mac Cumhaill wrote:
>> All it does is read lines in a loop from a text file with
>> Ada.Text_IO.Get_Line, does minor modifications on about 80% of the
>> lines that it reads, and writes the lines to an output file with
>> Put_Line.
>Looks like a perfect task for sed utility to me. Even if sed does not
>do all you need you can at the very least check some simple sed
>transformation on your data, which should give you an estimate of what
>run times you can expect with your file.
>I am not familiar with MinGW environment. If it is anything like
>CygWin then you should be able to run GNU tools and perform that test
>with sed.
>At least this should answer the
>
>> Do I have an Ada problem, a GNAT problem, or a MinGW problem?
>
>(and CygWin does tend to be slower than native computation, however I
>used it only a few times).
>
>
>George

I considered sed, but I know next to nothing about it, and decided
that by the time I learned enough to use it, I could have done the job
in Ada several times over.

I did do further investigation. I made a copy of the now-working
program and threw most of it away, leaving only a very simple program
which read the large input file but made no changes and did no output.
I added code to track the run time and put the buffer clear back in.
It read the 10 million lines in just a little over five minutes.

I then put Index back and used it to search the buffer for a short
string that would never be found, searching forwards from the
beginning of the input buffer. Bingo. Run time increased to a bit more
than 1-1/2 hours.

One of the earlier commenters observed that there was a bug in GNAT's
Index. It will be interesting to try this again once the bug is fixed.
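
For anyone who wants to repeat the measurement, here is a minimal sketch of that kind of stripped-down test. It is not the original program. The buffer size, the search pattern, the procedure name Time_Index, and taking the file name from the command line are all assumptions made for illustration, and so is the choice to search only the slice Buffer (1 .. Last) rather than the whole buffer.

```ada
with Ada.Calendar;       use Ada.Calendar;
with Ada.Command_Line;
with Ada.Strings.Fixed;
with Ada.Text_IO;        use Ada.Text_IO;

procedure Time_Index is
   --  Assumed values, not taken from the original post.
   Buffer  : String (1 .. 4096);
   Pattern : constant String := "@@never-present@@";

   Last    : Natural;
   Input   : File_Type;
   Lines   : Natural := 0;
   Hits    : Natural := 0;
   Start   : Time;
   Elapsed : Duration;
begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Put_Line ("usage: time_index <input-file>");
      return;
   end if;

   Open (Input, In_File, Ada.Command_Line.Argument (1));
   Start := Clock;

   while not End_Of_File (Input) loop
      --  The "buffer clear" step: blank the buffer before each read.
      Buffer := (others => ' ');

      --  A line longer than the buffer is returned in several pieces,
      --  so Lines counts reads, not necessarily physical lines.
      Get_Line (Input, Buffer, Last);
      Lines := Lines + 1;

      --  Search forwards from the start of the data just read.
      --  Comment out this statement to get the read-only baseline.
      if Ada.Strings.Fixed.Index (Buffer (1 .. Last), Pattern) /= 0 then
         Hits := Hits + 1;
      end if;
   end loop;

   Elapsed := Clock - Start;
   Close (Input);

   Put_Line ("Reads:      " & Natural'Image (Lines));
   Put_Line ("Matches:    " & Natural'Image (Hits));
   Put_Line ("Elapsed (s):" & Duration'Image (Elapsed));
end Time_Index;
```

Building it once as shown and once with the Index call commented out gives the two run times to compare. If the original program passed the whole buffer to Index instead of the slice, the cost per line would grow with the buffer size, so that variant may be worth timing separately.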