From: Martin <martin.dowie@btopenworld.com>
Subject: Re: Q: Line_IO
Date: Mon, 31 Aug 2009 01:28:36 -0700 (PDT)
Message-ID: <7225bda9-8757-4c5c-bb44-b3be21a1e1f9@p36g2000vbn.googlegroups.com>
In-Reply-To: 4a9b045a$0$31875$9b4e6d93@newsspool3.arcor-online.net
On Aug 30, 11:59 pm, Georg Bauhaus <see.reply...@maps.futureapps.de>
wrote:
> Text_IO seems fairly slow when just reading lines of text.
> Here are two alternative I/O subprograms for Line I/O, in plain Ada,
> based on Stream_IO. They seem to run significantly faster.
>
> However, there is one glitch and I can't find the cause:
> output always has one more line at the end, an empty one.
> Why? If you have got a minute to look at this, you will
> also help us with getting faster programs at the Shootout.
> These read lines by the megabyte.
>
> generic
> Separator_Sequence : in String; -- ends a line
> package Line_IO is
>
> pragma Elaborate_Body;
>
> --
> -- High(er) speed reading and writing of lines via Stream I/O.
> -- Made with Unix pipes in mind.
> --
> -- Assumptions:
> -- - Lines are separated by a sequence of characters.
> -- - Characters and stream elements can be used interchangeably.
> -- - Lines are not longer than internal buffer size.
> --
> -- I/O exceptions are propagated
>
> procedure Print(Item : String);
>
> function Getline return String;
>
> end Line_IO;
>
> with Ada.Streams.Stream_IO;
> with Ada.Unchecked_Conversion;
>
> package body Line_IO is
>
> use Ada.Streams;
>
> Stdout : Stream_IO.File_Type;
> Stdin : Stream_IO.File_Type;
>
> -- writing
>
> procedure Print (Item : String) is
>
> subtype Index is Stream_Element_Offset range
> Stream_Element_Offset(Item'First)
> .. Stream_Element_Offset(Item'Last + Separator_Sequence'Length);
> subtype XString is String (Item'First
> .. Item'Last + Separator_Sequence'Length);
> subtype XBytes is Stream_Element_Array (Index);
> function To_Bytes is new Ada.Unchecked_Conversion
> (Source => XString,
> Target => XBytes);
> begin
> Stream_IO.Write (Stdout, To_Bytes (Item & Separator_Sequence));
> end Print;
>
> -- ----------------
> -- reading
> -- ----------------
> -- Types etc., status variables, and the buffer. `Buffer` is at the
> -- same time an array of Character and an array of Stream_Element
> -- called `Bytes`. They share the same address. This setup makes the
> -- storage at the address either a String (when selecting result
> -- characters) or a Stream_Element_Array (when reading input bytes).
>
> BUFSIZ: constant := 8_192;
> pragma Assert(Character'Size = Stream_Element'Size);
>
> SL : constant Natural := Separator_Sequence'Length;
>
> subtype Extended_Buffer_Index is Positive range 1 .. BUFSIZ + SL;
> subtype Buffer_Index is Extended_Buffer_Index
> range Extended_Buffer_Index'First .. Extended_Buffer_Index'Last - SL;
> subtype Extended_Bytes_Index is Stream_Element_Offset
> range 1 .. Stream_Element_Offset(Extended_Buffer_Index'Last);
> subtype Bytes_Index is Extended_Bytes_Index
> range Extended_Bytes_Index'First
> .. (Extended_Bytes_Index'Last - Stream_Element_Offset(SL));
>
> subtype Buffer_Data is String(Extended_Buffer_Index);
> subtype Buffer_Bytes is Stream_Element_Array(Extended_Bytes_Index);
>
> Buffer : Buffer_Data;
> Bytes : Buffer_Bytes;
> for Bytes'Address use Buffer'Address;
>
> Position : Natural; -- start of next substring
> Last : Natural; -- last valid character in buffer
>
> function Getline return String is
>
> procedure Reload;
> -- move remaining characters to the start of `Buffer` and
> -- fill the following bytes if possible
> -- post: Position in 0 .. 1, and 0 should mean end of file
> -- Last is 0 or else the index of the last valid element in Buffer
>
> procedure Reload is
> Remaining : constant Natural := Buffer_Index'Last - Position + 1;
> Last_Index : Stream_Element_Offset;
> begin
> Buffer(1 .. Remaining) := Buffer(Position .. Buffer_Index'Last);
>
> Stream_IO.Read(Stdin,
> Item => Bytes(Stream_Element_Offset(Remaining) + 1 ..
> Bytes_Index'Last),
> Last => Last_Index);
> Last := Natural(Last_Index);
> Buffer(Last + 1 .. Last + SL) := Separator_Sequence;
>
> Position := Boolean'Pos(Last_Index > 0
> and then Buffer(1) /= ASCII.EOT -- ^D
> and then Buffer(1) /= ASCII.SUB); -- ^Z
>
> end Reload;
>
> function Sep_Index return Natural;
> -- position of next Separator_Sequence
> pragma Inline(Sep_Index);
>
> function Sep_Index return Natural is
> K : Natural := Position;
> begin
> pragma Assert(K >= Buffer'First);
> pragma Assert(Buffer(Buffer_Index'Last + 1 .. Buffer'Last)
> = Separator_Sequence);
>
> while Buffer(K) /= Separator_Sequence(1) loop
> K := K + 1;
> end loop;
>
> return K;
> end Sep_Index;
>
> Next_Separator : Natural;
> begin -- Getline
> pragma Assert(Position = 0 or else Position in Extended_Buffer_Index);
> pragma Assert(Last = 0 or else Last in Buffer_Index);
>
> if Position = 0 then
> raise Stream_IO.End_Error;
> end if;
>
> Next_Separator := Sep_Index;
>
> if Next_Separator > Buffer_Index'Last then
> -- must be sentinel
> Reload;
> return Getline;
> end if;
>
> if Next_Separator <= Last then
> declare
> Limit : constant Natural := Natural'Max(0, Next_Separator - SL);
> -- there was trouble (Print) when Integer Limit could be negative
> -- (for 2-char SL and Next_Separator = 1)
> Result : constant String := Buffer(Position .. Limit);
> begin
> Position := Limit + SL + 1;
> return Result;
> end;
> else
> -- the separator is among the characters beyond `Last`
> declare
> Limit : constant Positive := Last;
> Result : constant String := Buffer(Position .. Limit);
> begin
> Position := 0; -- next call will raise End_Error
> return Result;
> end;
> end if;
>
> raise Program_Error;
> end Getline;
>
> begin
> -- (see <ILmdnWHx29q5VMrZnZ2dnUVZ_sedn...@megapath.net> for names
> -- of standard I/O streams when using Janus Ada on Windows.)
>
> Stream_IO.Open (Stdout,
> Mode => Stream_IO.Out_File,
> Name => "/dev/stdout");
> Stream_IO.Open (Stdin,
> Mode => Stream_IO.In_File,
> Name => "/dev/stdin");
>
> -- make sure there is no line separator in `Buffer` other than the sentinel
> Buffer := Buffer_Data'(others => ASCII.NUL);
> Buffer(Buffer_Index'Last + 1 .. Buffer'Last) := Separator_Sequence;
> Position := Buffer_Index'Last + 1; -- See also `Getline.Reload.Remaining`
> Last := 0;
> end Line_IO;
>
> --
> -- A small test program.
> --
> with Line_IO;
> with Ada.Text_IO;
>
> procedure Test_Line_IO is
> Want_Text_IO : constant Boolean := False;
>
> -- pick the correct one for your input files
> UnixLF : constant String := String'(1 => ASCII.LF);
> MacCR : constant String := String'(1 => ASCII.CR);
> OS2CRLF : constant String := String'(1 => ASCII.CR, 2 => ASCII.LF);
>
> package LIO is new Line_IO(Separator_Sequence => UnixLF);
>
> begin
> if Want_Text_IO then
> loop
> declare
> A_Line : constant String := Ada.Text_IO.Get_Line;
> begin
> LIO.Print(A_Line);
> null;
> pragma Inspection_Point(A_Line);
> end;
> end loop;
> else
> loop
> declare
> A_Line : constant String := LIO.Getline;
> begin
> LIO.Print(A_Line);
> null;
> pragma Inspection_Point(A_Line);
> end;
> end loop;
> end if;
>
> end Test_Line_IO;
Nice one... I'll try these out on Win32 and see what happens :-)
But surely "Put_Line" and "Get_Line" are preferable subprogram
names?...
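
Those names could be supplied by a thin wrapper without touching the
package itself. A minimal, untested sketch, assuming the Line_IO generic
exactly as quoted above; the package name Unix_Line_IO is hypothetical,
and the renamings are only a suggestion:

```ada
with Line_IO;

package Unix_Line_IO is

   --  Instantiate the quoted generic for LF-terminated lines.
   package LIO is new Line_IO (Separator_Sequence => (1 => ASCII.LF));

   --  Offer the Text_IO-style names as renamings of the originals.
   procedure Put_Line (Item : String) renames LIO.Print;

   function Get_Line return String renames LIO.Getline;

end Unix_Line_IO;
```

A client would then call Unix_Line_IO.Put_Line and Unix_Line_IO.Get_Line
much as it would the Ada.Text_IO subprograms of the same names.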
Cheers
-- Martin