comp.lang.ada
From: Jacob Sparre Andersen <sparre@nbi.dk>
Subject: Re: Text Processing in Ada 95
Date: Fri, 23 Feb 2007 08:53:19 +0100
Message-ID: <ygeslcxl380.fsf@hugsarin.dmusyd.edu>
In-Reply-To: l22pt2114r910becfeg00e1dl1vb7flesq@4ax.com

Rob Norris wrote:

> Suppose I have a text file such as:
>
> tom
> dick
> harry
>
> Then I want to insert the line "dave" after tom so the file is
>
> tom
> dave
> dick
> harry
>
> Also suppose text file can get rather large.
>
> Currently I'm using text_io which means I have to copy the whole
> thing into memory, insert the line then overwrite the file with new
> contents. direct_io is not an option (varying string lengths)
>
> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).

I would first of all consider using POSIX.Memory_Mapping.Map_Memory to
get access to the complete file as an in-memory string.  Here is a
piece of code I wrote recently for that purpose:

   Name         : POSIX_String;
   Text_File    : File_Descriptor;
   Text_Size    : System.Storage_Elements.Storage_Offset;
   Text_Address : System.Address;
begin
   [...]
   Text_File := Open (Name => Name,
                      Mode => Read_Only);
   Text_Size := Storage_Offset (File_Size (Text_File)); -- + Inserted_Size ?
   Text_Address := Map_Memory (Length     => Text_Size,
                               Protection => Allow_Read,
                               Mapping    => Map_Shared,
                               File       => Text_File,
                               Offset     => 0);

   declare
      Bit_Count       : constant Natural :=
                          Natural (Text_Size) * Storage_Element'Size;
      Character_Count : constant Natural :=
                          Bit_Count / Character'Size;

      Text : String (1 .. Character_Count);
      for Text'Address use Text_Address;
      pragma Import (Ada, Text);  --  overlay: do not initialise the mapped data
   begin

My reason for suggesting that you map the file into memory is that you
can avoid messing with buffers, caching and several copies of the file
content.
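Put together, the fragment above could become a small stand-alone program along these lines. This is a sketch that assumes the Florist implementation of the POSIX Ada binding (packages POSIX, POSIX.IO and POSIX.Memory_Mapping) and a non-empty file; the procedure name Count_Mapped_Lines, the file name "names.txt" and the line-counting loop are hypothetical additions, not part of the original code. Because the file is opened Read_Only and mapped with Allow_Read, Text can be read but not modified.

```ada
with POSIX;                    use POSIX;
with POSIX.IO;                 use POSIX.IO;
with POSIX.Memory_Mapping;     use POSIX.Memory_Mapping;
with System;
with System.Storage_Elements;  use System.Storage_Elements;
with Ada.Text_IO;

procedure Count_Mapped_Lines is
   --  Hypothetical file name, for illustration only.
   Name         : constant POSIX_String := To_POSIX_String ("names.txt");
   Text_File    : File_Descriptor;
   Text_Size    : Storage_Offset;
   Text_Address : System.Address;
   Line_Count   : Natural := 0;
begin
   Text_File := Open (Name => Name, Mode => Read_Only);
   Text_Size := Storage_Offset (File_Size (Text_File));

   --  Read-only, shared mapping of the whole file (assumed non-empty:
   --  mapping zero bytes is rejected by mmap).
   Text_Address := Map_Memory (Length     => Text_Size,
                               Protection => Allow_Read,
                               Mapping    => Map_Shared,
                               File       => Text_File,
                               Offset     => 0);
   declare
      Text : String (1 .. Natural (Text_Size) * Storage_Element'Size
                                              / Character'Size);
      for Text'Address use Text_Address;
      pragma Import (Ada, Text);  --  overlay: do not initialise the mapped data
   begin
      --  Count line feeds directly in the mapped file contents.
      for I in Text'Range loop
         if Text (I) = ASCII.LF then
            Line_Count := Line_Count + 1;
         end if;
      end loop;
   end;

   Unmap_Memory (Text_Address, Text_Size);
   Close (Text_File);
   Ada.Text_IO.Put_Line ("Lines:" & Natural'Image (Line_Count));
end Count_Mapped_Lines;
```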

If you need to make lots of insertions, then I would consider mapping
the lines into an insertion-friendly data structure such as a linked
list.  This data structure should keep track of 'First and 'Last for
each line in the file.  Inserting new lines would simply be a matter
of writing the text of the lines to the end of the "Text" string, and
inserting a pointer at the appropriate place in the data structure
keeping track of the lines.  (Appending to "Text" requires a writable
mapping with room after the original content, e.g. by extending the
file before mapping it; the read-only mapping above allows neither.)
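The line index described above might be sketched as follows. To keep it self-contained, an ordinary fixed-size String with spare room at the end stands in for the mapped "Text" (a hypothetical simplification), and the list is hand-written with access types since Ada 95 has no standard container library. All names are illustrative. On insertion only the new line's characters are copied; walking the list in order with one Put_Line per node is the one-line-at-a-time write-back.

```ada
with Ada.Text_IO;

procedure Line_Index_Demo is

   --  Hypothetical stand-in for the mapped "Text" string, with spare
   --  room at the end for inserted lines.  Text (1 .. Used) is valid.
   Capacity : constant := 1_000;
   Text     : String (1 .. Capacity);
   Used     : Natural := 0;

   Initial : constant String :=
     "tom" & ASCII.LF & "dick" & ASCII.LF & "harry" & ASCII.LF;

   --  One node per line, holding that line's 'First and 'Last in Text.
   type Line_Node;
   type Line_Access is access Line_Node;
   type Line_Node is record
      First, Last : Natural;
      Next        : Line_Access;
   end record;

   Head   : Line_Access := null;
   Cursor : Line_Access;

   --  Copy S to the end of Text and return its bounds there.
   procedure Append (S : in String; First, Last : out Natural) is
   begin
      First := Used + 1;
      Last  := Used + S'Length;
      Text (First .. Last) := S;
      Used := Last;
   end Append;

   --  Build the list by scanning Text (1 .. Used) for line feeds.
   --  Assumes every line, including the last one, ends with ASCII.LF.
   procedure Index_Lines is
      Tail  : Line_Access := null;
      Start : Positive := 1;
      Node  : Line_Access;
   begin
      for I in 1 .. Used loop
         if Text (I) = ASCII.LF then
            Node := new Line_Node'(First => Start, Last => I - 1, Next => null);
            if Tail = null then
               Head := Node;
            else
               Tail.Next := Node;
            end if;
            Tail  := Node;
            Start := I + 1;
         end if;
      end loop;
   end Index_Lines;

   --  Insert Line_Text after line number After (0 = before the first
   --  line).  Existing lines are not moved; an After beyond the last
   --  line raises Constraint_Error.
   procedure Insert_After (After : in Natural; Line_Text : in String) is
      Node : constant Line_Access := new Line_Node;
      Prev : Line_Access := Head;
   begin
      Append (Line_Text, Node.First, Node.Last);
      if After = 0 then
         Node.Next := Head;
         Head := Node;
      else
         for I in 2 .. After loop
            Prev := Prev.Next;
         end loop;
         Node.Next := Prev.Next;
         Prev.Next := Node;
      end if;
   end Insert_After;

begin
   Text (1 .. Initial'Length) := Initial;
   Used := Initial'Length;
   Index_Lines;
   Insert_After (1, "dave");

   --  Write-back: one Put_Line per line, in list order.
   Cursor := Head;
   while Cursor /= null loop
      Ada.Text_IO.Put_Line (Text (Cursor.First .. Cursor.Last));
      Cursor := Cursor.Next;
   end loop;
end Line_Index_Demo;
```

Run as is, it prints tom, dave, dick and harry on separate lines.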

The costly part of this method is writing the lines back to the file,
since it has to be done one line at a time.  Depending on the
number of insertions needed, it may be cheaper simply to do the
insertions with plain string slices on "Text".
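For comparison, the slice-based alternative might look like this. It is a sketch on an ordinary String rather than the mapped "Text"; Insert_Line and Slice_Insert_Demo are hypothetical names. Each call builds a complete new copy of the text, so its cost grows with the file size, which is why it only pays off when insertions are few. Once the insertion position is known, Ada.Strings.Fixed.Insert performs the same splice.

```ada
with Ada.Text_IO;

procedure Slice_Insert_Demo is

   Original : constant String :=
     "tom" & ASCII.LF & "dick" & ASCII.LF & "harry" & ASCII.LF;

   --  Return a copy of Text with Line_Text inserted after line number
   --  After (0 = at the very beginning).  Assumes every line ends with
   --  ASCII.LF; an After beyond the last line raises Constraint_Error.
   function Insert_Line (Text      : String;
                         After     : Natural;
                         Line_Text : String) return String is
      Pos   : Natural := Text'First - 1;  --  last character kept before the new line
      Count : Natural := 0;               --  line feeds seen so far
   begin
      while Count < After loop
         Pos := Pos + 1;
         if Text (Pos) = ASCII.LF then
            Count := Count + 1;
         end if;
      end loop;
      --  Whole-string copy built from slices of the original.
      return Text (Text'First .. Pos) & Line_Text & ASCII.LF
               & Text (Pos + 1 .. Text'Last);
   end Insert_Line;

   Updated : constant String := Insert_Line (Original, 1, "dave");

begin
   Ada.Text_IO.Put (Updated);
end Slice_Insert_Demo;
```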

Greetings,

Jacob
-- 
"I've got _plenty_ of common sense!"
"I just choose to ignore it."


