comp.lang.ada
* Text Processing in Ada 95
@ 2007-02-21 18:09 Rob Norris
  2007-02-21 19:07 ` Pascal Obry
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Rob Norris @ 2007-02-21 18:09 UTC (permalink / raw)



Suppose I have a text file such as:

tom
dick
harry

Then I want to insert the line "dave" after tom so the file is

tom
dave
dick
harry

Also suppose the text file can get rather large.

Currently I'm using Text_IO, which means I have to copy the whole thing into memory, insert the line,
then overwrite the file with the new contents. Direct_IO is not an option (varying string lengths).

What alternatives should I consider for making insertions faster?
(NB retrieval of a line needs to be fairly quick as well).

Thanks in advance.




* Re: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
@ 2007-02-21 19:07 ` Pascal Obry
  2007-02-22 14:02   ` Larry Kilgallen
  2007-02-21 20:16 ` Randy Brukardt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Pascal Obry @ 2007-02-21 19:07 UTC (permalink / raw)
  To: Rob Norris

Rob Norris wrote:
> Currently I'm using text_io which means I have to copy the whole thing
> into memory, insert the line then over write the file with new
> contents. direct_io is not an option (varing string lengths)

Using Text_IO is fine, but why copy everything into memory? Just write to a
temp file, delete the original one and rename the temp file.
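
A minimal sketch of the temp-file approach, under some loudly stated
assumptions: the file names are hypothetical, no line is 1024 characters or
longer, and Ada.Directories is available. Ada.Directories arrived with
Ada 2005; plain Ada 95 has no standard way to rename a file, so a strictly
Ada 95 program would need a compiler-specific or OS binding for the last
two calls.

```ada
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Directories;  --  Ada 2005 library package (assumption, see above)

procedure Insert_Via_Temp is
   File_Name : constant String := "names.txt";      --  hypothetical name
   Temp_Name : constant String := "names.txt.tmp";  --  hypothetical name

   Source, Target : File_Type;
   Buffer         : String (1 .. 1024);  --  assumes every line is shorter
   Last           : Natural;
begin
   --  Build the demo file from the original post.
   Create (Target, Out_File, File_Name);
   Put_Line (Target, "tom");
   Put_Line (Target, "dick");
   Put_Line (Target, "harry");
   Close (Target);

   --  Copy line by line, adding the new line right after "tom".
   --  Only one line is held in memory at any time.
   Open (Source, In_File, File_Name);
   Create (Target, Out_File, Temp_Name);
   while not End_Of_File (Source) loop
      Get_Line (Source, Buffer, Last);
      Put_Line (Target, Buffer (1 .. Last));
      if Buffer (1 .. Last) = "tom" then
         Put_Line (Target, "dave");
      end if;
   end loop;
   Close (Source);
   Close (Target);

   --  Replace the original with the updated copy.
   Ada.Directories.Delete_File (File_Name);
   Ada.Directories.Rename (Old_Name => Temp_Name, New_Name => File_Name);
end Insert_Via_Temp;
```

The whole file is still rewritten once per insertion, so this saves memory
rather than I/O.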

> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).

If you have a lot of modifications to make to the file, then you probably
need to read the whole file into memory to get good performance. In this
case, read the file in blocks using Stream_IO, make the changes in memory
and write it back in blocks using Stream_IO.

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595




* RE: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
  2007-02-21 19:07 ` Pascal Obry
@ 2007-02-21 20:16 ` Randy Brukardt
  2007-02-22  2:45 ` Jeffrey R. Carter
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Randy Brukardt @ 2007-02-21 20:16 UTC (permalink / raw)
  To: comp.lang.ada

Rob Norris writes:
...
> Currently I'm using text_io which means I have to copy the whole
> thing into memory, insert the line
> then over write the file with new contents. direct_io is not an
> option (varing string lengths)
>
> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).

An insertion into a file means that you have to rewrite everything after
the insertion point anyway. So there isn't any way to do that cheaply if you
have to do it one insertion at a time. Thus my recommendation is to not do
it - that is, find a better way to accomplish whatever it is you need to do.
For instance, create a file listing the insertions as an adjunct to the
original file, and do the merge only on rare occasions. That would allow
copying the file only rarely, and allows doing a large number of insertions
at once.

If you absolutely have to do this as you described, Pascal Obry's suggestions
are probably the best. My Trash Finder spam filter has to do this to add
header lines to stored messages, and it uses Stream_IO to read and write the
file (that can be much cheaper than using Text_IO, because it does not need
to look for the ends of lines once it has determined the insertion point).

                 Randy.





* Re: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
  2007-02-21 19:07 ` Pascal Obry
  2007-02-21 20:16 ` Randy Brukardt
@ 2007-02-22  2:45 ` Jeffrey R. Carter
  2007-02-22 11:56 ` Rob Norris
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jeffrey R. Carter @ 2007-02-22  2:45 UTC (permalink / raw)


Rob Norris wrote:
> 
> Currently I'm using text_io which means I have to copy the whole thing into memory, insert the line
> then over write the file with new contents. direct_io is not an option (varing string lengths)

Is there an upper limit to the string lengths (other than 
Positive'Last)? If so, you could use Ada.Strings.Bounded and Direct_IO.
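
A small sketch of what that could look like, assuming an upper limit of 80
characters per line and a hypothetical file name. Every record then has the
same size, so a line can be fetched by number without scanning. Note that
the resulting file is a sequence of fixed-size records, not a plain text
file, and an insertion in the middle still means moving the records that
follow it.

```ada
with Ada.Direct_IO;
with Ada.Strings.Bounded;
with Ada.Text_IO;

procedure Bounded_Lines is
   --  Assumption: no line is longer than 80 characters.
   package Lines is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 80);
   package Line_IO is new Ada.Direct_IO (Lines.Bounded_String);
   use Line_IO;

   File : Line_IO.File_Type;
   Item : Lines.Bounded_String;
begin
   Create (File, Inout_File, "names.dat");  --  hypothetical file name
   Write (File, Lines.To_Bounded_String ("tom"));
   Write (File, Lines.To_Bounded_String ("dick"));
   Write (File, Lines.To_Bounded_String ("harry"));

   --  Fetch the second record directly by its index.
   Read (File, Item, From => 2);
   Ada.Text_IO.Put_Line (Lines.To_String (Item));  --  prints "dick"

   Close (File);
end Bounded_Lines;
```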

-- 
Jeff Carter
"You tiny-brained wipers of other people's bottoms!"
Monty Python & the Holy Grail
18




* Re: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
                   ` (2 preceding siblings ...)
  2007-02-22  2:45 ` Jeffrey R. Carter
@ 2007-02-22 11:56 ` Rob Norris
  2007-02-23  4:30   ` Jeffrey R. Carter
  2007-02-23 13:51   ` Stephen Leake
  2007-02-23  4:55 ` Steve
  2007-02-23  7:53 ` Jacob Sparre Andersen
  5 siblings, 2 replies; 12+ messages in thread
From: Rob Norris @ 2007-02-22 11:56 UTC (permalink / raw)


On Wed, 21 Feb 2007 18:09:09 +0000, Rob Norris <firstname.lastname@baesystems.com> wrote:

>
>Suppose I have a text file such as:
>
<cut>

Thanks everyone for the input.
I suspected as much that I would have to do some stream_io.

Unfortunately I can't do things any other way. The requirement is for a text file :(

I may investigate bounded string option as mentioned by Jeffrey C.




* Re: Text Processing in Ada 95
  2007-02-21 19:07 ` Pascal Obry
@ 2007-02-22 14:02   ` Larry Kilgallen
  0 siblings, 0 replies; 12+ messages in thread
From: Larry Kilgallen @ 2007-02-22 14:02 UTC (permalink / raw)



In article <45DC988E.7040800@obry.net>, Pascal Obry <pascal@obry.net> writes:
> Rob Norris wrote:
>> Currently I'm using text_io which means I have to copy the whole thing
>> into memory, insert the line then over write the file with new
>> contents. direct_io is not an option (varing string lengths)
> 
> Use Text_IO ok, but why copy all in memory. Just write to a temp file,
> delete the original one and rename the temp file.
> 
>> What alternatives should I consider for making insertions faster?
>> (NB retrieval of a line needs to be fairly quick as well).
> 
> If you have a lot of modifications to do on the file, then you probably
> need to read all the file in memory to have good performances. In this
> case, read file in blocks using Stream_IO. Do the changes in memory and
> write it back in blocks using Stream_IO.

Or use an operating system feature that allows insertions.  GNAT Ada 95
on VMS is supposed to emulate DEC Ada features, so that should include
the Mixed_Indexed_IO package.  There will be more overhead in disk files
but for large files it is much better for inserting data in the middle.




* Re: Text Processing in Ada 95
  2007-02-22 11:56 ` Rob Norris
@ 2007-02-23  4:30   ` Jeffrey R. Carter
  2007-02-23 13:51   ` Stephen Leake
  1 sibling, 0 replies; 12+ messages in thread
From: Jeffrey R. Carter @ 2007-02-23  4:30 UTC (permalink / raw)


Rob Norris wrote:
> 
> Unfortunately I can't do things any other way. The requirement is for a text file :(
> 
> I may investigate bounded string option as mentioned by Jeffrey C.

Then using Direct_IO as I suggested wouldn't meet the requirement.

-- 
Jeff Carter
"Your mother was a hamster and your father smelt of elderberries."
Monty Python & the Holy Grail
06




* Re: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
                   ` (3 preceding siblings ...)
  2007-02-22 11:56 ` Rob Norris
@ 2007-02-23  4:55 ` Steve
  2007-02-23  5:19   ` Randy Brukardt
  2007-02-23  7:53 ` Jacob Sparre Andersen
  5 siblings, 1 reply; 12+ messages in thread
From: Steve @ 2007-02-23  4:55 UTC (permalink / raw)


"Rob Norris" <firstname.lastname@baesystems.com> wrote in message 
news:l22pt2114r910becfeg00e1dl1vb7flesq@4ax.com...
>
> Suppose I have a text file such as:
>
> tom
> dick
> harry
>
> Then I want to insert the line "dave" after tom so the file is
>
> tom
> dave
> dick
> harry
>
> Also suppose text file can get rather large.
>
> Currently I'm using text_io which means I have to copy the whole thing 
> into memory, insert the line
> then over write the file with new contents. direct_io is not an option 
> (varing string lengths)

Actually Direct_IO is an option, and probably the fastest way to handle the 
operation.

  Step 1. Determine the initial file size
  Step 2. Allocate a buffer that is the size of the file plus the size of
          the string you want to add (including a line terminator)
  Step 3. Create an instance of Direct_IO whose element is the size of the
          file
  Step 5. Read the file into the start of the allocated buffer in one gulp
  Step 6. Insert your string in the buffer (a little tricky, but doable)
  Step 7. Create an instance of Direct_IO whose element is the size of the
          buffer with the new string
  Step 8. Write the buffer to a file as one operation.

It's kind of messy to deal with the content of a text file in a buffer, and 
will not work on systems with structured files (like VMS) but will work on 
most modern systems.

Regards,
Steve
(The Duck)

>
> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).
>
> Thanks in advance. 






* RE: Text Processing in Ada 95
  2007-02-23  4:55 ` Steve
@ 2007-02-23  5:19   ` Randy Brukardt
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Brukardt @ 2007-02-23  5:19 UTC (permalink / raw)
  To: comp.lang.ada

Steve writes:

...
> Actually Direct_IO is an option, and probably the fastest way to
> handle the
> operation.
>
>   Step 1. Determine the initial file size
>   Step 2. Allocate a buffer that is the size of the file plus the size of
>     the string you want to add (including a line terminator)
>   Step 3. Create an instance Direct_IO that is the file size
>   Step 5. Read the file into the start of the allocated buffer in one gulp
>   Step 6. Insert your string in the buffer (a little tricky, but doable)
>   Step 7. Create an instance of Direct_IO that is the size of the buffer
>     with the new string
>   Step 8. Write the buffer to a file as one operation.

That's how you'd do it in Ada 83, but that's an awful lot of unnecessary
complication in Ada 95 (not to mention Ada 2007). Just use Stream_IO for
this, and you don't need instances to fill and write your buffer. (And you
can easily start in the middle of the file and only read part of it if that
works for your application.)

I.e.

>   Step 1. Determine the initial file size
    Use Stream_IO.Size(File).
>   Step 2. Allocate a buffer that is the size of the file plus the size of
>     the string you want to add (including a line terminator)
    Buffer : Stream_Element_Array (1 .. Size); -- But you can make it bigger.
>   Step 3. Create an instance Direct_IO that is the file size
    null;
>   Step 5. Read the file into the start of the allocated buffer in one gulp
    Stream_IO.Read (File, Buffer, Last);
>   Step 6. Insert your string in the buffer (a little tricky, but doable)
    Exercise for the reader. ;-)
>   Step 7. Create an instance of Direct_IO that is the size of the buffer
>     with the new string
    null;
>   Step 8. Write the buffer to a file as one operation.
    Stream_IO.Set_Mode(File, Out_File); -- Or Reset.
    Stream_IO.Write(File, Buffer);
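
Put together, the steps above might look roughly like the following. This
is only a sketch under stated assumptions: the file name is hypothetical
and the file is assumed to exist, lines are assumed to end in LF, the new
line goes after the first line, and the buffer lives on the stack (a really
large file would want a heap-allocated buffer). It closes the file and
calls Create to rewrite it, instead of Set_Mode or Reset, purely to keep
the example short.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Insert_With_Streams is
   File_Name : constant String := "names.txt";        --  hypothetical
   Inserted  : constant String := "dave" & ASCII.LF;  --  assumes LF endings

   File : File_Type;
begin
   Open (File, In_File, File_Name);
   declare
      Old_Size : constant Stream_Element_Offset :=
        Stream_Element_Offset (Size (File));
      Extra    : constant Stream_Element_Offset := Inserted'Length;
      Buffer   : Stream_Element_Array (1 .. Old_Size + Extra);
      Last     : Stream_Element_Offset;
      Split    : Stream_Element_Offset := 0;  --  end of the first line
   begin
      --  Read the whole file in one call.
      Read (File, Buffer (1 .. Old_Size), Last);
      Close (File);

      --  Find the terminator of the first line (0 if there is none,
      --  in which case the new line ends up at the very start).
      for I in 1 .. Last loop
         if Buffer (I) = Character'Pos (ASCII.LF) then
            Split := I;
            exit;
         end if;
      end loop;

      --  Move the tail up to make room, then copy the new line in.
      Buffer (Split + 1 + Extra .. Last + Extra) := Buffer (Split + 1 .. Last);
      for I in Inserted'Range loop
         Buffer (Split + Stream_Element_Offset (I - Inserted'First + 1)) :=
           Stream_Element (Character'Pos (Inserted (I)));
      end loop;

      --  Write everything back in one call.
      Create (File, Out_File, File_Name);
      Write (File, Buffer (1 .. Last + Extra));
      Close (File);
   end;
end Insert_With_Streams;
```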

                          Randy.





* Re: Text Processing in Ada 95
  2007-02-21 18:09 Text Processing in Ada 95 Rob Norris
                   ` (4 preceding siblings ...)
  2007-02-23  4:55 ` Steve
@ 2007-02-23  7:53 ` Jacob Sparre Andersen
  5 siblings, 0 replies; 12+ messages in thread
From: Jacob Sparre Andersen @ 2007-02-23  7:53 UTC (permalink / raw)


Rob Norris wrote:

> Suppose I have a text file such as:
>
> tom
> dick
> harry
>
> Then I want to insert the line "dave" after tom so the file is
>
> tom
> dave
> dick
> harry
>
> Also suppose text file can get rather large.
>
> Currently I'm using text_io which means I have to copy the whole
> thing into memory, insert the line then over write the file with new
> contents. direct_io is not an option (varing string lengths)
>
> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).

I would first of all consider using POSIX.Memory_Mapping.Map_Memory to
get access to the complete file as an in-memory string.  Here is a
piece of code I wrote recently for that purpose:

   Name         : POSIX_String;
   Text_File    : File_Descriptor;
   Text_Size    : System.Storage_Elements.Storage_Offset;
   Text_Address : System.Address;
begin
   [...]
   Text_File := Open (Name => Name,
                      Mode => Read_Only);
   Text_Size := Storage_Offset (File_Size (Text_File)); -- + Inserted_Size ?
   Text_Address := Map_Memory (Length     => Text_Size,
                               Protection => Allow_Read,
                               Mapping    => Map_Shared,
                               File       => Text_File,
                               Offset     => 0);

   declare
      Bit_Count       : constant Natural :=
                          Natural (Text_Size) * Storage_Element'Size;
      Character_Count : constant Natural :=
                          Bit_Count / Character'Size;

      Text : String (1 .. Character_Count);
      for Text'Address use Text_Address;
   begin

My reason for suggesting that you map the file into memory is that you
can avoid messing with buffers, caching and several copies of the file
content.

If you need to make lots of insertions, then I would consider mapping
the lines into an insertion-friendly data structure such as a linked
list.  This data structure should keep track of 'First and 'Last for
each line in the file.  Inserting new lines would simply be a matter
of writing the text of the lines to the end of the "Text" string, and
inserting a pointer at the appropriate place in the data structure
keeping track of the lines.

The costly part of this method is writing the lines back to the file,
since it will have to be done one line at a time.  Depending on the
number of insertions needed, it may be cheaper simply to do the
insertions with plain string slices on "Text".
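
To illustrate the linked-list idea, here is a minimal, self-contained
sketch. It does not use the memory mapping above: a fixed 64-character
String stands in for the mapped "Text" plus spare room at its end, and all
type and subprogram names (Line_Node, Store, Append, Insert_After) are made
up for the example.

```ada
with Ada.Text_IO;

procedure Line_Index_Demo is
   --  Stand-in for the mapped file with room for appended lines.
   Text      : String (1 .. 64) := (others => ' ');
   Text_Last : Natural := 0;

   type Line_Node;
   type Line_Access is access Line_Node;
   type Line_Node is record
      First, Last : Natural;      --  slice of Text holding this line
      Next        : Line_Access;
   end record;

   Head, Tail, Cursor : Line_Access;

   --  Copy Item to the end of Text and return a node describing it.
   function Store (Item : String) return Line_Access is
      First : constant Positive := Text_Last + 1;
   begin
      Text (First .. First + Item'Length - 1) := Item;
      Text_Last := Text_Last + Item'Length;
      return new Line_Node'(First => First, Last => Text_Last, Next => null);
   end Store;

   --  Add a line at the end of the list.
   procedure Append (Item : String) is
      Node : constant Line_Access := Store (Item);
   begin
      if Head = null then
         Head := Node;
      else
         Tail.Next := Node;
      end if;
      Tail := Node;
   end Append;

   --  Insert a line after Node; no existing text is moved.
   procedure Insert_After (Node : Line_Access; Item : String) is
      Added : constant Line_Access := Store (Item);
   begin
      Added.Next := Node.Next;
      Node.Next  := Added;
      if Tail = Node then
         Tail := Added;
      end if;
   end Insert_After;
begin
   Append ("tom");
   Append ("dick");
   Append ("harry");
   Insert_After (Head, "dave");

   --  Walk the list in logical order: tom, dave, dick, harry.
   Cursor := Head;
   while Cursor /= null loop
      Ada.Text_IO.Put_Line (Text (Cursor.First .. Cursor.Last));
      Cursor := Cursor.Next;
   end loop;
end Line_Index_Demo;
```

The text of "dave" sits after "harry" in the buffer; only the list order
says where it belongs, which is why writing back has to go line by line.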

Greetings,

Jacob
-- 
"I've got _plenty_ of common sense!"
"I just choose to ignore it."




* Re: Text Processing in Ada 95
  2007-02-22 11:56 ` Rob Norris
  2007-02-23  4:30   ` Jeffrey R. Carter
@ 2007-02-23 13:51   ` Stephen Leake
  2007-02-26 12:11     ` Rob Norris
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Leake @ 2007-02-23 13:51 UTC (permalink / raw)


Rob Norris <firstname.lastname@baesystems.com> writes:

> On Wed, 21 Feb 2007 18:09:09 +0000, Rob Norris
> <firstname.lastname@baesystems.com> wrote:
>
>>
>>Suppose I have a text file such as:
>>
> <cut>
>
> Thanks everyone for the input.
> I suspected as much that I would have to do some stream_io.
>
> Unfortunately I can't do things any other way. The requirement is
> for a text file :(

Just because the file is "text" on the disk doesn't mean you have to
use Ada.Text_IO to read and write it.

Ada.Streams.Stream_IO reads and writes "text" files perfectly well.

If you were writing this program in C, you would have no choice other
than the C equivalent of Ada.Streams.Stream_IO, and no one would claim
you were not using "text" files.

And what is a "text" file, precisely?

-- 
-- Stephe




* Re: Text Processing in Ada 95
  2007-02-23 13:51   ` Stephen Leake
@ 2007-02-26 12:11     ` Rob Norris
  0 siblings, 0 replies; 12+ messages in thread
From: Rob Norris @ 2007-02-26 12:11 UTC (permalink / raw)


On Fri, 23 Feb 2007 08:51:23 -0500, Stephen Leake <stephen_leake@stephe-leake.org> wrote:

>Rob Norris <firstname.lastname@baesystems.com> writes:
>
>> On Wed, 21 Feb 2007 18:09:09 +0000, Rob Norris
>> <firstname.lastname@baesystems.com> wrote:
>>
>>>
>>>Suppose I have a text file such as:
>>>
>> <cut>
>>
>> Thanks everyone for the input.
>> I suspected as much that I would have to do some stream_io.
>>
>> Unfortunately I can't do things any other way. The requirement is
>> for a text file :(

I think I meant that doing it in some Direct_IO way would not make it any easier.

>Just be cause the file is "text" on the disk, doesn't mean you have to
>use Ada.Text_IO to read and write it. 
>
>Ada.Stream_IO reads and writes "text" files perfectly well. 
>
>If you were writing this program in C, you would have no choice other
>than the C equivalent of Ada.Stream_IO, and no one would claim you
>were not using "text" files.
>
>And what is a "text" file, precisely?

Well quite, the file is just a bunch of bytes that, when interpreted as ASCII, gives
something human-readable.


