From: Jacob Sparre Andersen
Newsgroups: comp.lang.ada
Subject: Re: Text Processing in Ada 95
Date: Fri, 23 Feb 2007 08:53:19 +0100

Rob Norris wrote:

> Suppose I have a text file such as:
>
> tom
> dick
> harry
>
> Then I want to insert the line "dave" after tom so the file is
>
> tom
> dave
> dick
> harry
>
> Also suppose text file can get rather large.
>
> Currently I'm using text_io which means I have to copy the whole
> thing into memory, insert the line then over write the file with new
> contents. direct_io is not an option (varing string lengths)
>
> What alternatives should I consider for making insertions faster?
> (NB retrieval of a line needs to be fairly quick as well).

I would first of all consider using POSIX.Memory_Mapping.Map_Memory to
get access to the complete file as an in-memory string.
Here is a piece of code I wrote recently for that purpose:

      Name         : POSIX_String;
      Text_File    : File_Descriptor;
      Text_Size    : System.Storage_Elements.Storage_Offset;
      Text_Address : System.Address;
   begin
      [...]
      Text_File := Open (Name => Name,
                         Mode => Read_Only);
      Text_Size := Storage_Offset (File_Size (Text_File)); -- + Inserted_Size ?
      Text_Address := Map_Memory (Length     => Text_Size,
                                  Protection => Allow_Read,
                                  Mapping    => Map_Shared,
                                  File       => Text_File,
                                  Offset     => 0);

      declare
         --  Convert the mapped size from storage elements to characters.
         Bit_Count       : constant Natural :=
                             Natural (Text_Size) * Storage_Element'Size;
         Character_Count : constant Natural := Bit_Count / Character'Size;

         --  Overlay a String on the mapped memory; nothing is copied.
         Text : String (1 .. Character_Count);
         for Text'Address use Text_Address;
      begin

My reason for suggesting that you map the file into memory is that you
can avoid messing with buffers, caching and several copies of the file
content.

If you need to make lots of insertions, then I would consider mapping
the lines into an insertion-friendly data structure such as a linked
list.  This data structure should keep track of 'First and 'Last for
each line in the file.  Inserting new lines would simply be a matter
of writing the text of the lines to the end of the "Text" string, and
inserting a pointer at the appropriate place in the data structure
keeping track of the lines.

The costly part of this method is writing the lines back to the file,
since that has to be done one line at a time.  Depending on the number
of insertions needed, it may be cheaper simply to do the insertions
with plain string slices on "Text".

Greetings,

Jacob
-- 
"I've got _plenty_ of common sense!"
    "I just choose to ignore it."
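
The line-index idea described above could be sketched roughly as
follows.  This is only an illustration under stated assumptions, not
code from the post: it uses Ada.Containers.Doubly_Linked_Lists, which
needs Ada 2005 rather than Ada 95, and an Unbounded_String stands in
for the memory-mapped "Text" so that the example is self-contained.
The names Line_Index_Demo, Line_Range, Line_Lists and Insert_After are
hypothetical.

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Line_Index_Demo is
   use Ada.Strings.Unbounded;

   --  One entry per line: the slice of the buffer holding its text.
   type Line_Range is record
      First : Positive;
      Last  : Natural;
   end record;

   package Line_Lists is
     new Ada.Containers.Doubly_Linked_Lists (Element_Type => Line_Range);
   use Line_Lists;

   LF : constant Character := Character'Val (10);

   --  Assumption: stand-in for the memory-mapped "Text" string.
   Text  : Unbounded_String :=
     To_Unbounded_String ("tom" & LF & "dick" & LF & "harry" & LF);
   Lines : List;
   Start : Positive := 1;

   --  Write New_Text to the end of the buffer and link its range into
   --  the list directly after the line at position After.
   procedure Insert_After (After : in Cursor; New_Text : in String) is
      New_First : constant Positive := Length (Text) + 1;
   begin
      Append (Text, New_Text);
      Insert (Container => Lines,
              Before    => Next (After),
              New_Item  => (First => New_First, Last => Length (Text)));
   end Insert_After;
begin
   --  Build the index: record 'First and 'Last of every line.
   for I in 1 .. Length (Text) loop
      if Element (Text, I) = LF then
         Append (Lines, (First => Start, Last => I - 1));
         Start := I + 1;
      end if;
   end loop;

   Insert_After (After => First (Lines), New_Text => "dave");

   --  Walk the list to emit the lines in their logical order.
   declare
      Position : Cursor := First (Lines);
   begin
      while Has_Element (Position) loop
         declare
            R : constant Line_Range := Element (Position);
         begin
            Ada.Text_IO.Put_Line (Slice (Text, R.First, R.Last));
         end;
         Position := Next (Position);
      end loop;
   end;
end Line_Index_Demo;
```

The existing text is never moved: an insertion only appends bytes and
links in one list node.  Finding the N-th line means walking the list,
so if retrieval by line number has to be fast, an indexable container
may be a better fit than a linked list.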