reading a text file into a string

comp.lang.ada

* reading a text file into a string
@ 2004-07-15 17:27 zork
  2004-07-15 17:49 ` Marius Amado Alves
                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: zork @ 2004-07-15 17:27 UTC (permalink / raw)


hi, i would like to read a whole text file into a string. I thought of using
an unbounded_string for this:

----------
c, char: character;
text : unbounded_string;
...
-- read in text file
while not end_of_file ( File ) loop
    Get ( File, c );
    append ( text, c );
end loop;
...
put ( to_string ( text ) ); -- display content of unbounded_string

-- process text
for i in 1 .. length ( text ) loop
  char := element ( text, i );
  ...
end loop;
-----------

... is this the general way of going about it? or is there a more preferred
method of reading in a whole text file (into whatever format) for
processing?

Thanks again!

cheers,
zork





^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
@ 2004-07-15 17:49 ` Marius Amado Alves
  2004-07-15 19:57   ` Nick Roberts
  2004-07-15 17:59 ` Marius Amado Alves
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Marius Amado Alves @ 2004-07-15 17:49 UTC (permalink / raw)
  Cc: comp.lang.ada

Unbounded_String is the right container for this. But note that Get for
characters skips over newlines and relatives (I think!). Do you really
want to lose that information? If not, then consider using Get_Immediate
or Get_Line. This assumes you are using Ada.Text_IO.


* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
  2004-07-15 17:49 ` Marius Amado Alves
@ 2004-07-15 17:59 ` Marius Amado Alves
  2004-07-15 19:18   ` Nick Roberts
  2004-07-15 19:18 ` Nick Roberts
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Marius Amado Alves @ 2004-07-15 17:59 UTC (permalink / raw)
  To: comp.lang.ada

"...is there a more preferred method of reading in a whole text file?"

The preference depends. The methods are:

- declare a string S of size equal to the file size and then call a 
standard string reading procedure with S as the out parameter

- use stream attributes 'Read, 'Input of type String and facilities in 
Ada.Streams.Stream_IO





* Re: reading a text file into a string
  2004-07-15 17:59 ` Marius Amado Alves
@ 2004-07-15 19:18   ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-15 19:18 UTC (permalink / raw)

On Thu, 15 Jul 2004 18:59:06 +0100, Marius Amado Alves  
<amado.alves@netcabo.pt> wrote:

> "...is there a more preferred method of reading in a whole text file?"
>
> The preference depends.

That's true, there are many ways to skin a cat, and how you read and  
process the file will very much depend on what you are trying to do.

> The methods are:
>
> - declare a string S of size equal to the file size and
> then call a standard string reading procedure with S as
> the out parameter

The problem with this idea is that there is no standard way to determine  
the size of a text file. For some kinds of text 'file' (such as a device  
or pipe), it may not be possible to tell in advance by any means.

> - use stream attributes 'Read, 'Input of type String and
> facilities in Ada.Streams.Stream_IO

I think this is a terrible idea, unless you know what the character  
encoding is, and wish to do the decoding yourself!

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
  2004-07-15 17:49 ` Marius Amado Alves
  2004-07-15 17:59 ` Marius Amado Alves
@ 2004-07-15 19:18 ` Nick Roberts
  2004-07-15 20:02   ` Nick Roberts
  2004-07-16  1:23 ` Jeffrey Carter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Nick Roberts @ 2004-07-15 19:18 UTC (permalink / raw)

On Fri, 16 Jul 2004 03:27:57 +1000, zork <zork@nospam.com> wrote:

> hi, i would like to read a whole text file into a string. I thought of  
> using
> an unbounded_string for this:
>
> ----------
> c, char: character;
> text : unbounded_string;
> ...
> -- read in text file
> while not end_of_file ( File ) loop
>     Get ( File, c );
>     append ( text, c );
> end loop;
> ...
> put ( to_string ( text ) ); -- display content of unbounded_string
>
> -- process text
> for i in 1 .. length ( text ) loop
>   char := element ( text, i );
>   ...
> end loop;
> -----------
>
> ... is this the general way of going about it? or is there a more
> preferred
> method of reading in a whole text file (into whatever format) for
> processing?

It is usual to read and process information from files a piece at a time.

Quite often a file is read a piece at a time, each piece is interpreted in  
some way, and then some structure is built up in memory from the  
interpreted pieces. Then, typically, further processing is done using the  
whole structure.

It is unusual to read an entire text file into a string in memory.  
However, sometimes this may be a quick and convenient technique for  
achieving results in a hurry. An unbounded string will generally be the  
appropriate data structure to use for this purpose. The problem you do not  
address with the code you suggest above -- as another poster has pointed  
out -- is that of line breaks. One easy possibility might be to insert:

    if End_of_Line(File) then
       Append( text, Ada.Characters.Latin_1.NUL );
       Skip_Line(File);
    end if;

between the Get and the append. Line breaks are then indicated by the NUL  
character, and could be processed as such. This should work provided the  
file itself does not contain any NULs.

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-15 17:49 ` Marius Amado Alves
@ 2004-07-15 19:57   ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-15 19:57 UTC (permalink / raw)


On Thu, 15 Jul 2004 18:49:17 +0100, Marius Amado Alves  
<amado.alves@netcabo.pt> wrote:

> Unbounded_String is the right container for this. But note Get
> for characters skips over newlines and relatives (I think!) Do
> you really want to lose that information? If not then consider
> using Get_Immediate or Get_Line. This assuming you are using
> Ada.Text_IO.

I don't suggest using Get_Immediate for this purpose. You could use:

(a) a combination of Get and the End_of_Line function; or

(b) Get_Line.

Option (a) is likely to be slower, but this might not worry you.

If you use option (b), you must either not care about lines which are too  
long (or know that none are) or you must write special code to deal with  
overlong lines.

Example of option (a):

    c, char: character;
    text : unbounded_string;
    Line_Break: constant Character := Ada.Characters.Latin_1.NUL;
    ...
    -- read in text file
    while not End_of_File(File) loop
       if End_of_Line(File) then
          Append( text, Line_Break );
          Skip_Line(File);
       else
          Get( File, c );
          Append( text, c );
       end if;
    end loop;
    ...
    -- display content of unbounded_string:
    for i in 1..Length(text) loop
       if Element(text,i) = Line_Break then
          New_Line;
       else
          Put( Element(text,i) );
       end if;
    end loop;

    -- process text
    for i in 1 .. length ( text ) loop
      char := element ( text, i );
      ...
    end loop;

This will work, but it may not be very efficient.

Example of option (b):

    with AI302.Containers.Vectors;
    with Ada.Strings.Unbounded;
    use Ada.Strings.Unbounded;
    ...
    package Line_Vectors is
       new AI302.Containers.Vectors(Positive,Unbounded_String);
    use Line_Vectors;
    ...
    Text: Line_Vectors.Vector_Type;
    Line: String(1..100);
    LL: Natural;
    ...
    -- Read in text file:
    while not End_of_File(File) loop
       Get_Line( File, Line, LL ); -- read line or line segment
       Append( Text, To_Unbounded_String(Line(1..LL)) );
    end loop;
    -- Display content of the vector of lines:
    for i in 1..Natural(Length(Text)) loop
       Put_Line( To_String( Element(Text,i) ) );
    end loop;
    ...

If a line is in the file which is longer than 100 characters, it will be  
broken into two or more lines in the Text variable. Try this yourself.
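
For readers with an Ada 2005 or later compiler, the same idea can be sketched with the standard container library that grew out of AI-302. This is only an illustration under stated assumptions: the file name demo.txt is hypothetical, and the Get_Line function used here is the one added to Ada.Text_IO in Ada 2005, which returns a whole line regardless of its length, so the 100-character split does not occur.

```ada
with Ada.Containers.Indefinite_Vectors;
with Ada.Text_IO; use Ada.Text_IO;

procedure Lines_Demo is

   --  A vector whose elements are whole lines of arbitrary length.
   package Line_Vectors is
      new Ada.Containers.Indefinite_Vectors (Positive, String);

   File : File_Type;
   Text : Line_Vectors.Vector;
begin
   Open (File, In_File, "demo.txt");  -- hypothetical file name

   --  Read in text file, one element per line:
   while not End_Of_File (File) loop
      Text.Append (Get_Line (File));
   end loop;
   Close (File);

   --  Display the stored lines:
   for I in Text.First_Index .. Text.Last_Index loop
      Put_Line (Text.Element (I));
   end loop;
end Lines_Demo;
```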

You can download the AI-302 sample implementation packages from:

    http://home.earthlink.net/~matthewjheaney/charles/ai302-20040227.zip

courtesy of Matthew Heaney (thanks Matt).

-- 
Nick Roberts




* Re: reading a text file into a string
  2004-07-15 19:18 ` Nick Roberts
@ 2004-07-15 20:02   ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-15 20:02 UTC (permalink / raw)


On Thu, 15 Jul 2004 20:18:35 +0100, Nick Roberts <nick.roberts@acm.org>  
wrote:

> insert:
>
>     if End_of_Line(File) then
>        Append( text, Ada.Characters.Latin_1.NUL );
>        Skip_Line(File);
>     end if;
>
> between the Get and the append.

Sorry, I should have said /before/ the Get.

-- 
Nick Roberts




* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
                   ` (2 preceding siblings ...)
  2004-07-15 19:18 ` Nick Roberts
@ 2004-07-16  1:23 ` Jeffrey Carter
  2004-07-16  2:20 ` Steve
  2004-07-16  2:26 ` Steve
  5 siblings, 0 replies; 44+ messages in thread
From: Jeffrey Carter @ 2004-07-16  1:23 UTC (permalink / raw)

zork wrote:

> while not end_of_file ( File ) loop
>     Get ( File, c );
>     append ( text, c );
> end loop;

This will work. As others have pointed out, Get skips line terminators. 
I'll assume you're not interested in them.

A "better" way to do this is to use Get_Line:

Line : String (1 .. Max);
Last : Natural;
...
Read : loop
    exit Read when End_Of_File (File);

    Get_Line (File => File, Item => Line, Last => Last);
    Append (Text, Line (1 .. Last) );
end loop Read;

Get_Line returns when the String (Line) is filled (in which case Last = 
Max) or a line terminator is encountered (in which case Last < Max), 
whichever comes first; if a line terminator is encountered, it is skipped.
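
Put together as a complete program, the Get_Line loop might look like the sketch below. The buffer size of 256, the file name demo.txt and the choice of LF as the stored line-break marker are assumptions made for illustration. The marker is appended only when Last < Line'Last, which is how the loop tells a finished line from a full buffer.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Read_Lines is
   Max  : constant := 256;            -- arbitrary buffer size for this sketch
   File : File_Type;
   Line : String (1 .. Max);
   Last : Natural;
   Text : Unbounded_String;
begin
   Open (File, In_File, "demo.txt");  -- hypothetical file name
   while not End_Of_File (File) loop
      Get_Line (File, Line, Last);
      Append (Text, Line (1 .. Last));

      --  Last < Line'Last means a line terminator stopped the read;
      --  a full buffer means the rest of the line is still to come.
      if Last < Line'Last then
         Append (Text, Ada.Characters.Latin_1.LF);
      end if;
   end loop;
   Close (File);

   Put (To_String (Text));            -- display the collected text
end Read_Lines;
```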

You can also use a function such as PragmARC.Get_Line, which reads an 
entire line and skips the line terminator:

Read : loop
    exit Read when End_Of_File (File);

    Append (Text, PragmARC.Get_Line (File) );
end loop Read;

This is especially convenient if you want to add a special Character to 
indicate line terminators:

Append (Text, PragmARC.Get_Line (File) & EOT);

-- 
Jeff Carter
"The time has come to act, and act fast. I'm leaving."
Blazing Saddles
36


* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
                   ` (3 preceding siblings ...)
  2004-07-16  1:23 ` Jeffrey Carter
@ 2004-07-16  2:20 ` Steve
  2004-07-16  2:26 ` Steve
  5 siblings, 0 replies; 44+ messages in thread
From: Steve @ 2004-07-16  2:20 UTC (permalink / raw)




* Re: reading a text file into a string
  2004-07-15 17:27 reading a text file into a string zork
                   ` (4 preceding siblings ...)
  2004-07-16  2:20 ` Steve
@ 2004-07-16  2:26 ` Steve
  2004-07-16 16:16   ` Jeffrey Carter
  2004-07-16 21:19   ` Randy Brukardt
  5 siblings, 2 replies; 44+ messages in thread
From: Steve @ 2004-07-16  2:26 UTC (permalink / raw)


Sorry for the blank reply (obese finger)

The easiest way to read an entire file into a string, if you're looking for
speed, is to use Ada.Direct_IO.

Here is a small working example:

with Ada.Direct_Io;
with Ada.Text_Io;

procedure Demo is

    function Get_File_Size( file_name : String ) return Natural is
        package Direct_Io_Char_File is new Ada.Direct_IO( Character );
        use Direct_Io_Char_File;
        size_file : File_Type;
        result : Natural;
    begin
        Open( size_file, In_File, file_name );
        result := Natural( Size( size_file ) );
        Close( size_file );
        return result;
    end Get_File_Size;

begin
    declare
        File_Size : Natural := Get_File_Size( "demo.txt" );
        subtype File_String is String( 1 .. File_Size );
        package File_Reader is new Ada.Direct_Io( File_String );
        in_file : File_Reader.File_Type;
        file_data : File_String;
    begin
        File_Reader.Open( in_file, File_Reader.In_File, "demo.txt" );
        File_Reader.Read( in_file, file_data );
        File_Reader.Close( in_file );
        -- Do what you will with file_data
        Ada.Text_Io.Put( file_data );
    end;
end Demo;

The basic idea is to first create an instance of Direct_IO for characters,
just to use the "Size" function to find out how many characters are in the
file.  Then create an instance of Direct_IO for a string of the same length
as the file, and do one read.

This gives you the content of the file as one raw string.  Which may or may
not be what you want, but it is what you asked for.

Steve
(The Duck)


* Re: reading a text file into a string
  2004-07-16  2:26 ` Steve
@ 2004-07-16 16:16   ` Jeffrey Carter
  2004-07-16 17:45     ` Nick Roberts
  2004-07-16 21:19   ` Randy Brukardt
  1 sibling, 1 reply; 44+ messages in thread
From: Jeffrey Carter @ 2004-07-16 16:16 UTC (permalink / raw)

Steve wrote:

> This gives you the content of the file as one raw string.  Which may
> or may not be what you want, but it is what you asked for.

Right, including line terminators, page terminators, and file 
terminators, if they exist, and which vary from system to system (under 
UNIX, lines are terminated by LF; under DOS/Windows, by CR LF; perhaps 
someone can comment on what happens with this approach under VMS). 
Therefore, this approach is rarely used for production programs, which 
usually want a platform-independent representation of the file.

-- 
Jeff Carter
"If you think you got a nasty taunting this time,
you ain't heard nothing yet!"
Monty Python and the Holy Grail
23


* Re: reading a text file into a string
  2004-07-16 16:16   ` Jeffrey Carter
@ 2004-07-16 17:45     ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-16 17:45 UTC (permalink / raw)

On Fri, 16 Jul 2004 16:16:33 GMT, Jeffrey Carter <spam@spam.com> wrote:

> Steve wrote:
>
>> This gives you the content of the file as one raw string.  Which
>> may or may not be what you want, but it is what you asked for.
>
> Right, including line terminators, page terminators, and file  
> terminators, if they exist, and which vary from system to system
> (under UNIX, lines are terminated by LF; under DOS/Windows, by CR
> LF; perhaps someone can comment on what happens with this approach
> under VMS).  Therefore, this approach is rarely used for
> production programs, which usually want a platform-independent
> representation of the file.

Not to mention other differences in basic file format and character  
encoding. On some systems, trying this will produce complete and utter  
rubbish, and on some systems it will fail (with an exception) when you try  
to open the file (because text and binary files are immiscible). I think  
the latter is the case on VMS with DEC Ada.

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-16  2:26 ` Steve
  2004-07-16 16:16   ` Jeffrey Carter
@ 2004-07-16 21:19   ` Randy Brukardt
  2004-07-17  2:27     ` Robert I. Eachus
  1 sibling, 1 reply; 44+ messages in thread
From: Randy Brukardt @ 2004-07-16 21:19 UTC (permalink / raw)


"Steve" <nospam_steved94@comcast.net> wrote in message
news:E1HJc.101277$Oq2.96646@attbi_s52...
> Sorry for the blank reply (obese finger)
>
> The easiest way to read an entire file into a string, if you're looking
> for speed, is to use Ada.Direct_Io;
...

That's how you'd do that in Ada 83, but in Ada 95 you could do the same with
Stream_IO, without needing the weird instantiations. (Yes, you'd be assuming
that Stream_Element'Size = Character'Size, but since this technique really
only works on Windows and Unix anyway [as noted by others], that's not an
issue.)
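
A minimal sketch of the Stream_IO variant, under the same assumption that Stream_Element'Size = Character'Size. The function name Read_Whole_File and the file name demo.txt are made up for illustration, and the file is not closed if the read raises an exception.

```ada
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Slurp_Demo is
   use Ada.Streams;

   --  Hypothetical helper: returns the raw content of the named file.
   function Read_Whole_File (Name : String) return String is
      File : Stream_IO.File_Type;
   begin
      Stream_IO.Open (File, Stream_IO.In_File, Name);
      declare
         --  Size is counted in stream elements; one element is assumed
         --  to hold exactly one Character.
         Size   : constant Natural := Natural (Stream_IO.Size (File));
         Result : String (1 .. Size);
      begin
         String'Read (Stream_IO.Stream (File), Result);
         Stream_IO.Close (File);
         return Result;
      end;
   end Read_Whole_File;

begin
   --  The string includes whatever line terminators the platform uses.
   Ada.Text_IO.Put (Read_Whole_File ("demo.txt"));
end Slurp_Demo;
```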

                        Randy.







* Re: reading a text file into a string
  2004-07-16 21:19   ` Randy Brukardt
@ 2004-07-17  2:27     ` Robert I. Eachus
  2004-07-17 11:31       ` Mats Weber
                         ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-17  2:27 UTC (permalink / raw)


There have been a lot of useful tips in this thread on how to accomplish 
the stated goal.  But what is really missing is a discussion of HOW a 
newbie should decide what he actually wants to do.

What you have to do is refine your requirements, and that can be the 
most important, and most time consuming step when programming in Ada.

I usually state it as Ada is much better at doing what you tell it to do 
than other languages.  But it is like a four-year old child, always 
asking, "Why?"

So before you decide whether to represent line-breaks with nulls, 
linefeeds, or copy the existing characters exactly, you have to know the 
answer to the "Why?" question.  In this case, "Why are you reading the 
file?"

Once you know whether you need a bitwise copy of the file, to parse the 
text and reformat it, or merely to scan through the contents of the 
file, then you can decide the right way to read the file.  I usually 
find that when I have thought about it enough, I want to do line at a 
time processing, rather than character at a time, or reading the entire 
file in one gulp.

For this reason, I find myself constructing or using a Get_Line FUNCTION 
inside a loop and a declare block:

while not End_of_File(Somefile) loop
   declare
      Buffer: String := Get_Line(Somefile);
   begin
      -- process buffer
   exception
      ...
   end;
end loop;

Each iteration of the loop, the Buffer contains a CONSTANT String, but 
it is potentially different in length and content every time through.

Incidentally, GNAT has a special package to allow you to do a Get_Line 
into an Unbounded_String, no matter how long.  I think I posted a 
"clever example" of how to do it here, and if you need it I can find it 
again.  (The code is an elegant example of the use of recursion.  Using 
the GNAT equivalent is better performance-wise if you really are reading 
multi-megabyte lines.)
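
Going back to the Get_Line function idiom, here is a minimal sketch of such a function and the loop around it. The name Next_Line, the 80-character chunk size and the file name demo.txt are assumptions made for this illustration; it is not the GNAT package mentioned above.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Line_Loop is

   --  Hypothetical helper: returns the rest of the current line,
   --  however long, by reading fixed-size chunks recursively.
   function Next_Line (File : File_Type) return String is
      Chunk : String (1 .. 80);
      Last  : Natural;
   begin
      Get_Line (File, Chunk, Last);
      if Last < Chunk'Last or else End_Of_File (File) then
         return Chunk (1 .. Last);         -- end of line (or file) reached
      else
         return Chunk & Next_Line (File);  -- chunk filled; keep reading
      end if;
   end Next_Line;

   Somefile : File_Type;
begin
   Open (Somefile, In_File, "demo.txt");   -- hypothetical file name
   while not End_Of_File (Somefile) loop
      declare
         --  Sized by its initial value, so it fits the line exactly.
         Buffer : constant String := Next_Line (Somefile);
      begin
         Put_Line (Buffer);                -- process buffer
      end;
   end loop;
   Close (Somefile);
end Line_Loop;
```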

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821





* Re: reading a text file into a string
  2004-07-17  2:27     ` Robert I. Eachus
@ 2004-07-17 11:31       ` Mats Weber
  2004-07-17 15:52         ` Robert I. Eachus
  2004-07-19  8:07       ` Dale Stanbrough
  2004-07-19 11:51       ` Ada2005 (was " Peter Hermann
  2 siblings, 1 reply; 44+ messages in thread
From: Mats Weber @ 2004-07-17 11:31 UTC (permalink / raw)


In article <fOednXzORfHlE2Xd4p2dnA@comcast.com>,
 "Robert I. Eachus" <rieachus@comcast.net> wrote:

>while not End_of_File(Somefile) loop
>   declare
>      Buffer: String := Get_Line(Somefile);
>   begin
>      -- process buffer
>   exception
>      ...
>   end;
>end loop;
>
>Each iteration of the loop, the Buffer contains a CONSTANT String, but

It's constant only if you declare it constant, as in

       Buffer: constant String := Get_Line(Somefile);
 
>it is potentially different in length and content every time through.




* Re: reading a text file into a string
  2004-07-17 11:31       ` Mats Weber
@ 2004-07-17 15:52         ` Robert I. Eachus
  2004-07-17 22:38           ` Jeffrey Carter
  0 siblings, 1 reply; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-17 15:52 UTC (permalink / raw)

Mats Weber wrote:

>>Each iteration of the loop, the Buffer contains a CONSTANT String, but
> 
> 
> It's constant only if you declare it constant, as in
> 
>        Buffer: constant String := Get_Line(Somefile);
>  
> 
>>it is potentially different in length and content every time through.

When I woke up this morning my mind told me I'd goofed. What it really 
said was something like, "You IDIOT, you left out the word length, and 
worse you emphasized the wrong word."  My brain is pretty nasty before 
it gets its morning fix of caffeine. ;-)

I meant to say "the Buffer contains a constant LENGTH String..."  In Ada 
83 you had to declare the String a constant for this idiom to work, but 
that wasn't what I was trying to say.  The magic is that each time 
through the loop the buffer is exactly the right size to hold the line. 
  If you need to be able to change the length of the buffer though, you 
have to use Unbounded_String.

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-17 15:52         ` Robert I. Eachus
@ 2004-07-17 22:38           ` Jeffrey Carter
  2004-07-18 13:44             ` zork
  0 siblings, 1 reply; 44+ messages in thread
From: Jeffrey Carter @ 2004-07-17 22:38 UTC (permalink / raw)


Robert I. Eachus wrote:

> I meant to say "the Buffer contains a constant LENGTH String..."  In Ada 
> 83 you had to declare the String a constant for this idiom to work, but 
> that wasn't what I was trying to say.  The magic is that each time 
> through the loop the buffer is exactly the right size to hold the line. 
>  If you need to be able to change the length of the buffer though, you 
> have to use Unbounded_String.

In Ada 83, I often did something like

Buffer_C : constant String := Some_Function;
Buffer : String (1 .. Buffer_C'Length) := Buffer_C;

so I could modify Buffer, and hoped that the compiler would be smart 
enough to only keep one copy of the string.

-- 
Jeff Carter
"Nobody expects the Spanish Inquisition!"
Monty Python's Flying Circus
22





* Re: reading a text file into a string
  2004-07-17 22:38           ` Jeffrey Carter
@ 2004-07-18 13:44             ` zork
  0 siblings, 0 replies; 44+ messages in thread
From: zork @ 2004-07-18 13:44 UTC (permalink / raw)


Thanks everyone for your help on this one. I wasn't too worried about not
being able to read in newlines. Everyone has been really helpful. This is a
great board! Really appreciate the responses :)

cheers
zork






* Re: reading a text file into a string
  2004-07-17  2:27     ` Robert I. Eachus
  2004-07-17 11:31       ` Mats Weber
@ 2004-07-19  8:07       ` Dale Stanbrough
  2004-07-19  8:58         ` Martin Dowie
  2004-07-19 11:51       ` Ada2005 (was " Peter Hermann
  2 siblings, 1 reply; 44+ messages in thread
From: Dale Stanbrough @ 2004-07-19  8:07 UTC (permalink / raw)


Robert I. Eachus wrote:


> 
> For this reason, I find myself constructing or using a Get_Line FUNCTION 
> inside a loop and a declare block:
> 
> while not End_of_File(Somefile) loop
>    declare
>       Buffer: String := Get_Line(Somefile);
>    begin
>       -- process buffer
>    exception
>       ...
>    end;
> end loop;


I use a generic procedure that has a process procedure as a parameter.
It gets called with each line of the file...

--  Apply the procedure Process to each line of the file
--  This allows for very simple file processing, with all of the
--  control bits (not much really) hidden away.
--
--  Each line is read from the file, and then passed to the 
--  procedure
--  The maximum line size for the file is 1000 chars.
--
--  Typical use is
--
--    with Ada.Text_IO; use Ada.Text_IO;
--    with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
--
--    with Process_File;
--
--    procedure Count_Chars is
--
--       Count : Natural := 0;
--
--       procedure Count_Letters (Item : String) is
--       begin
--          Count := Count + Item'Length;
--       end;
--
--       procedure Count_Em is
--          new Process_File (Process => Count_Letters);
--    begin
--       Count_Em (<Somefilename>);
--       Put ("There are ..."); Put (Count); Put (" characters");
--    end;
--

generic
   with procedure Process (Line : String);
   Max_Line_Size : Positive := 1000;
   -- The maximum number of characters on any one line

procedure Process_File (Filename : String);


-----------------------------------------------------

it presumes a maximum line length, which is not so great, but
is otherwise a very convenient generic.
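
The body was not posted; one plausible way to write it, nested in a small test program so that it compiles as a single unit, is sketched below. Everything beyond the generic specification above is an assumption for illustration, including the file name demo.txt. With this body, a line longer than Max_Line_Size is passed to Process in several pieces.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Count_Chars is

   generic
      with procedure Process (Line : String);
      Max_Line_Size : Positive := 1000;
      --  The maximum number of characters on any one line
   procedure Process_File (Filename : String);

   --  Assumed body: read each line into a fixed buffer and hand the
   --  filled part to Process.
   procedure Process_File (Filename : String) is
      File : File_Type;
      Line : String (1 .. Max_Line_Size);
      Last : Natural;
   begin
      Open (File, In_File, Filename);
      while not End_Of_File (File) loop
         Get_Line (File, Line, Last);
         Process (Line (1 .. Last));
      end loop;
      Close (File);
   end Process_File;

   Count : Natural := 0;

   procedure Count_Letters (Item : String) is
   begin
      Count := Count + Item'Length;
   end Count_Letters;

   procedure Count_Em is new Process_File (Process => Count_Letters);

begin
   Count_Em ("demo.txt");  -- hypothetical file name
   Put_Line ("There are" & Natural'Image (Count) & " characters");
end Count_Chars;
```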

Dale

-- 
dstanbro@spam.o.matic.bigpond.net.au




* Re: reading a text file into a string
  2004-07-19  8:07       ` Dale Stanbrough
@ 2004-07-19  8:58         ` Martin Dowie
  2004-07-21  0:17           ` Robert I. Eachus
  0 siblings, 1 reply; 44+ messages in thread
From: Martin Dowie @ 2004-07-19  8:58 UTC (permalink / raw)


Dale Stanbrough wrote:
> generic
>    with procedure Process (Line : String);
>    Max_Line_Size : Positive := 1000;
>    -- The maximum number of characters on any one line
>
> procedure Process_File (Filename : String);
>
>
> -----------------------------------------------------
>
> it presumes a maximum line length, which is not so great, but
> is otherwise a very convenient generic.

But you could always override the default of 1000 with a more appropriate
value if you find it necessary.

Isn't there an argument for defaulting to 250 characters/line? Can't
remember what it was off the top of my head but it is ringing a bell...






* Ada2005 (was Re: reading a text file into a string
  2004-07-17  2:27     ` Robert I. Eachus
  2004-07-17 11:31       ` Mats Weber
  2004-07-19  8:07       ` Dale Stanbrough
@ 2004-07-19 11:51       ` Peter Hermann
  2004-07-19 12:51         ` Dmitry A. Kazakov
  2004-07-19 13:01         ` Nick Roberts
  2 siblings, 2 replies; 44+ messages in thread
From: Peter Hermann @ 2004-07-19 11:51 UTC (permalink / raw)


Robert I. Eachus <rieachus@comcast.net> wrote:
> For this reason, I find myself constructing or using a Get_Line FUNCTION 
> inside a loop and a declare block:
> 
> while not End_of_File(Somefile) loop
>    declare
>       Buffer: String := Get_Line(Somefile);
>    begin
>       -- process buffer
>    exception
>       ...
>    end;
> end loop;

There is no compelling reason why such a FUNCTION get_line
should not be in package specification Ada.text_io of Ada2005.
Or did I miss something?


-- 
--Peter Hermann(49)0711-685-3611 fax3758 ica2ph@csv.ica.uni-stuttgart.de
--Pfaffenwaldring 27 Raum 114, D-70569 Stuttgart Uni Computeranwendungen
--http://www.csv.ica.uni-stuttgart.de/homes/ph/
--Team Ada: "C'mon people let the world begin" (Paul McCartney)




* Re: Ada2005 (was Re: reading a text file into a string
  2004-07-19 11:51       ` Ada2005 (was " Peter Hermann
@ 2004-07-19 12:51         ` Dmitry A. Kazakov
  2004-07-19 13:01         ` Nick Roberts
  1 sibling, 0 replies; 44+ messages in thread
From: Dmitry A. Kazakov @ 2004-07-19 12:51 UTC (permalink / raw)


On Mon, 19 Jul 2004 11:51:52 +0000 (UTC), Peter Hermann wrote:

> Robert I. Eachus <rieachus@comcast.net> wrote:
>> For this reason, I find myself constructing or using a Get_Line FUNCTION 
>> inside a loop and a declare block:
>> 
>> while not End_of_File(Somefile) loop
>>    declare
>>       Buffer: String := Get_Line(Somefile);
>>    begin
>>       -- process buffer
>>    exception
>>       ...
>>    end;
>> end loop;
> 
> There is no compelling reason why such a FUNCTION get_line
> should not be in package specification Ada.text_io of Ada2005.

It would be nice.

> Or did I miss something?

In the Ada community there is strong resistance to functions having
side effects, even when the side effect is hidden in an *in* File_Type
parameter. A counterexample would be:

   What_Is_This : String := Get_Line (File) & Get_Line (File);
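
The trouble is that the language does not fix the order in which the two operands of & are evaluated, so either line could end up first. A minimal sketch of how the order can be pinned down with separate declarations follows. It assumes a Get_Line function with the profile discussed in this thread (Ada.Text_IO gained one in Ada 2005) and a hypothetical file two_lines.txt holding at least two lines; with fewer lines it raises End_Error.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Ordered_Reads is
   File : File_Type;
begin
   Open (File, In_File, "two_lines.txt");  -- hypothetical file name
   declare
      --  Declarations are elaborated in textual order, so First is
      --  always read before Second.
      First  : constant String := Get_Line (File);
      Second : constant String := Get_Line (File);
   begin
      Put_Line (First & Second);
   end;
   Close (File);
end Ordered_Reads;
```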

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Ada2005 (was Re: reading a text file into a string
  2004-07-19 11:51       ` Ada2005 (was " Peter Hermann
  2004-07-19 12:51         ` Dmitry A. Kazakov
@ 2004-07-19 13:01         ` Nick Roberts
  2004-07-19 13:35           ` Martin Dowie
  2004-07-19 23:50           ` Randy Brukardt
  1 sibling, 2 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-19 13:01 UTC (permalink / raw)


On Mon, 19 Jul 2004 11:51:52 +0000 (UTC), Peter Hermann  
<ica2ph@sinus.csv.ica.uni-stuttgart.de> wrote:

> ...
> There is no compelling reason why such a FUNCTION get_line
> should not be in package specification Ada.text_io of Ada2005.
> Or did I miss something?

AI95-301 suggests: I/O operations on unbounded strings are provided
in a new child package of Ada.Text_IO.

But I'm not sure if this one will get in.

-- 
Nick Roberts




* Re: Ada2005 (was Re: reading a text file into a string
  2004-07-19 13:01         ` Nick Roberts
@ 2004-07-19 13:35           ` Martin Dowie
  2004-07-19 17:22             ` Nick Roberts
  2004-07-19 23:50           ` Randy Brukardt
  1 sibling, 1 reply; 44+ messages in thread
From: Martin Dowie @ 2004-07-19 13:35 UTC (permalink / raw)


Nick Roberts wrote:
> On Mon, 19 Jul 2004 11:51:52 +0000 (UTC), Peter Hermann
> <ica2ph@sinus.csv.ica.uni-stuttgart.de> wrote:
>
>> ...
>> There is no compelling reason why such a FUNCTION get_line
>> should not be in package specification Ada.text_io of Ada2005.
>> Or did I miss something?
>
> AI95-301 suggests: I/O operations on unbounded strings are provided
> in a new child package of Ada.Text_IO.
>
> But I'm not sure if this one will get in.

Its current state is "Amendment 200Y", so I'd imagine its chances are "quite
good"! :-)







* Re: Ada2005 (was Re: reading a text file into a string
  2004-07-19 13:35           ` Martin Dowie
@ 2004-07-19 17:22             ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-19 17:22 UTC (permalink / raw)


On Mon, 19 Jul 2004 14:35:30 +0100, Martin Dowie  
<martin.dowie@baesystems.com> wrote:

>> AI95-301 suggests: I/O operations on unbounded strings are
>> provided in a new child package of Ada.Text_IO.
>>
>> But I'm not sure if this one will get in.
>
> Its current state is "Amendment 200Y", so I'd imagine its
> chances are "quite good"! :-)

Hooray! I hope it does.

I notice this amendment does also include a string function
to get a line of text. Although I usually dislike functions
with side effects (in a procedural language), I think this
one makes sense.

-- 
Nick Roberts




* Re: Ada2005 (was Re: reading a text file into a string
  2004-07-19 13:01         ` Nick Roberts
  2004-07-19 13:35           ` Martin Dowie
@ 2004-07-19 23:50           ` Randy Brukardt
  1 sibling, 0 replies; 44+ messages in thread
From: Randy Brukardt @ 2004-07-19 23:50 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbdygatdp4pfvb@bram-2...
> On Mon, 19 Jul 2004 11:51:52 +0000 (UTC), Peter Hermann
> <ica2ph@sinus.csv.ica.uni-stuttgart.de> wrote:
>
> > ...
> > There is no compelling reason why such a FUNCTION get_line
> > should not be in package specification Ada.text_io of Ada2005.
> > Or did I miss something?
>
> AI95-301 suggests: I/O operations on unbounded strings are provided
> in a new child package of Ada.Text_IO.
>
> But I'm not sure if this one will get in.

It was approved by WG9 at the June meeting, so it's in at this time. Of
course, things remain subject to change because of integration issues, but I
wouldn't expect the string functions to need modifications. So it's probably
going to be in the Amendment.

                      Randy.









* Re: reading a text file into a string
  2004-07-19  8:58         ` Martin Dowie
@ 2004-07-21  0:17           ` Robert I. Eachus
  2004-07-21 21:39             ` Randy Brukardt
  0 siblings, 1 reply; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-21  0:17 UTC (permalink / raw)

Martin Dowie wrote:

> Isn't there an argument for defaulting to 250 characters/line? Can't
> remember what it was off the top of my head but it is ringing a bell...

There is, and it is probably worth spelling out for programmers...

If you are buffering lines and want to avoid unintentional cache 
pressure, you should try to force buffers into a single cache line if 
possible.  But what size to use to do that?

Well first of all, Intel CPUs tend to have a 256-byte cache line.  If a 
cache read is in progress when another read occurs, the CPU may cut the 
read off at 128 bytes.  AMD processors (Athlon, Opteron, etc.) have 
64-byte cache lines, but will normally read two lines on any memory 
read.  So Intel asks for 256 bytes, but may take 128, and AMD asks for 
128 but may take 64.  So 128 or 256 bytes is a good size for objects that 
are to fit in a single cache line.

However, in Ada a String will have a descriptor associated with it. 
Also when a line is stored in a file, it will usually have some sort of 
decoration, either a length field, a line terminator, or an appended null.

So what you would like to do is:

subtype Index is Integer range 0..250;
type Buffer (Length: Index := 0) is record
    Contents: String(1..Length);
end record;

for Buffer'Alignment use 256;  -- or 128 on Athlons. ;-)

Language lawyer note:  RM 13.3(32) says: "An implementation need not 
support specified Alignments that are greater than the maximum Alignment 
the implementation ever returns by default."  This applies to subtypes; 
for stand-alone objects, RM 13.3(35) says: "For stand-alone library-level 
objects of statically constrained subtypes, the implementation should 
support all Alignments supported by the target linker. For example, page 
alignment is likely to be supported for such objects, but not for subtypes."

So in theory you may need to put the 'Alignment clause on buffer objects 
instead.  GNAT however, won't even recognize those alignment clauses, 
which IMHO is a shame:
------------------------------------------------------
package Test_Align is

    subtype Index is Integer range 0..250;

    type Buffer (Length: Index := 0) is record
      Contents: String(1..Length);
    end record;

    Buff: Buffer;
    for Buff'Alignment use 256;

end Test_Align;
-------------------------------------------------------
gnatmake test_align
gcc -c test_align.ads
test_align.ads:10:27: largest supported alignment for "Buff" is 4
gnatmake: "test_align.ads" compilation error

When the programmer can do something simple like this to improve program 
performance, it should be supported by all compilers.  (Notice that this 
is an error, not a warning.) I can see not supporting the subtype case 
due to the (potential) requirement for either large stack frames or 
varying size stack frames.  But for an object that can be allocated at 
link time, I don't see why it shouldn't be supported.

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-21  0:17           ` Robert I. Eachus
@ 2004-07-21 21:39             ` Randy Brukardt
  2004-07-22 22:34               ` Robert I. Eachus
  0 siblings, 1 reply; 44+ messages in thread
From: Randy Brukardt @ 2004-07-21 21:39 UTC (permalink / raw)


"Robert I. Eachus" <rieachus@comcast.net> wrote in message
news:nM-dnegXLbmdK2DdRVn-hQ@comcast.com...
...
> When the programmer can do something simple like this to improve program
> performance, it should be supported by all compilers.  (Notice that this
> is an error, not a warning.) I can see not supporting the subtype case
> due to the (potential) requirement for either large stack frames or
> varying size stack frames.  But for an object that can be allocated at
> link time, I don't see why it shouldn't be supported.

What's an object allocated at link time? I don't know of any such thing (you
can allocate segments at link time, but the number of those is quite
limited).

Similarly, do you know of *any* compiler for *any* language that supports
256 byte alignment? I don't, at least on Windows. As far as I know, the
largest alignment on Windows is paragraph. (There may be choices for larger
alignments in the linker structures, but I would guess that if they aren't
used by the C compiler, they don't work. That's certainly been our
experience with linkers on Windows, SunOS, SCO Unix, the U2000, etc.
Virtually nothing we tried would work until we duplicated precisely what the
local C compiler generated. Then all is fine...)

                       Randy.










* Re: reading a text file into a string
  2004-07-21 21:39             ` Randy Brukardt
@ 2004-07-22 22:34               ` Robert I. Eachus
  2004-07-23  0:49                 ` Randy Brukardt
  0 siblings, 1 reply; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-22 22:34 UTC (permalink / raw)

Randy Brukardt wrote:

> "Robert I. Eachus" <rieachus@comcast.net> wrote in message
> news:nM-dnegXLbmdK2DdRVn-hQ@comcast.com...
> ...
> 
>>When the programmer can do something simple like this to improve program
>>performance, it should be supported by all compilers.  (Notice that this
>>is an error, not a warning.) I can see not supporting the subtype case
>>due to the (potential) requirement for either large stack frames or
>>varying size stack frames.  But for an object that can be allocated at
>>link time, I don't see why it shouldn't be supported.
> 
> 
> What's an object allocated at link time? I don't know of any such thing (you
> can allocate segments at link time, but the number of those is quite
> limited).

Um, a compiler can (but most Ada compilers don't) allocate objects in 
library packages on the heap, in static storage, or even in code 
segments.  But you are right that this is very definitely not the usual 
practice in x86 compilers and environments.

> Similarly, do you know of *any* compiler for *any* language that supports
> 256 byte alignment? I don't, at least on Windows.

You are probably correct with regards to Windows.  I do know of 
compilers that do support such alignments, but only for supercomputers. 
  With the rate at which x86 chips are taking over the supercomputer 
market though, I'll have to check.

But the real reason I posted all this is that Ada compilers for x86, 
including x86 Windows SHOULD support this alignment, even if it is 
relatively painful to do so. (Painful in terms of gaps in the stack, or 
doing the extra effort required when allocating space on the heap.)  The 
case of a String buffer when reading files is a perfect case in point. 
If the buffer is 128 (AMD) or 256 (Intel) byte aligned when reading from 
a memory-mapped file, you will reduce the number of cache line misses 
during the execution of the program.  (If the buffer is in a single 
cache line, then that line will stay resident in L1 cache.  If the 
buffer is distributed over two (Intel) or more (AMD) cache lines, the 
cache lines that are not referenced for every line of text may get evicted.)

This is much more likely on an Intel CPU, and is potentially much more 
painful when it happens.  The 'exclusive' cache feature on AMD 
processors means that if the line gets replaced in L1, it will be copied 
back to L2.  So it takes two cache line replacements to push the line 
out of cache entirely.  With Intel the line can get moved to the much 
smaller L1 cache, then overwritten in L2 cache.  When it is overwritten 
in L1, then it will have to be pulled in from main memory next time around.

We may only be talking say, a 2 or 3% slowdown from such a misaligned 
text buffer.  Not really worth going to all the trouble to align buffers 
for casual programming.  But when I am working on a linear algebra code, 
I do go to the effort where it does pay off.  (For example, if you 
represent the basis in a linear programming subroutine as a matrix and a 
series of pivots, it makes for a significant improvement in performance 
to start the pivot rows on the correct cache line boundary.)

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-22 22:34               ` Robert I. Eachus
@ 2004-07-23  0:49                 ` Randy Brukardt
  2004-07-23 21:56                   ` Nick Roberts
  2004-07-24  2:56                   ` Robert I. Eachus
  0 siblings, 2 replies; 44+ messages in thread
From: Randy Brukardt @ 2004-07-23  0:49 UTC (permalink / raw)

"Robert I. Eachus" <rieachus@comcast.net> wrote in message
news:OrednWv2_cdw3Z3cRVn-uQ@comcast.com...
> Randy Brukardt wrote:
...
> > Similarly, do you know of *any* compiler for *any* language that supports
> > 256 byte alignment? I don't, at least on Windows.
>
> You are probably correct with regards to Windows.  I do know of
> compilers that do support such alignments, but only for supercomputers.
>   With the rate at which x86 chips are taking over the supercomputer
> market though, I'll have to check.
>
> But the real reason I posted all this is that Ada compilers for x86,
> including x86 Windows SHOULD support this alignment, even if it is
> relatively painful to do so. (Painful in terms of gaps in the stack, or
> doing the extra effort required when allocating space on the heap.)  The
> case of a String buffer when reading files is a perfect case in point.
> If the buffer is 128 (AMD) or 256 (Intel) byte aligned when reading from
> a memory-mapped file, you will reduce the number of cache line misses
> during the execution of the program.  (If the buffer is in a single
> cache line, then that line will stay resident in L1 cache.  If the
> buffer is distributed over two (Intel) or more (AMD) cache lines, the
> cache lines that are not referenced for every line of text may get evicted.)

That could only be done at run-time, as you couldn't ensure anything about
the alignment of the stack at compile-time. (That's probably why GNAT will
support only 4 byte alignment, which is about all you can guarantee.) So
you're asking to make subprogram linkage more expensive, to make heap
allocation more expensive, and probably to use indirect access to statically
allocated objects (in order to align the starting address). I don't doubt
that there are cases where you might gain a tiny bit of performance from
doing so, but it seems a large burden on all of the users to insist on it.

Indeed, it would make the most sense to allocate such objects from a storage
pool (with enough extra memory to support the alignment); align the
resulting address, and use an address clause to force the object to use that
memory. That would get the performance benefit in the rare case where it
would help without costing anything to implementors or to users of programs
that don't need the alignment.
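A rough sketch of that idea, under several assumptions: a plain over-sized
Storage_Array stands in for memory obtained from a storage pool, the
256-byte figure is the line size suggested earlier in the thread, and a
storage element is taken to be one 8-bit byte, as is a Character. The names
Aligned_Buffer, Raw and Offset are made up for illustration:

```ada
with Ada.Text_IO;
with System.Storage_Elements;

procedure Aligned_Buffer is
   use System.Storage_Elements;

   Line_Size : constant := 256;  --  assumed cache-line size in bytes

   --  Over-allocate so that an aligned start address always lies inside.
   Raw : aliased Storage_Array (1 .. 2 * Line_Size);

   --  Bytes to skip from Raw'Address to reach the next boundary.
   Offset : constant Storage_Offset :=
     (Line_Size - (Raw'Address mod Line_Size)) mod Line_Size;

   --  The line buffer is overlaid on the aligned part of Raw.
   Buffer : String (1 .. 250);
   for Buffer'Address use Raw'Address + Offset;
begin
   Buffer := (others => ' ');
   if Buffer'Address mod Line_Size = 0 then
      Ada.Text_IO.Put_Line ("buffer is aligned");
   end if;
end Aligned_Buffer;
```

The price is the wasted slack (up to 255 bytes here) and one address
computation at elaboration; code that never asks for the alignment pays
nothing.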

                             Randy.


* Re: reading a text file into a string
  2004-07-23  0:49                 ` Randy Brukardt
@ 2004-07-23 21:56                   ` Nick Roberts
  2004-07-24  0:34                     ` tmoran
  2004-07-24  1:42                     ` Randy Brukardt
  2004-07-24  2:56                   ` Robert I. Eachus
  1 sibling, 2 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-23 21:56 UTC (permalink / raw)

On Thu, 22 Jul 2004 19:49:34 -0500, Randy Brukardt <randy@rrsoftware.com>  
wrote:

> ...
> That could only be done at run-time, as you couldn't ensure anything
> about the alignment of the stack at compile-time. (That's probably
> why GNAT will support only 4 byte alignment, which is about all you
> can guarantee.) So you're asking to make subprogram linkage more
> expensive, to make heap allocation more expensive, and probably to
> use indirect access to statically allocated objects (in order to align
> the starting address). I don't doubt that there are cases where you
> might gain a tiny bit of performance from doing so, but it seems a
> large burden on all of the users to insist on it.

Randy, this is weird. It is a well established technique for highly
optimising compilers to align things for cache efficiency. Good grief,
there are whole books on the subject. Not only do they advocate the
possibility of aligning both basic blocks (code) and data objects on
cache-line boundaries, but they advocate that the compiler do it
automatically wherever possible.

It may be that there aren't any highly optimising Ada compilers
(yet ;-) but Robert is suggesting compilers /should/ support this
kind of alignment, so how can you disagree? Do you think all those
computer scientists have got it terribly wrong?

If you think having big 'gaps' is an efficiency concern, I think the
idea is that you fill in the gaps with smaller objects (or basic
blocks). If you are worried about the fact that all stacks and heaps/
pools must be cache-line aligned (32, 64 bytes?), you have missed the
RAM revolution that has been going on for the last two decades ;-)

My cheapo off-the-back-of-a-lorry PC has 1/2 GiB of RAM.

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-23 21:56                   ` Nick Roberts
@ 2004-07-24  0:34                     ` tmoran
  2004-07-24  1:16                       ` Nick Roberts
  2004-07-24  1:42                     ` Randy Brukardt
  1 sibling, 1 reply; 44+ messages in thread
From: tmoran @ 2004-07-24  0:34 UTC (permalink / raw)


>RAM revolution that has been going on for the last two decades ;-)
>
>My cheapo off-the-back-of-a-lorry PC has 1/2 GiB of RAM.
  And how much fast memory (ie, cache) does it have?




* Re: reading a text file into a string
  2004-07-24  0:34                     ` tmoran
@ 2004-07-24  1:16                       ` Nick Roberts
  0 siblings, 0 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-24  1:16 UTC (permalink / raw)

On Sat, 24 Jul 2004 00:34:29 GMT, <tmoran@acm.org> wrote:

>> RAM revolution that has been going on for the last two
>> decades ;-)
>>
>> My cheapo off-the-back-of-a-lorry PC has 1/2 GiB of RAM.

> And how much fast memory (ie, cache) does it have?

I'm not sure if you meant it, Tom, but that's precisely the
point.

Every machine has far less cache than main memory, but the
cache memory is much faster (and on an SMP machine, private
to each processor). So it can be a big advantage for the
compiler to generate code and object placements that make
optimum use of the cache (or to permit such placements).

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-23 21:56                   ` Nick Roberts
  2004-07-24  0:34                     ` tmoran
@ 2004-07-24  1:42                     ` Randy Brukardt
  2004-07-24 15:14                       ` Nick Roberts
  1 sibling, 1 reply; 44+ messages in thread
From: Randy Brukardt @ 2004-07-24  1:42 UTC (permalink / raw)

"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbl1vsgsp4pfvb@bram-2...
> On Thu, 22 Jul 2004 19:49:34 -0500, Randy Brukardt <randy@rrsoftware.com>
> wrote:
>
> > ...
> > That could only be done at run-time, as you couldn't ensure anything
> > about the alignment of the stack at compile-time. (That's probably
> > why GNAT will support only 4 byte alignment, which is about all you
> > can guarantee.) So you're asking to make subprogram linkage more
> > expensive, to make heap allocation more expensive, and probably to
> > use indirect access to statically allocated objects (in order to align
> > the starting address). I don't doubt that there are cases where you
> > might gain a tiny bit of performance from doing so, but it seems a
> > large burden on all of the users to insist on it.
>
> Randy, this is weird. It is a well established technique for highly
> optimising compilers to align things for cache efficiency. Good grief
> there are whole books on the subject. Not only do they advocate the
> possibility of aligning both basic blocks (code) and data objects on
> cache-line boundaries, but they advocate that the compiler do it
> automatically wherever possible.

Well, first of all, books don't necessarily equal practice. If aligning
things causes a program to use more pages, it can make it run slower,
because it makes it load code from disk more frequently. (And if you think
that everything is always in main memory, you forget one of the primary
rules of computing: programs and data always expand to fill - and overfill -
available resources).

Anyway, I wasn't arguing that alignment per-se is a bad idea. We do it on
integers, for instance, and I think that virtually all compilers do that. I
was arguing that on the x86, stack alignments beyond 4 can only be done at
run-time. (Unless *all* software in the system is under your control, and
there are no interrupts/signals on your stack -- never true in practice.)
That's a distributed penalty that gets paid everywhere. Similarly, existing
Windows linkers don't support alignments beyond 16 to my knowledge -- so
again you would have to do something at runtime with a penalty. In both
cases, the penalty might very well cost more than the time savings possible.

Given there is a penalty, doing alignments automatically is a bad idea.

> If you think having big 'gaps' is an efficiency concern, I think the
> idea is that you fill in the gaps with smaller objects (or basic
> blocks).

Last time I checked, Intel was recommending that labels in code not be
aligned further than 4 byte boundaries. I don't know precisely why they
recommended that, but I don't claim to know better than Intel!

> If you are worried about the fact that all stacks and heaps/
> pools must be cache-line aligned (32, 64 bytes?), you have missed the
> RAM revolution that has been going on for the last two decades ;-)

That's only possible if you build a new OS from the ground up. Stacks aren't
aligned in Windows or Linux. So you have to pay a penalty to make them so;
and because of interrupt handlers and the like, you can't even trust your
own stack. Heap allocations aren't aligned in Windows, either. (Although you
could build your own heap on top of the page management in Windows -- but you
better be prepared to allocate 64K at a time.) Again, you can fix this with
run-time overhead. But if you're willing to spend run-time overhead, an
address clause does the same thing without any work.

                  Randy.


* Re: reading a text file into a string
  2004-07-23  0:49                 ` Randy Brukardt
  2004-07-23 21:56                   ` Nick Roberts
@ 2004-07-24  2:56                   ` Robert I. Eachus
  1 sibling, 0 replies; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-24  2:56 UTC (permalink / raw)

Randy Brukardt wrote:

> Indeed, it would make the most sense to allocate such objects from a storage
> pool (with enough extra memory to support the alignment); align the
> resulting address, and use an address clause to force the object to use that
> memory. That would get the performance benefit in the rare case where it
> would help without costing anything to implementors or to users of programs
> that don't need the alignment.

You may have a winner, Randy.  The problem (to me) is that the pain of 
aligning buffers 'by hand' is high enough that I only do it when it is 
necessary to get decent performance.  I am talking of cases where there 
is a 2x or 3x speedup if some objects are cache aligned.  For ordinary 
programs where there may be a 5 to 10% benefit for aligning a particular 
buffer, it is just too much work without compiler support.

But if I create a cache-aligned storage pool, where all objects are 
allocated on natural cache boundaries then I just need a couple extra 
lines to do the alignment.  Of course the buffers will have to be on the 
heap; I'm thinking about the right garbage collection approach to use...

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-24  1:42                     ` Randy Brukardt
@ 2004-07-24 15:14                       ` Nick Roberts
  2004-07-26 23:48                         ` Randy Brukardt
  0 siblings, 1 reply; 44+ messages in thread
From: Nick Roberts @ 2004-07-24 15:14 UTC (permalink / raw)

On Fri, 23 Jul 2004 20:42:53 -0500, Randy Brukardt <randy@rrsoftware.com>  
wrote:

> ...
> Well, first of all, books don't necessarily equal practice.

In other words, you /are/ trying to say all those computer
scientists got it wrong ;-)

> If aligning things causes a program to use more pages, it
> can make it run slower, because it makes it load code from
> disk more frequently.

But we (Robert and I) are talking about using alignments
sparingly, to improve the efficiency of the speed-critical
parts of a program. Surely you've heard of the 80-20 rule?
(Which is, of course, silly, being the 99-1 rule in reality.)

> Anyway, I wasn't arguing that alignment per-se is a bad
> idea. We do it on integers, for instance, and I think that
> virtually all compilers do that.

> I was arguing that on the x86, stack alignments beyond 4
> can only be done at run-time. (Unless *all* software in
> the system is under your control, and there are no
> interrupts/signals on your stack -- never true in
> practice.)

But Randy, if you get a signal/interrupt on your stack, it
all happens on the top of your stack. It doesn't affect the
stack's alignment! Were you actually talking about
callbacks?

In any event, all the compiler has to do to align the stack
to 2^n bytes just prior to (parameter pushing and) subroutine
call is to emit:

    and esp, -2^n

et voila!
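The arithmetic behind that instruction can be sketched in Ada. This is only
an illustration on a 32-bit modular value, not real stack manipulation; the
names Align_Down, Old_SP and New_SP and the sample pointer value are made
up:

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Align_Down_Demo is

   --  Round Value down to a multiple of 2**N by clearing its low N bits.
   --  In two's complement, -2**N has the same bit pattern as
   --  not (2**N - 1), which is the mask built here.
   function Align_Down (Value : Unsigned_32; N : Natural) return Unsigned_32 is
      Mask : constant Unsigned_32 := not (Shift_Left (Unsigned_32'(1), N) - 1);
   begin
      return Value and Mask;
   end Align_Down;

   Old_SP : constant Unsigned_32 := 16#0012_FF7C#;  --  made-up stack pointer
   New_SP : constant Unsigned_32 := Align_Down (Old_SP, 8);  --  2**8 = 256
begin
   Ada.Text_IO.Put_Line ("old:" & Unsigned_32'Image (Old_SP));
   Ada.Text_IO.Put_Line ("new:" & Unsigned_32'Image (New_SP));
   Ada.Text_IO.Put_Line ("gap:" & Unsigned_32'Image (Old_SP - New_SP));
end Align_Down_Demo;
```

For the sample value, 16#0012_FF7C# is rounded down to 16#0012_FF00#,
leaving a gap of 124 bytes. The size of the gap depends on the incoming
value, so code that needs the original pointer back has to keep a copy of
it.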

> That's a distributed penalty that gets paid everywhere.

No it isn't. Only in calling those subroutines which require
alignment, and even then the penalty is an 'and' instruction
which, as you know, can probably be scheduled to take zero
time on a superscalar target.

> Similarly, existing Windows linkers don't support alignments
> beyond 16 to my knowledge -- so again you would have to do
> something at runtime with a penalty.

But then the point is that the linkers /should/ support other
alignments. It's no good saying "Oh, we can't do that because
the linker doesn't support it!" Obviously, you need to change
the linker. It's called not letting the tail wag the dog :-)

> In both cases, the penalty might very well cost more than
> the time savings possible.

I think I've demonstrated that this is very unlikely.

> Given there is a penalty, doing alignments automatically is
> a bad idea.

All I can say is that, given that there /isn't/ a penalty,
doing (cache-line) alignments automatically is a /good/
idea :-)

> Last time I checked, Intel was recommending that labels in
> code not be aligned further than 4 byte boundaries.

The latest advice is:

    Loop entry labels should be 16-byte-aligned when less than
    eight bytes away from a 16-byte boundary.

    Labels that follow a conditional branch need not be aligned.

    Labels that follow an unconditional branch or function call
    should be 16-byte-aligned when less than eight bytes away
    from a 16-byte boundary.

    Use a compiler that will assure these rules are met for the
    generated code.

[Section 2, Intel Architecture Optimization Reference Manual,
Copyright (c) 1998, 1999 Intel Corporation All Rights Reserved
Issued in U.S.A., Order Number: 245127-001]

> I don't know precisely why they recommended that, but I don't
> claim to know better than Intel!

Well, I don't think they ever did; maybe you need to do some
re-reading.

>> If you are worried about the fact that all stacks and heaps/
>> pools must be cache-line aligned (32, 64 bytes?), you have
>> missed the RAM revolution that has been going on for the last
>> two decades ;-)
>
> That's only possible if you build a new OS from the ground up.

Hehe :-)

> Stacks aren't aligned in Windows or Linux. So you have to pay
> a penalty to make them so;

Again, I think the penalty is tiny (or zero), and not universal.

> and because of interrupt handlers and the like,

Did you mean callbacks?

> you can't even trust your own stack.

Indeed, so you have to align it yourself using an 'and'.

> Heap allocations aren't aligned in Windows, either. (Although you could
> build your own heap on top of the page management in
> Windows -- but you better be prepared to allocate 64K at a
> time.) Again, you can fix this with run-time overhead.

Okay, but the example that Robert gave was of a (presumably)
stack allocated object, and nobody mentioned anything about
Windows or the IA-32 before you did. In general, there's
nothing to prevent heaps/pools being capable of cache-line
aligned allocation; I guess it would be harder to use the
gaps for smaller allocations, but I'm sure that doesn't
really matter.

> But if you're willing to spend run-time overhead, an
> address clause does the same thing without any work.

Well, I would argue that a good highly optimising compiler
should provide a convenient and portable way of enabling the
programmer to achieve cache-line optimisations, for both code
and data. Probably the best way is by providing appropriate
pragmas (that will be harmlessly ignored when irrelevant).

A possibility is to interpret the humble

    pragma Optimize(Time);

to mean doing the cache-line alignments recommended for the
target processor (group or architecture).

In general, it is better for the compiler to make decisions
about code or data placement for optimisation purposes,
since only the compiler can know /all/ the other
implementational details which could affect these decisions. I think
it is best for the compiler to make these decisions guided
by hints given in the form of pragmas.

However, if a compiler does not do cache-line optimisations
itself (automatically), it ought to support some reasonable
method by which it can be done explicitly (and I don't think
using an address clause is ideal for this purpose). I
think it is implicit that by 'compiler' Robert and I mean
'the toolchain necessary to get from source to executable'.

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-24 15:14                       ` Nick Roberts
@ 2004-07-26 23:48                         ` Randy Brukardt
  2004-07-27 12:08                           ` Nick Roberts
  0 siblings, 1 reply; 44+ messages in thread
From: Randy Brukardt @ 2004-07-26 23:48 UTC (permalink / raw)

"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbndy0o1p4pfvb@bram-2...
> On Fri, 23 Jul 2004 20:42:53 -0500, Randy Brukardt <randy@rrsoftware.com>
> wrote:
>
> > ...
> > Well, first of all, books don't necessarily equal practice.
>
> In other words, you /are/ trying to say all those computer
> scientists got it wrong ;-)

No, they're just ignoring the realities of the target systems. Most articles
I see make that mistake. (Including this one. :-)

> > If aligning things causes a program to use more pages, it
> > can make it run slower, because it makes it load code from
> > disk more frequently.
>
> But we (Robert and I) are talking about using alignments
> sparingly, to improve the efficiency of the speed-critical
> parts of a program. Surely you've heard of the 80-20 rule?
> (Which is, of course, silly, being the 99-1 rule in reality.)

The largest alignment that you allow impacts the design of your stack and of
your storage pool, at least if you intend to do it at compile-time. That's a
distributed overhead - it's small, but certainly not zero.

...
> In any event, all the compiler has to do to align the stack
> to 2^n bytes just prior to (parameter pushing and) subroutine
> call is to emit:
>
>     and esp, -2^n
>
> et voila!

How do you undo this when you leave the scope? You have to save the ESP
value somewhere and restore it to do that, and *that* is an extra overhead.

...
> > Similarly, existing Windows linkers don't support alignments
> > beyond 16 to my knowledge -- so again you would have to do
> > something at runtime with a penalty.
>
> But then the point is that the linkers /should/ support other
> alignments. It's no good saying "Oh, we can't do that because
> the linker doesn't support it!" Obviously, you need to change
> the linker. It's called not letting the tail wag the dog :-)

You know as well as I do that you don't get to change your target system to
your whim. You have to use the tools that users want to use, such as the
Microsoft linker.

But even if you wrote your own linker, I don't think that there is any
guarantee of alignment in the loading of the parts of an .EXE file. So I
don't know if any alignment that you have in your linker would actually be
preserved.

...
> > Last time I checked, Intel was recommending that labels in
> > code not be aligned further than 4 byte boundaries.
>
> The latest advice is:
>
>     Loop entry labels should be 16-byte-aligned when less than
>     eight bytes away from a 16-byte boundary.
>
>     Labels that follow a conditional branch need not be aligned.
>
>     Labels that follow an unconditional branch or function call
>     should be 16-byte-aligned when less than eight bytes away
>     from a 16-byte boundary.
>
>     Use a compiler that will assure these rules are met for the
>     generated code.
>
> [Section 2, Intel Architecture Optimization Reference Manual,
> Copyright (c) 1998, 1999 Intel Corporation All Rights Reserved
> Issued in U.S.A., Order Number: 245127-001]
>
> > I don't know precisely why they recommended that, but I don't
> > claim to know better than Intel!
>
> Well, I don't think they ever did; maybe you need to do some
> re-reading.

That's it. That's the third time in the last few months that you've
essentially called me a liar - or senile - and I'm done taking it without
comment. Either we're going to talk without personal attacks, or we're not
going to talk at all. OK?

For the record, my knowledge of Intel's recommendations primarily comes from
an Intel seminar I attended some years ago. Since it was covered by an NDA
(non-disclosure agreement), I can't even show you - or tell you for that
matter - much more than that.

In any case, the rules that you gave above are weaker in most areas than the
ones I remember (labels at 4, subprograms at 16), and certainly give no
indication of the value of cache-line sized optimizations -- which is what I
think we were talking about. I see nothing above recommending alignments
greater than 16 for anything.

                   Randy.


* Re: reading a text file into a string
  2004-07-26 23:48                         ` Randy Brukardt
@ 2004-07-27 12:08                           ` Nick Roberts
  2004-07-27 23:24                             ` Robert I. Eachus
  2004-07-29  0:53                             ` Randy Brukardt
  0 siblings, 2 replies; 44+ messages in thread
From: Nick Roberts @ 2004-07-27 12:08 UTC (permalink / raw)

[I've put my replies out of order, because I think there's a bit
in the middle that needs to be said first.]

On Mon, 26 Jul 2004 18:48:04 -0500, Randy Brukardt
<randy@rrsoftware.com> wrote:

> ...
>> > Last time I checked, Intel was recommending that labels in
>> > code not be aligned further than 4 byte boundaries.
>>
>> The latest advice is:
>>
>>     Loop entry labels should be 16-byte-aligned when less than
>>     eight bytes away from a 16-byte boundary.
>>
>>     Labels that follow a conditional branch need not be aligned.
>>
>>     Labels that follow an unconditional branch or function call
>>     should be 16-byte-aligned when less than eight bytes away
>>     from a 16-byte boundary.
>>
>>     Use a compiler that will assure these rules are met for the
>>     generated code.
>>
>> [Section 2, Intel Architecture Optimization Reference Manual,
>> Copyright (c) 1998, 1999 Intel Corporation All Rights Reserved
>> Issued in U.S.A., Order Number: 245127-001]
>>
>> > I don't know precisely why they recommended that, but I don't
>> > claim to know better than Intel!
>>
>> Well, I don't think they ever did; maybe you need to do some
>> re-reading.
>
> That's it. That's the third time in the last few months that
> you've essentially called me a liar - or senile - and I'm done
> taking it without comment. Either we're going to talk without
> personal attacks, or we're not going to talk at all. OK?

Well, that comes as a bolt out of the blue, Randy.

Let me first assure you that neither this time nor at any time in
the past have I intended to imply that you were lying or to make any
personal slight against you.

On consideration, I feel that I should not have made the remark
"maybe you need to do some re-reading", and I do truly apologise
for it. It was intended to be lighthearted and to be taken in a
friendly manner. Usenet is a medium given to stripping away all
the extra cues that a different medium (such as a telephone call)
would convey that help to disambiguate communications. It is easy,
sometimes, to forget this, but I should have known better.

In fact, I'm very unhappy that this seems to be the impression
that you have got of me Randy, because the truth is -- though
sadly you may not believe it now -- I have the greatest respect for
you, and I honestly admire you: for what you have done and continue
to do for the ARG and Ada standards and to champion the use of Ada;
for your contributions to the Ada community (as I know it, in terms
of Usenet and other Internet venues), and the friendly and helpful
manner of those contributions.

We may have had disagreements about lots of things during the
course of discussions between us, but there is a big, big difference,
as far as I am concerned, between disagreeing with someone and
having less respect for them.

I do really hope that I have not permanently destroyed any faith
you may have had in me, and I regret anything I may have said in
the past to this effect. I often have a clumsy and hasty style of
writing on Usenet, and I'm sure that often what I say comes across
with a different meaning or emphasis to what I intended.

That said, I hope my remaining replies will be taken in good part.

> For the record, my knowledge of Intel's recommendations primarily
> comes from an Intel seminar I attended some years ago. Since it
> was covered by an NDA (non-disclosure agreement), I can't even
> show you - or tell you for that matter - much more than that.

I think I once read a magazine article that said Intel were no
longer recommending cache-line (or half-line) alignments for code,
for their (as it was then) upcoming Pentium model. I have read
this sort of thing before, and dismissed it as hype or gossip,
since the official (published) Intel recommendations never changed
in the event. So I have tended to assume that repetitions of the
idea have simply been repetitions of gossip.

Obviously, since your information in fact comes direct from
Intel, I was wrong, and I was wrong to have doubted you.

> In any case, the rules that you gave above are weaker in most
> areas than the ones I remember (labels at 4, subprograms at 16),
> and certainly give no indication of the value of cache-line
> sized optimizations -- which is what I think we were talking
> about. I see nothing above recommending alignments greater than
> 16 for anything.

According to the manual, the 16-byte alignments are to do with
the way the instruction pre-decoding unit loads code, which is
16 bytes (a cache 'half-line') at a time. But is the manual
correct?

>> > If aligning things causes a program to use more pages,
>> > it can make it run slower, because it makes it load
>> > code from disk more frequently.
>>
>> But we (Robert and I) are talking about using alignments
>> sparingly, to improve the efficiency of the speed-critical
>> parts of a program. Surely you've heard of the 80-20 rule?
>> (Which is, of course, silly, being the 99-1 rule in
>> reality.)
>
> The largest alignment that you allow impacts the design of
> your stack and of your storage pool, at least if you intend
> to do it at compile-time. That's a distributed overhead -
> it's small, but certainly not zero.

Well, that's true and I cannot argue with it per se.

However, based on the presumption that typical software does
spend something like 99% of the time in 1% of the code (and
that 1% tends to be fairly 'tight' loops), I am not convinced
that the extra memory a program takes up (both code and data)
due to cache-line alignments will slow it down more than the
alignments speed it up (in that critical 1% of the code).

This will be dependent on how big the working set is during
the execution of that speed-critical code, in particular
whether the working set is caused to exceed available RAM; if
it is, then the program will indeed be slowed down. But, of
course, I am saying that even cheap computers have a lot of
RAM these days, so I think that eventuality is unlikely.

> ...
>> In any event, all the compiler has to do to align the stack
>> to 2^n bytes just prior to (parameter pushing and) subroutine
>> call is to emit:
>>
>>     and esp, -2^n
>>
>> et voila!
>
> How do you undo this when you leave the scope? You have to
> save the ESP value somewhere and restore it to do that, and
> *that* is an extra overhead.

Well, I don't think so. The usual thing to do is to save
ESP in the EBP register at stack frame creation, and restore it
from EBP just prior to return. There is, I grant, a need for a
little care, in that one would (I guess) need to do the stack
alignment I suggested before pushing anything onto the stack
that you might want to pop off it afterwards. Otherwise, I
think the 'and' instruction is the only extra thing required.

I vaguely remember that I have actually used this technique,
but a long time ago.
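
To make the mask arithmetic concrete, here is a minimal sketch in C
rather than assembly. The helper names align_down and align_up are made
up for illustration, and the only assumption is that the alignment is a
power of two. Rounding down is what `and esp, -2^n` does to the stack
pointer, since the stack grows downwards on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (a power of two).
   This is the effect of `and esp, -align` on the stack pointer. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* For unsigned values, ~(align - 1) is the same bit pattern as -align. */
    return addr & ~(align - 1);
}

/* Round addr up to a multiple of align, as a heap allocator might. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return align_down(addr + align - 1, align);
}
```

Note that the rounding is lossy: 0x1237 and 0x1230 both round down to
0x1230 for a 16-byte alignment, so the original value cannot be
recovered from the aligned one. That is why the old stack pointer has
to be kept somewhere, such as EBP.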

>
> ...
>> > Similarly, existing Windows linkers don't support
>> > alignments beyond 16 to my knowledge -- so again you would
>> > have to do something at runtime with a penalty.
>>
>> But then the point is that the linkers /should/ support other
>> alignments. It's no good saying "Oh, we can't do that because
>> the linker doesn't support it!" Obviously, you need to change
>> the linker. It's called not letting the tail wag the dog :-)
>
> You know as well as I do that you don't get to change your
> target system to your whim. You have to use the tools that
> users want to use, such as the Microsoft linker.
>
> But even if you wrote your own linker, I don't think that there
> is any guarantee of alignment in the loading of the parts of an
> .EXE file. So I don't know if any alignment that you have in
> your linker would actually be preserved.

I can't quickly find information on the subject, but I rather
suspect that an .EXE or .DLL is likely to be loaded page
aligned. That would mean alignments up to the page size would
be safe.

Also, I think possibly we're arguing at cross purposes on
this point. I'm only arguing that linkers and execution
environments /should/ support cache-line alignments. I accept
that many do not, in practice, and I accept that a compiler
targeting such a linker or environment cannot be expected
to do so either. I think this is how Robert's original comment
can be construed, also.

-- 
Nick Roberts


* Re: reading a text file into a string
  2004-07-27 12:08                           ` Nick Roberts
@ 2004-07-27 23:24                             ` Robert I. Eachus
  2004-07-29  0:55                               ` Randy Brukardt
  2004-07-29  0:53                             ` Randy Brukardt
  1 sibling, 1 reply; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-27 23:24 UTC (permalink / raw)

Nick Roberts wrote:

> Let me first assure you that neither this time nor at any time in
> the past have I intended to imply that you were lying or to make any
> personal slight against you.

I didn't read Nick's words as indicating anything other than "things 
have changed in this area."  But I wasn't the target.

Personally, though, I think this is a VERY important discussion, and I
hope we can keep to the issues.  I was surprised to see GNAT saying it 
would only do doubleword (4-byte) alignment, because 8-byte alignment 
has gone into and out of programming guides with each new hardware 
generation.

> According to the manual, the 16-byte alignments are to do with
> the way the instruction pre-decoding unit loads code, which is
> 16 bytes (a cache 'half-line') at a time. But is the manual
> correct?

Don't know about the Intel IA-32 manual, but the AMD "Software 
Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
indicates that the latest AMD processors now use a 32-byte code decoding 
window.  The Intel Itanium2 also loads two instruction bundles 
(32 bytes) at a time.

> Well, I don't think so. The usual thing to do is to save
> ESP in the EBP register at stack frame creation, and restore it
> from EBP just prior to return. There is, I grant, a need for a
> little care, in that one would (I guess) need to do the stack
> alignment I suggested before pushing anything onto the stack
> that you might want to pop off it afterwards. Otherwise, I
> think the 'and' instruction is the only extra thing required.
> 
> I vaguely remember that I have actually used this technique,
> but a long time ago.

The AMD manual referenced above gives the example code to do this on 
page 128 (in Section 5.13):

prologue:
    push ebp
    mov ebp, esp
    sub esp, SIZE_OF_LOCALS ; Size of local variables
    and esp, -8
    ... ; Push registers that need to be preserved.

epilogue: ; Pop register that needed to be preserved.
    leave
    ret

This example is explicitly showing a quadword alignment (8 bytes).
Compilers definitely should do this for code with quadword (usually 
Long_Float in Ada) values.  Of course, to do cache boundary alignment as 
well, you replace -8 with -64 (or -256 on Intel Pentium4 CPUs).  The 
waste of space on the stack is minor, or should be if it is only done 
when the programmer explicitly requests it.  Again, in the code where I 
need to do this, the _execution_time_ cost should be zero, since the 
stack frame needs to be quad-word aligned for other reasons.

> Also, I think possibly we're arguing at cross purposes on
> this point. I'm only arguing that linkers and execution
> environments /should/ support cache-line alignments. I accept
> that many do not, in practice, and I accept that a compiler
> targeting such a linker or environment cannot be expected
> to do so either. I think this is how Robert's original comment
> can be construed, also.

Right.  But as discussed above, aligning stack frames is something any 
compiler can do, whether on x86 or elsewhere.  Also the heap management 
software can/should allow for an allocation request to specify 
alignment. MicroQuill sells a very nice library to replace malloc and 
free with better performing versions, if the 'native' OS functions are 
not aware of cache line and disk page sizes.  (Heap objects should never 
be allocated across virtual memory page boundaries unless they are too
big to fit in a single page.  But some versions of malloc ignore page 
boundaries when allocating objects in the heap.)
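
As a rough sketch of what "an allocation request that specifies
alignment" can look like, the C fragment below uses aligned_alloc from
C11 (which postdates this thread). CACHE_LINE and alloc_cache_aligned
are names assumed for the example, and 64 bytes is only a typical line
size, not a property of any particular CPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumed line size; check the target CPU */

/* Allocate at least n bytes starting on a cache-line boundary.
   C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the request is rounded up first. */
static void *alloc_cache_aligned(size_t n)
{
    if (n == 0)
        n = 1;          /* avoid an implementation-defined zero-size request */
    size_t rounded = (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    return aligned_alloc(CACHE_LINE, rounded);
}
```

The block is released with the ordinary free. Rounding the size up
means no two such objects share a cache line, at the cost of up to
CACHE_LINE - 1 wasted bytes per allocation.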

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-27 12:08                           ` Nick Roberts
  2004-07-27 23:24                             ` Robert I. Eachus
@ 2004-07-29  0:53                             ` Randy Brukardt
  2004-07-29  7:25                               ` Martin Dowie
  2004-07-29 20:08                               ` Robert I. Eachus
  1 sibling, 2 replies; 44+ messages in thread
From: Randy Brukardt @ 2004-07-29  0:53 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbspb4bep4pfvb@bram-2...
...
> >> > I don't know precisely why they recommended that, but I don't
> >> > claim to know better than Intel!
> >>
> >> Well, I don't think they ever did; maybe you need to do some
> >> re-reading.
> >
> > That's it. That's the third time in the last few months that
> > you've essentially called me a liar - or senile - and I'm done
> > taking it without comment. Either we're going to talk without
> > personal attacks, or we're not going to talk at all. OK?
>
> Well, that comes as a bolt out of the blue, Randy.
>
> Let me first assure you that neither this time nor at any time in
> the past have I intended to imply that you were lying or to make any
> personal slight against you.
>
> On consideration, I feel that I should not have made the remark
> "maybe you need to do some re-reading", and I do truly apologise
> for it. It was intended to be lighthearted and to be taken in a
> friendly manner. Usenet is a medium given to stripping away all
> the extra cues that a different medium (such as a telephone call)
> would convey that help to disambiguate communications. It is easy,
> sometimes, to forget this, but I should have known better.

For the record, I was most upset about the first part, not the second. I
have no problem believing that the recommendations have changed - and you
quoted some that are different (they seem rather old, but that's another
story). But to say "I don't think that they ever did" recommend what I
originally reported says that they *never* recommended what I remember and
essentially that I was trying to mislead the conversation by saying that.
Not good.

Anyway, I accept your apology, and I'll try to be less sensitive next time.


...
> > How do you undo this when you leave the scope? You have to
> > save the ESP value somewhere and restore it to do that, and
> > *that* is an extra overhead.
>
> Well, I don't think so. The usual thing to do is to save
> ESP in the EBP register at stack frame creation, and restore it
> from EBP just prior to return. There is, I grant, a need for a
> little care, in that one would (I guess) need to do the stack
> alignment I suggested before pushing anything onto the stack
> that you might want to pop off it afterwards. Otherwise, I
> think the 'and' instruction is the only extra thing required.

I realize that we're weird here, but EBP points at the bottom of the stack
frame in Janus/Ada; that gives us positive stack offsets. We had a lot of
trouble with negative ones in the early days, and I just gave up on that.

In any case, we spend quite a bit of effort trying to avoid setting EBP at
all. For small leaf subprograms, the overhead of writing then restoring EBP
can be a significant percentage of the cost of the whole routine. Thus, we
get rid of the stack frame with an Add, and that leaves us with no obvious
way to do an alignment. (Alignment is not reflected in our intermediate
code, as that is supposed to be done by the data layout earlier in the
compiler. So it's either all or nothing - it has to be done for all stack
frames or not supported; I suspect many other compilers are similar.)

Anyway, my opinion these days is that spending a lot of effort making
something run 2% faster is wasted effort. You're always better off changing
to a different way of solving the problem. The most recent instance was in
my web log analyzer. It was running too slow on the AdaIC site's logs, and I
wasted a lot of time trying to improve it. But replacing the binary lookups
(log N, N being around 200,000) by a hashed lookup (very similar to
switching from Sorted_Sets to Hashed_Maps in the Containers library)
improved the speed by a factor of 5 (a result I didn't expect, because I had
to use an expensive hash function -- all of the cheap ones I tried didn't
work well on the actual data -- and log N wasn't that large -- between 12
and 19 on the data I tested with). Moral: make sure you've exhausted
algorithmic improvements before even thinking about squeezing a few extra
percent out of the code. And when you think you've exhausted algorithmic
improvements, try again, because sometimes non-obvious things work! (We
hadn't originally used a hash because of the need to write sorted reports.
But it turned out that using a hash and a quicksort on the report was faster
than keeping the data sorted.)
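
The shape of that change can be sketched in C as below. This is an
illustration only, not the log analyzer's code: the table size, the
FNV-1a hash (a cheap stand-in, not the expensive function mentioned
above), and the names count_hit and sorted_report are all assumptions.
Counting goes through an open-addressing hash table, and the data is
sorted once, when the report is written; the library qsort stands in
for the quicksort.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024   /* power of two; the sketch never grows the table */

struct entry { const char *key; unsigned long hits; };

static struct entry table[TABLE_SIZE];
static size_t used;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Count one hit for key: expected O(1) per lookup, against O(log N)
   for a binary search. Keys are not copied, so they must outlive
   the table. */
static void count_hit(const char *key)
{
    size_t i = hash(key) & (TABLE_SIZE - 1);
    while (table[i].key != NULL && strcmp(table[i].key, key) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);       /* linear probing */
    if (table[i].key == NULL) {
        assert(used < TABLE_SIZE - 1);        /* keep at least one empty slot */
        table[i].key = key;
        used++;
    }
    table[i].hits++;
}

/* qsort comparison: order entries by key. */
static int by_key(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return strcmp(x->key, y->key);
}

/* Copy the occupied slots into out (room for TABLE_SIZE entries is
   always enough) and sort them once. Returns the entry count. */
static size_t sorted_report(struct entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i].key != NULL)
            out[n++] = table[i];
    qsort(out, n, sizeof out[0], by_key);
    return n;
}
```

The one-off sort costs O(N log N) at the end, whereas keeping the data
sorted costs O(log N) or more on every one of the (far more numerous)
lookups and inserts.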

               Randy.




* Re: reading a text file into a string
  2004-07-27 23:24                             ` Robert I. Eachus
@ 2004-07-29  0:55                               ` Randy Brukardt
  0 siblings, 0 replies; 44+ messages in thread
From: Randy Brukardt @ 2004-07-29  0:55 UTC (permalink / raw)



"Robert I. Eachus" <rieachus@comcast.net> wrote in message
news:KOadnc3XCbaleZvcRVn-qQ@comcast.com...
...
> The AMD manual referenced above gives the example code to do this on
> page 128 (in Section 5.13):
>
> prologue:
>     push ebp
>     mov ebp, esp
>     sub esp, SIZE_OF_LOCALS ; Size of local variables
>     and esp, -8
>     ... ; Push registers that need to be preserved.
>
> epilogue: ; Pop register that needed to be preserved.
>     leave
>     ret

"Leave" used to be one of the instructions that Intel told you to avoid,
although they were rather ambiguous about it. Anyway, we put EBP at the
bottom of the frame, so "leave" doesn't work.

             Randy.








* Re: reading a text file into a string
  2004-07-29  0:53                             ` Randy Brukardt
@ 2004-07-29  7:25                               ` Martin Dowie
  2004-07-29 20:08                               ` Robert I. Eachus
  1 sibling, 0 replies; 44+ messages in thread
From: Martin Dowie @ 2004-07-29  7:25 UTC (permalink / raw)


Randy Brukardt wrote:
> Anyway, my opinion these days is that spending a lot of effort making
> something run 2% faster is wasted effort. You're always better off
> changing to a different way of solving the problem. The most recent
> instance was in my web log analyzer. It was running too slow on the
> AdaIC site's logs, and I wasted a lot of time trying to improve it.
> But replacing the binary lookups (log N, N being around 200,000) by a
> hashed lookup (very similar to switching from Sorted_Sets to
> Hashed_Maps in the Containers library) improved the speed by a factor
> of 5 (a result I didn't expect, because I had to use an expensive
> hash function -- all of the cheap ones I tried didn't work well on
> the actual data -- and log N wasn't that large -- between 12 and 19
> on the data I tested with). Moral: make sure you've exhausted
> algorithmic improvements before even thinking about squeezing a few
> extra percent out of the code. And when you think you've exhausted
> algorithmic improvements, try again, because sometimes non-obvious
> things work! (We hadn't originally used a hash because of the need to
> write sorted reports. But it turned out that using a hash and a
> quicksort on the report was faster than keeping the data sorted.)

I'd wholeheartedly second this advice. It reminds me of a recent case
where a colleague had a program that seemed to be taking forever. I can't
recall what data structure he was using, but he worked out that at its
current speed it was going to need something fast approaching the
entire life of the universe so far to complete! He switched to a 'quadtree'
and "bingo" - it only took a few (tens of) hours!

Cheers

-- Martin







* Re: reading a text file into a string
  2004-07-29  0:53                             ` Randy Brukardt
  2004-07-29  7:25                               ` Martin Dowie
@ 2004-07-29 20:08                               ` Robert I. Eachus
  2004-07-30  0:14                                 ` tmoran
  1 sibling, 1 reply; 44+ messages in thread
From: Robert I. Eachus @ 2004-07-29 20:08 UTC (permalink / raw)

Randy Brukardt wrote:

> Anyway, my opinion these days is that spending a lot of effort making
> something run 2% faster is wasted effort. You're always better off changing
> to a different way of solving the problem...

In general agreed.  The only place I currently go through the pain of 
cache aligning data structures is in matrix multiplication and other 
linear algebra code for large matrix sizes.

But there the difference is often a factor of 3 or more in execution time.

How many people actually WRITE such code?  Very few.  That is what ATLAS 
is all about.  It allows you to create a LINPACK and LAPACK version that 
is optimized for your exact execution environment, without worrying 
about all this.  Of course that means that the people who port ATLAS to 
different architectures are the ones that have to worry about such
grody details. (And note that the right ATLAS version for Pentium 3 is 
not the right version for Pentium 4, same for Athlon XP and Athlon64 and 
so on.)

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821


* Re: reading a text file into a string
  2004-07-29 20:08                               ` Robert I. Eachus
@ 2004-07-30  0:14                                 ` tmoran
  0 siblings, 0 replies; 44+ messages in thread
From: tmoran @ 2004-07-30  0:14 UTC (permalink / raw)


>cache aligning data structures is in matrix multiplication and other
>linear algebra code for large matrix sizes.
>
>But there the difference is often a factor of 3 or more in execution time.
>
>How many people actually WRITE such code?  Very few.  That is what ATLAS
  Are there compilers with a "pragma Really_Really_Optimize" option?  Or
post-compilers that read a small piece of object code, mull over it for
quite some time, and write a very highly optimized replacement?



