From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,a3fe2aac201210c0
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!news2.google.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: "Nick Roberts" <nick.roberts@acm.org>
Newsgroups: comp.lang.ada
Subject: Re: reading a text file into a string
Date: Sat, 24 Jul 2004 16:14:50 +0100
Message-ID: <opsbndy0o1p4pfvb@bram-2>
References: <40f6bf21@dnews.tpgi.com.au> <E1HJc.101277$Oq2.96646@attbi_s52>
 <LNOdnQjer4Hf22XdRVn-sA@megapath.net> <fOednXzORfHlE2Xd4p2dnA@comcast.com>
 <MrNoSpam-C3D3BB.18074619072004@news-server.bigpond.net.au>
 <40fb8c00$1_1@baen1673807.greenlnk.net> <nM-dnegXLbmdK2DdRVn-hQ@comcast.com>
 <XMCdnQjqXrrDf2PdRVn-pw@megapath.net> <OrednWv2_cdw3Z3cRVn-uQ@comcast.com>
 <rOWdnYp1K5bk_Z3cRVn-vA@megapath.net> <opsbl1vsgsp4pfvb@bram-2>
 <nvCdnXEk6MblI5zcRVn-pg@megapath.net>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; delsp=yes; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de NCWMmB1JMnm5NHqBjelN0AIS7jtTdtR10eHmfTSfj0ZqYC0dY=
User-Agent: Opera M2/7.51 (Win32, build 3798)
Xref: g2news1.google.com comp.lang.ada:2374
Date: 2004-07-24T16:14:50+01:00
List-Id: <comp.lang.ada>

On Fri, 23 Jul 2004 20:42:53 -0500, Randy Brukardt <randy@rrsoftware.com>  
wrote:

> ...
> Well, first of all, books don't necessarily equal practice.

In other words, you /are/ trying to say all those computer
scientists got it wrong ;-)

> If aligning things causes a program to use more pages, it
> can make it run slower, because it makes it load code from
> disk more frequently.

But we (Robert and I) are talking about using alignments
sparingly, to improve the efficiency of the speed-critical
parts of a program. Surely you've heard of the 80-20 rule?
(Which is, of course, silly, being the 99-1 rule in reality.)

> Anyway, I wasn't arguing that alignment per-se is a bad
> idea. We do it on integers, for instance, and I think that
> virtually all compilers do that.

> I was arguing that on the x86, stack alignments beyond 4
> can only be done at run-time. (Unless *all* software in
> the system is under your control, and there are no
> interrupts/signals on your stack -- never true in
> practice.)

But Randy, if you get a signal/interrupt on your stack, it
all happens on the top of your stack. It doesn't affect the
stack's alignment! Were you actually talking about
callbacks?

In any event, all the compiler has to do to align the stack
to 2^n bytes just prior to (parameter pushing and) subroutine
call is to emit:

    and esp, -2^n

et voila!
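
To make the masking concrete, here is a minimal sketch in C
(rather than assembly) of the same computation. The function
name align_down is an illustrative assumption, not taken from
any compiler's output. Because the x86 stack grows downwards,
rounding the stack pointer down only skips a few bytes; it
never lands on data already pushed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Round addr down to a multiple of 2^n: the same masking that
   "and esp, -2^n" applies to the stack pointer.
   align_down is a hypothetical name used for illustration. */
uintptr_t align_down(uintptr_t addr, unsigned n)
{
    assert(n < sizeof(uintptr_t) * CHAR_BIT);
    /* In two's complement, -(2^n) has the low n bits clear and
       every higher bit set, so it equals ~(2^n - 1). */
    uintptr_t mask = ~(((uintptr_t)1 << n) - 1);
    return addr & mask;
}
```

For example, align_down(0x0012FF7C, 4) yields 0x0012FF70, and
an address that is already aligned comes back unchanged.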

> That's a distributed penalty that gets paid everywhere.

No it isn't. Only in calling those subroutines which require
alignment, and even then the penalty is an 'and' instruction
which, as you know, can probably be scheduled to take zero
time on a superscalar target.

> Similarly, existing Windows linkers don't support alignments
> beyond 16 to my knowledge -- so again you would have to do
> something at runtime with a penalty.

But then the point is that the linkers /should/ support other
alignments. It's no good saying "Oh, we can't do that because
the linker doesn't support it!" Obviously, you need to change
the linker. It's called not letting the tail wag the dog :-)

> In both cases, the penalty might very well cost more than
> the time savings possible.

I think I've demonstrated that this is very unlikely.

> Given there is a penalty, doing alignments automatically is
> a bad idea.

All I can say is that, given that there /isn't/ a penalty,
doing (cache-line) alignments automatically is a /good/
idea :-)

> Last time I checked, Intel was recommending that labels in
> code not be aligned further than 4 byte boundaries.

The latest advice is:

    Loop entry labels should be 16-byte-aligned when less than
    eight bytes away from a 16-byte boundary.

    Labels that follow a conditional branch need not be aligned.

    Labels that follow an unconditional branch or function call
    should be 16-byte-aligned when less than eight bytes away
    from a 16-byte boundary.

    Use a compiler that will assure these rules are met for the
    generated code.

[Section 2, Intel Architecture Optimization Reference Manual,
Copyright (c) 1998, 1999 Intel Corporation All Rights Reserved
Issued in U.S.A., Order Number: 245127-001]
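
Read literally, the quoted rule for loop entry labels and for
labels after an unconditional branch or call amounts to a small
calculation. The sketch below is one interpretation in C; the
helper name label_padding and the two constants are assumptions
made for illustration and do not come from the manual or from
any real assembler.

```c
#include <assert.h>
#include <stdint.h>

#define BOUNDARY  16u  /* alignment boundary named in the quoted advice */
#define THRESHOLD  8u  /* align only when closer to the boundary than this */

/* Number of padding bytes to emit before a label that would
   otherwise be placed at addr.  label_padding is a hypothetical
   helper, not part of any real toolchain. */
unsigned label_padding(uintptr_t addr)
{
    /* Distance from addr up to the next 16-byte boundary (0 if on one). */
    unsigned gap = (unsigned)((BOUNDARY - addr % BOUNDARY) % BOUNDARY);
    /* Pad only when the boundary is less than eight bytes away. */
    unsigned pad = (gap < THRESHOLD) ? gap : 0u;
    assert(pad < THRESHOLD);
    return pad;
}
```

So a label at 0x100A would get 6 bytes of padding, while one at
0x1003 (13 bytes short of the boundary) would be left alone.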

> I don't know precisely why they recommended that, but I don't
> claim to know better than Intel!

Well, I don't think they ever did; maybe you need to do some
re-reading.

>> If you are worried about the fact that all stacks and heaps/
>> pools must be cache-line aligned (32, 64 bytes?), you have
>> missed the RAM revolution that has been going on for the last
>> two decades ;-)
>
> That's only possible if you build a new OS from the ground up.

Hehe :-)

> Stacks aren't aligned in Windows or Linux. So you have to pay
> a penalty to make them so;

Again, I think the penalty is tiny (or zero), and not universal.

> and because of interrupt handlers and the like,

Did you mean callbacks?

> you can't even trust your own stack.

Indeed, so you have to align it yourself using an 'and'.

> Heap allocations aren't aligned in Windows, either. (Although
> you could build your own heap on top of the page management
> in Windows -- but you better be prepared to allocate 64K at
> a time.) Again, you can fix this with run-time overhead.

Okay, but the example that Robert gave was of a (presumably)
stack allocated object, and nobody mentioned anything about
Windows or the IA-32 before you did. In general, there's
nothing to prevent heaps/pools being capable of cache-line
aligned allocation; I guess it would be harder to use the
gaps for smaller allocations, but I'm sure that doesn't
really matter.
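
As a rough sketch of what cache-line aligned allocation on top
of an ordinary heap can look like, the C below over-allocates
and rounds the address up, which also shows where the gaps come
from. The names cache_alloc and cache_free and the 64-byte line
size are assumptions for illustration; this is not how any
particular run-time system does it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64u  /* assumed line size; some processors use 32 */

/* Allocate size bytes starting on a cache-line boundary.  The
   pointer returned by malloc is stored just below the aligned
   block so that cache_free can recover it.  Hypothetical helper. */
void *cache_alloc(size_t size)
{
    size_t extra = CACHE_LINE - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;                       /* request too large */
    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    /* Leave room for the stored pointer, then round up to a line. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    assert(aligned % CACHE_LINE == 0);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from cache_alloc. */
void cache_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The cost is up to CACHE_LINE - 1 + sizeof(void *) wasted bytes
per block, which is the gap overhead mentioned above.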

> But if you're willing to spend run-time overhead, an
> address clause does the same thing without any work.

Well, I would argue that a good highly optimising compiler
should provide a convenient and portable way of enabling the
programmer to achieve cache-line optimisations, for both code
and data. Probably the best way is by providing appropriate
pragmas (that will be harmlessly ignored when irrelevant).
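
By way of analogy only, C11 offers a declarative request of
this kind for data through alignas. The sketch below is C, not
Ada, and the names line_buffer and stack_buffer_is_aligned are
made up for illustration. Whether alignments beyond the
platform default are honoured for stack objects is
implementation-defined; a compiler such as GCC or Clang on x86
is assumed here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size */

/* A type whose objects are requested to start on a cache line. */
struct line_buffer {
    alignas(CACHE_LINE) unsigned char bytes[CACHE_LINE];
};

static_assert(alignof(struct line_buffer) == CACHE_LINE,
              "line_buffer should be cache-line aligned");

/* Returns 1 if a stack-allocated line_buffer starts on a
   cache-line boundary, 0 otherwise. */
int stack_buffer_is_aligned(void)
{
    struct line_buffer buf;
    buf.bytes[0] = 0;
    return (uintptr_t)&buf % CACHE_LINE == 0;
}
```

The programmer states the requirement once, on the type, and
the compiler is left to arrange the stack frame accordingly.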

A possibility is to interpret the humble

    pragma Optimize(Time);

to mean doing the cache-line alignments recommended for the
target processor (group or architecture).

In general, it is better for the compiler to make decisions
about code or data placement for optimisation purposes,
since only the compiler can know /all/ the other
implementational details which could affect these decisions. I think
it is best for the compiler to make these decisions guided
by hints given in the form of pragmas.

However, if a compiler does not do cache-line optimisations
itself (automatically), it ought to support some reasonable
method by which it can be done explicitly (and I don't think
using an address clause is ideal for this purpose). I think
it is implicit that by 'compiler' Robert and I mean
'the toolchain necessary to get from source to executable'.

-- 
Nick Roberts