From: "Nick Roberts"
Newsgroups: comp.lang.ada
Subject: Re: reading a text file into a string
Date: Sat, 24 Jul 2004 16:14:50 +0100
References: <40f6bf21@dnews.tpgi.com.au> <40fb8c00$1_1@baen1673807.greenlnk.net>

On Fri, 23 Jul 2004 20:42:53 -0500, Randy Brukardt wrote:

> ...
> Well, first of all, books don't necessarily equal practice.

In other words, you /are/ trying to say all those computer scientists got it wrong ;-)

> If aligning things causes a program to use more pages, it
> can make it run slower, because it makes it load code from
> disk more frequently.

But we (Robert and I) are talking about using alignments sparingly, to improve the efficiency of the speed-critical parts of a program. Surely you've heard of the 80-20 rule? (Which is, of course, silly, being the 99-1 rule in reality.)

> Anyway, I wasn't arguing that alignment per-se is a bad
> idea. We do it on integers, for instance, and I think that
> virtually all compilers do that.
> I was arguing that on the x86, stack alignments beyond 4
> can only be done at run-time. (Unless *all* software in
> the system in under your control, and there are no
> interrupts/signals on your stack -- never true in
> practice.)

But Randy, if you get a signal/interrupt on your stack, it all happens on the top of your stack. It doesn't affect the stack's alignment! Were you actually talking about callbacks?

In any event, all the compiler has to do to align the stack to 2^n bytes just prior to (parameter pushing and) subroutine call is to emit:

   and esp, -2^n

et voila!

> That's a distributed penalty that gets paid everywhere.

No it isn't. Only in calling those subroutines which require alignment, and even then the penalty is an 'and' instruction which, as you know, can probably be scheduled to take zero time on a superscalar target.

> Similarly, existing Windows linkers don't support alignments
> beyond 16 to my knowledge -- so again you would have to do
> something at runtime with a penalty.

But then the point is that the linkers /should/ support other alignments. It's no good saying "Oh, we can't do that because the linker doesn't support it!" Obviously, you need to change the linker. It's called not letting the tail wag the dog :-)

> In both cases, the penalty might very well cost more than
> the time savings possible.

I think I've demonstrated that this is very unlikely.

> Given there is a penalty, doing alignments automatically is
> a bad idea.

All I can say is that, given that there /isn't/ a penalty, doing (cache-line) alignments automatically is a /good/ idea :-)

> Last time I checked, Intel was recommending that labels in
> code not be aligned further than 4 byte boundaries.

The latest advice is:

   Loop entry labels should be 16-byte-aligned when less than
   eight bytes away from a 16-byte boundary.

   Labels that follow a conditional branch need not be aligned.

   Labels that follow an unconditional branch or function call
   should be 16-byte-aligned when less than eight bytes away
   from a 16-byte boundary.

   Use a compiler that will assure these rules are met for the
   generated code.

[Section 2, Intel Architecture Optimization Reference Manual, Copyright (c) 1998, 1999 Intel Corporation All Rights Reserved Issued in U.S.A., Order Number: 245127-001]

> I don't know precisely why they recommended that, but I don't
> claim to know better than Intel!

Well, I don't think they ever did; maybe you need to do some re-reading.

>> If you are worried about the fact that all stacks and heaps/
>> pools must be cache-line aligned (32, 64 bytes?), you have
>> missed the RAM revolution that has been going on for the last
>> two decades ;-)
>
> That's only possible if you build a new OS from the ground up.

Hehe :-)

> Stacks aren't aligned in Windows or Linux. So you have a pay
> a penalty to make them so;

Again, I think the penalty is tiny (or zero), and not universal.

> and because of interrupt handlers and the like,

Did you mean callbacks?

> you can't even trust your own stack.

Indeed, so you have to align it yourself using an 'and'.

> Heap allocations aren't aligned in Windows, either. (Although you could
> build you own heap on top of the page management in
> Windows -- but you better be prepared to allocate 64K at a
> time.) Again, you can fix this with run-time overhead.

Okay, but the example that Robert gave was of a (presumably) stack allocated object, and nobody mentioned anything about Windows or the IA-32 before you did. In general, there's nothing to prevent heaps/pools being capable of cache-line aligned allocation; I guess it would be harder to use the gaps for smaller allocations, but I'm sure that doesn't really matter.

> But if you're willing to spend run-time overhead, an
> address clause does the same thing without any work.

Well, I would argue that a good highly optimising compiler should provide a convenient and portable way of enabling the programmer to achieve cache-line optimisations, for both code and data. Probably the best way is by providing appropriate pragmas (that will be harmlessly ignored when irrelevant).

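To make the two run-time mechanisms discussed above a little more concrete, here is a minimal C sketch. It is only an illustration, not code from anybody's compiler or run-time system. The names `align_down`, `align_up`, `cache_line_alloc` and the 64-byte `CACHE_LINE` value are all hypothetical, and it assumes a C11 library that provides `aligned_alloc`. The masking in `align_down` is the same arithmetic that an 'and' with -2^n performs on the stack pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size for illustration only; real targets differ
   (the thread mentions 32 or 64 bytes). */
#define CACHE_LINE 64u

/* Round addr down to a multiple of align, which must be a power of two.
   Clearing the low bits is what "and esp, -2^n" does to the stack pointer. */
static inline uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1u);
}

/* Round addr up to the next multiple of align (a power of two), as a
   heap or pool allocator would when placing an aligned block. */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Hypothetical helper: obtain a cache-line aligned heap block.
   The size is rounded up to a multiple of the alignment, which the
   original C11 wording of aligned_alloc requires. Release with free(). */
static inline void *cache_line_alloc(size_t size)
{
    if (size == 0) {
        size = 1; /* avoid the implementation-defined zero-size case */
    }
    size_t rounded = (size + CACHE_LINE - 1u) & ~((size_t)CACHE_LINE - 1u);
    return aligned_alloc(CACHE_LINE, rounded);
}
```

For example, `align_down(0x1234, 16)` yields `0x1230`, and a pointer returned by `cache_line_alloc(100)` is a multiple of 64. The rounding in `cache_line_alloc` also shows where the "gaps" come from: a 100-byte request occupies 128 bytes.
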
A possibility is to interpret the humble

   pragma Optimize(Time);

to mean doing the cache-line alignments recommended for the target processor (group or architecture).

In general, it is better for the compiler to make decisions about code or data placement for optimisation purposes, since only the compiler can know /all/ the other implementational details which could affect these decisions. I think it is best for the compiler to make these decisions guided by hints given in the form of pragmas.

However, if a compiler does not do cache-line optimisations itself (automatically), it ought to support some reasonable method by which it can be done explicitly (and I don't think using an address clause is ideal for this purpose).

I think it is implicit that by 'compiler' Robert and I mean 'the toolchain necessary to get from source to executable'.

-- 
Nick Roberts