From: clodius@hotspec.lanl.gov (William Clodius)
Subject: Re: C is 'better' than Ada because...
Date: 1996/07/30
Organization: Los Alamos National Laboratory
References: <31e02c32.342948604@netline-fddi.jpl.nasa.gov>
Newsgroups: comp.lang.ada,comp.lang.c

In article <01bb7e2f$aeef4660$87ee6fce@timpent.airshields.com> "Tim
Behrendsen" writes:

> I'm not going to claim to have used every compiler, obviously, but I
> base this opinion on often viewing the assembly language output when I
> get a core dump or during debugging sessions. Now, I've used a lot of
> assembly language, so I usually write my C code to be easy on the
> optimizer (use pointers when needed rather than indexes, for example),
> and I admit that sometimes optimizers are very good. The problem is,
> the compiler often doesn't have enough contextual information to know
> what to do compared to the human writing the program, which brings us
> to ...

There is no reason why a good compiler should generate worse code for an
array access than for a pointer access. An array can be laid out by the
compiler as a single block of memory whose access via indices is the
same as through a pointer. Au contraire: because a pointer can be
associated with any object of compatible type, the analysis for a highly
optimizing compiler is more difficult with pointers than with arrays.
The problem with arrays in C is their lack of flexibility in terms of
dynamic memory allocation, the fact that if they are associated with a
pointer at any point in the scope, alias analysis becomes more
complicated for the compiler, and the fact that when passed as an
argument to a user-defined function they are not distinguished from
pointers.

> Global optimization: The compiler can often make decisions better than
> the human can because of the large scope of global optimizations, but
> again, the information needed to know when certain areas are going to
> be used a lot versus other areas is not known at compile time, which
> moves us to ...

Most compilers (the ones O'Keefe cites are exceptions) do relatively
little global analysis. O'Keefe's were developed for languages that
would otherwise be extremely difficult to optimize. Many languages need
much less global information for the most useful optimizations, because
standard conforming code is required to supply typing, data structure,
side effect, and aliasing information. Standard conforming C code
provides the essential typing and data structure information, but does
not provide full side effect and aliasing information, so it is
intermediate in optimizability.

> Run time profiling/compiler feedback: This can certainly provide good
> information back to the compiler, but it seems a little dangerous. I
> have not used this type of optimization, so my opinion means little in
> this case, except for just my general impression. For anything other
> than a trivial program, it is going to be very difficult to simulate
> real-world conditions in order to get an optimal result. The downside
> is if your test conditions are wrong, you may end up with a "most
> pessimum" optimization.

But if your test conditions are wrong, you have typically not
appropriately tested all aspects of the code, including the correctness
of its output, its robustness, and its general efficiency.

> It just seems to me that there is no substitute for a well thought-out
> design.
> Depending on the compiler to save a bad design is just a bad idea.

True, a compiler will not change an O(n*n) algorithm into an O(n*ln(n))
algorithm, but it can have a significant effect on the multiplicative
factor. Further, probably every language has limitations in terms of
expressibility of design. Algorithms that are essentially independent of
data type, e.g., lists, are difficult to specify in a type-independent
form in languages such as C, and must be reimplemented for each new
special case. Aspects of the design that are independent of the details
of the implementation, but that can be used by the compiler for
optimization, e.g., that a list will not be cyclic or that an object
will not be modified after its creation, are also difficult to express
in C and many other commonly used languages.

> The above are all worthwhile optimizations; I'm not claiming that
> optimizers do nothing. All I'm saying is that too often programmers
> fall back on the lazy line, "Oh well, the compiler should take care of
> that." For example, if I have ...
>
>     for (i = 0; i < 10000000; ++i)
>         array[i] *= abs(values[i]);
>
> Should I be *sure* this is going to be done efficiently, and use
> pointers, or should I just let the compiler handle it? I say that a
> competent programmer should use pointers out of habit, because it's
> just good design and practice. And when the problem becomes less
> trivial than the one above, the programmer will already be using the
> proper techniques *the first time*, rather than having to go back and
> fix the inevitable performance bottlenecks.

Again, a good compiler, if anything, should generate better code for an
array than for a pointer.

> I think you are being too optimistic that optimizers produce better
> code than "almost every programmer there is". If that were true, then
> projects would *never* be written in assembly language. As it stands,
> we know that things are often written in assembly for performance
> reasons, such as device drivers.
> The perfect example of this is computer games. If your theory were
> true, then they would all be written in high-level languages to gain
> the best performance. We know that this isn't the case, however. A
> bright human can always outdo the optimizer.

Again, history affects usage. First, the most efficient code generators
are produced first in a research environment, and only propagate to
commercial compilers after their robustness has been thoroughly tested.
Second, we rarely judge anything without the prejudices of our years of
experience. People typically expect what was standard when they first
started in a field, not what is currently standard. Optimizers are
typically better now than they were ten years ago. Third, users do not
always put a large emphasis on general code efficiency. For example, the
original Fortran I and II compilers used a number of optimizations that
were not used in any other compilers, Fortran or otherwise, until the
late sixties, and were only common in compilers by the mid to late
seventies. Fourth, while high level languages are not commonly used for
such tasks, they are becoming more commonly used; Erlang in particular
comes to mind.

> > And you know what? C is one of the *harder* languages to optimise!
>
> You are probably right, if you let the optimizer handle everything.
> But C also gives the capability to do things such as ...
>
>     if ((array[++n] = GetValue(arg)) != 0) {
>         ....
>
> vs. a typical non-C language ...
>
>     n = n + 1;
>     array[n] = GetValue(arg);
>     if (array[n] != 0) {
>         ...
>     }
>
> Now, which is easier to optimize? For this trivial case, a modern
> compiler would probably do exactly the same thing, but what about when
> it starts getting non-trivial? If I code my expressions to take
> advantage of the "data flow" nature of C's syntax, I can make it very
> easy for the optimizer to do a great job. And it's not any more work;
> it's just a matter of experience and living in reality, and not
> pretending that the compiler is smarter than it really is.
Actually, C has relatively little in the way of data flow concepts as
data flow is commonly understood. A true data flow language would avoid
side effects in almost every context except I/O. Your example is full of
side effects, some of which, particularly GetValue(arg), are difficult
to optimize. Depending on the context, a data flow language might have
the syntax

    array[1:n] = GetValue(arg(1:n));
    where (array /= 0) { ... }

In the above, GetValue is required to not have side effects, so that the
definition

    array[1:n] = GetValue(arg(1:n));

can be understood as occurring in parallel. In C, for full optimization,
the compiler has to determine that arg is not aliased with elements of
array, and that GetValue has no side effects. Note that a highly
pipelined architecture is comparable to a truly parallel architecture in
many optimization issues.

-- 
William B. Clodius              Phone: (505)-665-9370
Los Alamos National Laboratory  Email: wclodius@lanl.gov
Los Alamos, NM 87545