From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: fc89c,97188312486d4578
X-Google-Attributes: gidfc89c,public
X-Google-Thread: 103376,97188312486d4578
X-Google-Attributes: gid103376,public
X-Google-Thread: 109fba,baaf5f793d03d420
X-Google-Attributes: gid109fba,public
X-Google-Thread: 1014db,6154de2e240de72a
X-Google-Attributes: gid1014db,public
From: Dan.Pop@cern.ch (Dan Pop)
Subject: Re: What's the best language to start with? [was: Re: Should I learn C or Pascal?]
Date: 1996/08/11
Message-ID: 
X-Deja-AN: 173599805
sender: news@news.cern.ch (USENET News System)
x-nntp-posting-host: ues5.cern.ch
references: <31FBC584.4188@ivic.qc.ca> <01bb8342$88cc6f40$32ee6fcf@timhome2> <4u7grn$eb0@news1.mnsinc.com> <01bb83ad$29c3cfa0$87ee6fce@timpent.airshields.com> <4u89c4$p7p@solutions.solon.com> <01bb83f5$923391e0$87ee6fce@timpent.airshields.com> <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com>
organization: CERN European Lab for Particle Physics
newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada
Date: 1996-08-11T00:00:00+00:00
List-Id: 

In <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com> "Tim Behrendsen" writes:

>Dan Pop wrote in article
>...
>> In <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> "Tim Behrendsen"
>> writes:
>>
>> >Here's an example:
>> >
>> >int a[50000],b[50000],c[50000],d[50000],e[50000];
>> >
>> >void test1()
>> >{
>> >    int i, j;
>> >    for (j = 0; j < 10; ++j) {
>> >        for (i = 0; i < 50000; ++i) {
>> >            ++a[i]; ++b[i]; ++c[i]; ++d[i]; ++e[i];
>> >        }
>> >    }
>> >}
>> >
>> >void test2()
>> >{
>> >    int i, j;
>> >    for (j = 0; j < 10; ++j) {
>> >        for (i = 0; i < 50000; ++i) ++a[i];
>> >        for (i = 0; i < 50000; ++i) ++b[i];
>> >        for (i = 0; i < 50000; ++i) ++c[i];
>> >        for (i = 0; i < 50000; ++i) ++d[i];
>> >        for (i = 0; i < 50000; ++i) ++e[i];
>> >    }
>> >}
>> >
>> >On my AIX system, test1 runs in 2.47 seconds, and test2
>> >runs in 1.95 seconds using maximum optimization (-O3).  The
>> >reason I knew the second would be faster is because I know
>> >to limit the amount of context information the optimizer has
>> >to deal with in the inner loops, and I know to keep memory
>> >localized.
>>
>> 1. For a marginal speed increase (~25%), you compromised the
>>    readability of the code.
>
>You call 25% a "marginal" increase?  It's attitudes like this
>that give us the slow code world we have right now.

Yes, I do call 25% a marginal increase.  Would you be considerably
happier if everything ran 25% faster, at the expense of everything
being harder to develop and maintain by a factor of 2?

>And how is one less readable than the other?

Methinks one loop is easier to read than five.

>> 2. Another compiler, on another system, might generate faster code
>>    out of test1.  This is especially true for supercomputers, which
>>    have no cache memory (and where the micro-optimizations are done
>>    based on a completely different set of criteria) and where the
>>    cpu time is really expensive.
>
>Show me the computer where test1 comes out faster.

Already done: my notebook.

>Or shouldn't we depend on the compiler to optimize this?

Exactly, it's the compiler's job to optimize it, not mine.
>> Let's see what happens on my 486DX33 box:
>>
>> So, it's 1.10 + 0.23 = 1.33 seconds of cpu time for test1 versus
>> 1.17 + 0.18 = 1.35 seconds for test2.  Conclusions:
>>
>> 1. My 486DX33 is faster than your AIX system (or maybe gcc is faster
>>    than xlc) :-)
>
>Oops!  I just realized my AIX program had 100 for the outer loop, not
>10 (had me worried for a minute there ...)
>
>> 2. Your results are not reproducible.  Your extra "insight" simply
>>    "helped" you to generate less readable code.
>
>Actually, try again with 500 or 5000 in the inner loops; 50000
>probably flushed the cache.

Get a clue.  If test1 flushes the cache, test2 will flush it as well.
Both implementations behave the same way WRT cache utilization: all
5 x 50000 array elements are accessed in a single iteration of the
outer loop, hence the same thing (cache hit or cache miss) will happen
at the next iteration in both versions.  Cache is simply a non-issue
in your example (even if you intended it to be :-)

>In any case, the code is identically
>readable either way IMO, and costs you nothing to implement
>the efficient way.

You forgot to provide a VALID justification for your claim that test2
is the "efficient way".

>So it wasn't on one architecture, big deal.

Can you prove that it will be on any other architecture?  According to
your line of argumentation, I could say: "so it was on one architecture,
big deal" :-)

>On the average, it will be, and costs nothing to do it that way.

I still have to see the proof.  If your time is worth nothing, then
you're right, it costs nothing.

>> The key to proper optimization is profiling.  Additional knowledge
>> about a certain platform is a lot less important.
>
>Agreed; optimization shouldn't be to a certain platform, but
>optimizations can be made on a general basis.

Right.  By selecting the proper algorithm.  Most other kinds of
optimizations will be specific to a certain machine or, at best, to a
class of machines.  For example, an optimization which tries to improve
the cache hit rate might hurt on a supercomputer which has no cache but
which is adversely affected by certain memory access patterns.

When programming for a single platform/compiler combination, it makes
sense to perform micro-optimizations, _if they're rewarding enough_.
But then, if portability is not an issue, the critical code could be
written in assembly as well.  When writing portable code in a HLL,
micro-optimizations are usually a waste of effort and programmer time.

Dan
--
Dan Pop
CERN, CN Division
Email: Dan.Pop@cern.ch
Mail:  CERN - PPE, Bat. 31 R-004, CH-1211 Geneve 23, Switzerland
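
[Readers who want to repeat the comparison themselves can use a minimal,
self-contained timing sketch along the following lines.  The array sizes
and loop counts are taken from the test1/test2 code quoted above; the
clock()-based time_it helper and the single back-to-back run are
illustrative assumptions, not what either poster actually ran (the
figures in the thread come from separate runs, apparently under the
shell's time command).]

/* Hypothetical timing harness for the test1/test2 comparison quoted
 * above.  Measures CPU time for one call to each function with the
 * standard C clock() facility. */
#include <stdio.h>
#include <time.h>

#define N 50000

int a[N], b[N], c[N], d[N], e[N];

void test1(void)
{
    int i, j;
    for (j = 0; j < 10; ++j)
        for (i = 0; i < N; ++i) {
            ++a[i]; ++b[i]; ++c[i]; ++d[i]; ++e[i];
        }
}

void test2(void)
{
    int i, j;
    for (j = 0; j < 10; ++j) {
        for (i = 0; i < N; ++i) ++a[i];
        for (i = 0; i < N; ++i) ++b[i];
        for (i = 0; i < N; ++i) ++c[i];
        for (i = 0; i < N; ++i) ++d[i];
        for (i = 0; i < N; ++i) ++e[i];
    }
}

/* Time a single call to fn and report the elapsed CPU time.
 * (Illustrative helper, not part of the original posts.) */
static void time_it(const char *name, void (*fn)(void))
{
    clock_t start = clock();
    fn();
    clock_t stop = clock();
    printf("%s: %.2f s\n", name, (double)(stop - start) / CLOCKS_PER_SEC);
}

int main(void)
{
    time_it("test1", test1);
    time_it("test2", test2);
    return 0;
}

A single back-to-back run like this says little on its own; the point of
the thread is precisely that the ranking can flip between compilers,
machines, and loop counts, so any serious comparison should repeat the
measurements (and swap the order of the calls) before drawing
conclusions.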