From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: fc89c,97188312486d4578
X-Google-Attributes: gidfc89c,public
X-Google-Thread: 103376,97188312486d4578
X-Google-Attributes: gid103376,public
X-Google-Thread: 109fba,baaf5f793d03d420
X-Google-Attributes: gid109fba,public
X-Google-Thread: 1014db,6154de2e240de72a
X-Google-Attributes: gid1014db,public
From: Dan.Pop@cern.ch (Dan Pop)
Subject: Re: What's the best language to start with? [was: Re: Should I learn C or Pascal?]
Date: 1996/08/11
Message-ID: 
X-Deja-AN: 173599805
sender: news@news.cern.ch (USENET News System)
x-nntp-posting-host: ues5.cern.ch
references: <31FBC584.4188@ivic.qc.ca> <01bb8342$88cc6f40$32ee6fcf@timhome2> <4u7grn$eb0@news1.mnsinc.com> <01bb83ad$29c3cfa0$87ee6fce@timpent.airshields.com> <4u89c4$p7p@solutions.solon.com> <01bb83f5$923391e0$87ee6fce@timpent.airshields.com> <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com>
organization: CERN European Lab for Particle Physics
newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada
Date: 1996-08-11T00:00:00+00:00
List-Id: 

In <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com> "Tim Behrendsen" writes:

>Dan Pop wrote in article
>...
>> In <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> "Tim Behrendsen"
>> writes:
>>
>> >Here's an example:
>> >
>> >int a[50000],b[50000],c[50000],d[50000],e[50000];
>> >
>> >void test1()
>> >{
>> >    int i, j;
>> >    for (j = 0; j < 10; ++j) {
>> >        for (i = 0; i < 50000; ++i) {
>> >            ++a[i]; ++b[i]; ++c[i]; ++d[i]; ++e[i];
>> >        }
>> >    }
>> >}
>> >
>> >void test2()
>> >{
>> >    int i, j;
>> >    for (j = 0; j < 10; ++j) {
>> >        for (i = 0; i < 50000; ++i) ++a[i];
>> >        for (i = 0; i < 50000; ++i) ++b[i];
>> >        for (i = 0; i < 50000; ++i) ++c[i];
>> >        for (i = 0; i < 50000; ++i) ++d[i];
>> >        for (i = 0; i < 50000; ++i) ++e[i];
>> >    }
>> >}
>> >
>> >On my AIX system, test1 runs in 2.47 seconds, and test2
>> >runs in 1.95 seconds using maximum optimization (-O3).  The
>> >reason I knew the second would be faster is because I know
>> >to limit the amount of context information the optimizer has
>> >to deal with in the inner loops, and I know to keep memory
>> >localized.
>>
>> 1. For a marginal speed increase (~25%), you compromised the
>>    readability of the code.
>
>You call 25% a "marginal" increase?  It's attitudes like this
>that give us the slow code world we have right now.

Yes, I do call 25% a marginal increase.  Would you be considerably
happier if everything ran 25% faster, at the expense of everything
being harder to develop and maintain by a factor of 2?

>And how is one less readable than the other?

Methinks one loop is easier to read than five.

>> 2. Another compiler, on another system, might generate faster code
>>    out of test1.  This is especially true for supercomputers, which
>>    have no cache memory (and where the micro-optimizations are done
>>    based on a completely different set of criteria) and where the
>>    cpu time is really expensive.
>
>Show me the computer where test1 comes out faster.

Already done: my notebook.

>Or shouldn't we depend on the compiler to optimize this?

Exactly, it's the compiler's job to optimize it, not mine.
>> Let's see what happens on my 486DX33 box:
>>
>> So, it's 1.10 + 0.23 = 1.33 seconds of cpu time for test1 versus
>> 1.17 + 0.18 = 1.35 seconds for test2.  Conclusions:
>>
>> 1. My 486DX33 is faster than your AIX system (or maybe gcc is faster
>>    than xlc) :-)
>
>Oops!  I just realized my AIX program had 100 for the outer loop, not
>10 (had me worried for a minute there ...)
>
>> 2. Your results are not reproducible.  Your extra "insight" simply
>>    "helped" you to generate less readable code.
>
>Actually, try again with 500 or 5000 in the inner loops; 50000
>probably flushed the cache.

Get a clue.  If test1 flushes the cache, test2 will flush it as well.
Both implementations behave the same way WRT cache utilization: all
5 x 50000 array elements are accessed in a single iteration of the
outer loop, hence the same thing (cache hit or cache miss) will happen
at the next iteration in both versions.  Cache is simply a non-issue
in your example (even if you intended it to be :-)

>In any case, the code is identically
>readable either way IMO, and costs you nothing to implement
>the efficient way.

You forgot to provide a VALID justification for your claim that test2
is the "efficient way".

>So it wasn't on one architecture, big deal.

Can you prove that it will be on any other architecture?  According to
your line of argumentation, I could say: "so it was on one architecture,
big deal" :-)

>On the average, it will be, and costs nothing to do it that way.

I still have to see the proof.  If your time is worth nothing, then
you're right, it costs nothing.

>> The key to proper optimization is profiling.  Additional knowledge
>> about a certain platform is a lot less important.
>
>Agreed; optimization shouldn't be to a certain platform, but
>optimizations can be made on a general basis.

Right.  By selecting the proper algorithm.  Most other kinds of
optimizations will be specific to a certain machine or, at best, to a
class of machines.  For example, an optimization which tries to improve
the cache hit rate might hurt on a supercomputer which has no cache but
which is adversely affected by certain memory access patterns.

When programming for a single platform/compiler combination, it makes
sense to perform micro-optimizations, _if they're rewarding enough_.
But then, if portability is not an issue, the critical code could be
written in assembly as well.  When writing portable code in a HLL,
micro-optimizations are usually a waste of effort and programmer time.

Dan
--
Dan Pop
CERN, CN Division
Email: Dan.Pop@cern.ch
Mail:  CERN - PPE, Bat. 31 R-004, CH-1211 Geneve 23, Switzerland
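
[Readers who want to repeat the comparison themselves can use a minimal,
self-contained timing sketch along the following lines.  The array sizes
and loop counts are taken from the test1/test2 code quoted above; the
clock()-based time_it helper and the single back-to-back run are
illustrative assumptions, not what either poster actually ran (the
figures in the thread come from separate runs, apparently under the
shell's time command).]

/* Hypothetical timing harness for the test1/test2 comparison quoted
 * above.  Measures CPU time for one call to each function with the
 * standard C clock() facility. */
#include <stdio.h>
#include <time.h>

#define N 50000

int a[N], b[N], c[N], d[N], e[N];

void test1(void)
{
    int i, j;
    for (j = 0; j < 10; ++j)
        for (i = 0; i < N; ++i) {
            ++a[i]; ++b[i]; ++c[i]; ++d[i]; ++e[i];
        }
}

void test2(void)
{
    int i, j;
    for (j = 0; j < 10; ++j) {
        for (i = 0; i < N; ++i) ++a[i];
        for (i = 0; i < N; ++i) ++b[i];
        for (i = 0; i < N; ++i) ++c[i];
        for (i = 0; i < N; ++i) ++d[i];
        for (i = 0; i < N; ++i) ++e[i];
    }
}

/* Time a single call to fn and report the elapsed CPU time.
 * (Illustrative helper, not part of the original posts.) */
static void time_it(const char *name, void (*fn)(void))
{
    clock_t start = clock();
    fn();
    clock_t stop = clock();
    printf("%s: %.2f s\n", name, (double)(stop - start) / CLOCKS_PER_SEC);
}

int main(void)
{
    time_it("test1", test1);
    time_it("test2", test2);
    return 0;
}

A single back-to-back run like this says little on its own; the point of
the thread is precisely that the ranking can flip between compilers,
machines, and loop counts, so any serious comparison should repeat the
measurements (and swap the order of the calls) before drawing
conclusions.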