From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Tim Behrendsen"
Subject: Re: What's the best language to start with? [was: Re: Should I learn C or Pascal?]
Date: 1996/08/13
Message-ID: <01bb88d4$022bf4a0$32ee6fce@timhome2>
References: <31FBC584.4188@ivic.qc.ca> <01bb8342$88cc6f40$32ee6fcf@timhome2>
 <4u7grn$eb0@news1.mnsinc.com> <01bb83ad$29c3cfa0$87ee6fce@timpent.airshields.com>
 <4u89c4$p7p@solutions.solon.com> <01bb83f5$923391e0$87ee6fce@timpent.airshields.com>
 <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com>
Content-Type: text/plain; charset=ISO-8859-1
Organization: A-SIS
MIME-Version: 1.0
Newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada

Dan Pop wrote in article ...
> In <01bb87cf$97ae8e80$87ee6fce@timpent.airshields.com> "Tim Behrendsen" writes:
>
> >Dan Pop wrote in article
> >...
> >> In <01bb8534$b2718bc0$87ee6fce@timpent.airshields.com> "Tim Behrendsen"
> >> writes:
> >>
> >> >Here's an example:
> >> >
> >> >int a[50000],b[50000],c[50000],d[50000],e[50000];
> >> >
> >> >void test1()
> >> >{
> >> >    int i, j;
> >> >    for (j = 0; j < 10; ++j) {
> >> >        for (i = 0; i < 50000; ++i) {
> >> >            ++a[i]; ++b[i]; ++c[i]; ++d[i]; ++e[i];
> >> >        }
> >> >    }
> >> >}
> >> >
> >> >void test2()
> >> >{
> >> >    int i, j;
> >> >    for (j = 0; j < 10; ++j) {
> >> >        for (i = 0; i < 50000; ++i) ++a[i];
> >> >        for (i = 0; i < 50000; ++i) ++b[i];
> >> >        for (i = 0; i < 50000; ++i) ++c[i];
> >> >        for (i = 0; i < 50000; ++i) ++d[i];
> >> >        for (i = 0; i < 50000; ++i) ++e[i];
> >> >    }
> >> >}
> >> >
> >> >On my AIX system, test1 runs in 2.47 seconds, and test2
> >> >runs in 1.95 seconds using maximum optimization (-O3).  The
> >> >reason I knew the second would be faster is because I know
> >> >to limit the amount of context information the optimizer has
> >> >to deal with in the inner loops, and I know to keep memory
> >> >localized.
> >>
> >> 1. For a marginal speed increase (~25%), you compromised the readability
> >> of the code.
> >
> >You call 25% a "marginal" increase?  It's attitudes like this
> >that give us the slow code world we have right now.
>
> Yes, I do call 25% a marginal increase.  Would you be considerably happier
> if everything would run 25% faster, at the expense of everything being
> harder to develop and maintain by a factor of 2?

No, I wouldn't be happier with "a factor of 2", but that's a very
large presumption.  Who's doing anything strange and arcane?  I just
made it easier on the compiler / computer.

> >And how is one less readable than the other?
>
> Methinks one loop is easier to read than five.

Sometimes, but other times I would rather read five small loops
right in a row than one mammoth loop that I have to take in all
at once.  In this particular case, I think they are both equally
readable.
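For anyone who wants to try the comparison themselves, a rough harness
along these lines should do it.  This is just a sketch, not the exact
program run on the AIX box; it assumes test1() and test2() are defined
as in the quoted example and uses clock() for a crude CPU-time measure:

#include <stdio.h>
#include <time.h>

extern void test1(void);    /* as defined in the quoted example */
extern void test2(void);

static double time_one(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("test1: %.2f seconds\n", time_one(test1));
    printf("test2: %.2f seconds\n", time_one(test2));
    return 0;
}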
> >> 2. Another compiler, on another system, might generate faster code
> >> out of the test1.  This is especially true for supercomputers,
> >> which have no cache memory (and where the micro-optimizations are done
> >> based on a completely different set of criteria) and where the cpu time
> >> is really expensive.
> >
> >Show me the computer where test1 comes out faster.
>
> Already done: my notebook.

Yes, trivially faster.  But how about significantly faster?

> >Or shouldn't we depend on the compiler to optimize this?
>
> Exactly, it's the compiler's job to optimize it, not mine.

Well, the compiler didn't do a very good job, did it?  And that's
the whole point.  It is ivory tower naivete to think that compilers
optimize everything perfectly every time.  Yes, you can just wave
your hand and say, "well, obviously AIX sucks", but in my experience
at least, ALL compilers suck in one way or another.

> >> Let's see what happens on my 486DX33 box:
> >>
> >> So, it's 1.10 + 0.23 = 1.33 seconds of cpu time for test1 versus
> >> 1.17 + 0.18 = 1.35 seconds for test2.  Conclusions:
> >> 1. My 486DX33 is faster than your AIX system (or maybe gcc is faster than
> >> xlc) :-)
> >
> >Oops!  I just realized my AIX program had 100 for the outer loop, not
> >10 (had me worried for a minute there ...)
> >
> >> 2. Your results are not reproducible.  Your extra "insight" simply
> >> "helped" you to generate less readable code.
> >
> >Actually, try again with 500 or 5000 in the inner loops; 50000
> >probably flushed the cache.
>
> Get a clue.  If test1 flushes the cache, test2 will flush it, as well.
> Both implementations behave the same way WRT cache utilization: all
> 5 x 50000 array elements are accessed in a single iteration of the outer
> loop, hence the same thing (cache hit or cache miss) will happen at the
> next iteration in both versions.  Cache is simply a non-issue in your
> example (even if you intended it to be :-)

- Sigh - I really must think about these posts for more than two
minutes per thread.  You are right, of course; the cache is
irrelevant to this example.  It's more likely that the second case
makes better use of page locality.  It may also be that the compiler
ran out of address registers with five arrays (or a combination of
both).  Of course, I could restructure the example to take advantage
of the cache, and get even more improvement. :)

> >In any case, the code is identically
> >readable either way IMO, and costs you nothing to implement
> >the efficient way.
>
> You forgot to provide a VALID justification for your claim that test2
> is the "efficient way".

It's probably mostly the paging locality.  Did you run your test
under DOS or Windows?  It would be interesting to see if there was
a difference.

> >So it wasn't on one architecture, big deal.
>
> Can you prove that it will be on any other architecture?  According
> to your line of argumentation, I could say: "so it was on one architecture,
> big deal" :-)

I would have to try it on other architectures to be sure, but can
you come up with a scenario where the second would be significantly
slower?  I would say that on the average paging architecture, memory
locality tends to be quite important to performance.

> >On the average, it will be, and costs nothing to do it that way.
>
> I still have to see the proof.  If your time is worth nothing, then you're
> right, it costs nothing.

It takes the same amount of time either way.  Just a matter of
paying attention to what you do.
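As a side note on why memory locality matters on a typical paged and
cached machine, here is a separate toy example (not from this thread;
the 1000x1000 size is arbitrary).  Walking a two-dimensional array
along its rows touches consecutive addresses, while walking it down
the columns jumps a full row's worth of bytes on every access, and on
most machines the second loop comes out noticeably slower:

#include <stdio.h>
#include <time.h>

#define N 1000

static int m[N][N];

int main(void)
{
    long sum = 0;
    int i, j;
    clock_t t;

    /* Row order: consecutive addresses, good page/cache locality */
    t = clock();
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            sum += m[i][j];
    printf("by rows:    %.2f seconds\n",
           (double)(clock() - t) / CLOCKS_PER_SEC);

    /* Column order: every access jumps N * sizeof(int) bytes */
    t = clock();
    for (j = 0; j < N; ++j)
        for (i = 0; i < N; ++i)
            sum += m[i][j];
    printf("by columns: %.2f seconds\n",
           (double)(clock() - t) / CLOCKS_PER_SEC);

    return (int)(sum & 1);  /* use sum so the loops aren't optimized away */
}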
> >> The key to proper optimization is profiling.  Additional knowledge about
> >> a certain platform is a lot less important.
> >
> >Agreed; optimization shouldn't be to a certain platform, but
> >optimizations can be made on a general basis.
>
> Right.  By selecting the proper algorithm.  Most other kinds of
> optimizations will be specific to a certain machine, or, at best, a class
> of machines.  For example, an optimization which tries to improve the
> cache hit rate might hurt on a supercomputer which has no cache, but
> is adversely affected by certain memory access patterns.

Well, if you're using a supercomputer, chances are you *will*
optimize to that architecture, because supercomputer time tends to
be expensive.  For general-purpose software, the architectures are
pretty much the same all around.  They all have caches, they all
have paging.

> When programming for a single platform/compiler combination, it makes
> sense to perform micro-optimizations, _if they're rewarding enough_.
> But then, if portability is not an issue, the critical code could be
> written in assembly, as well.  When writing portable code in a HLL,
> micro-optimizations are usually a waste of effort and programmer time.

I think that most architectures in the real world are close enough
that you can find commonality, such as eliminating inefficient
recursion.
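To make that last point concrete, here is the kind of thing I mean
(an illustration made up for this post, not code from any real
program): the recursive version pays for a call, a stack frame, and a
return per element on just about any architecture, while the
iterative version is one tight loop.

/* Recursive sum: one call frame per element, deep stack for large n */
long sum_rec(const int *a, int n)
{
    return (n == 0) ? 0 : a[n - 1] + sum_rec(a, n - 1);
}

/* Iterative sum: a single loop, no call overhead */
long sum_iter(const int *a, int n)
{
    long s = 0;
    int i;

    for (i = 0; i < n; ++i)
        s += a[i];
    return s;
}

-- 
Tim Behrendsen (tim@airshields.com)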