From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: What's the best language to start with? [was: Re: Should I learn C or Pascal?]
Date: 1996/08/26
Message-ID: <4vroh3$17f@goanna.cs.rmit.edu.au>
Organization: Comp Sci, RMIT, Melbourne, Australia
Newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada

"Tim Behrendsen" writes:
>They may exist, but does it exist universally?  The point is that to
>go out of your way to use left-field optimizations is just bad
>portability practice.

In other words, "Effen it wur good enough fur granpaw, it's good
enough fur me."

By the way, I find it EXTREMELY odd to call a technique that has been
known and used for nearly 20 years a "left-field" one.  Behrendsen is
putting things exactly backwards; *he* is advocating "do not use a
technique in your programming language, no matter how natural or
readable it is, unless you know that every compiler you use will
implement it reasonably".  Well, I *can't* know that.  Heck, I can't
even know that
	X := X+1;
will take constant or even small time; almost all programming language
standardisers say "that's a quality of implementation issue which is
outside our scope".

For an *exact* parallel, consider function inlining.  These days, I
expect a good compiler at the very minimum to support
programmer-supplied inlining hints, and the Fortran, Pascal, and C
compilers I normally use do it _automatically_.  This is another of
those optimisations which has been known and used for decades in the
Lisp community, and which lets you write your program using a function
call instead of a macro.  I guess inline functions are another
"left-field optimisation".

>For example, if someone does this:
>
>void sub(char *s)
>{
>    ...
>    if (strlen(s) == 0) {	/* check for null string */
>        ...
>    }
>}
>
>and they know it's stupid, but they also know the compiler
>just happens to have optimization for it, should they
>still be shot?

"Diana, get me my rifle."

You have prejudged the issue by stipulating that the programmer doing
this KNOWS it is stupid.  To start with, can I just make the obvious
point that in a very large number of currently used languages, the
direct equivalent of this would be the very best thing to do?  There
are a lot of *good* programmers out there (definition: in a reasonable
amount of time they can write readable code that works) who would
*not* "know that it's stupid".
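To make the comparison concrete, here is a sketch of the usual ways of
testing a C string for emptiness (the function names are mine, purely
for illustration):

	#include <string.h>

	/* Three ways to ask "is s empty?"; s is a NUL-terminated string. */
	int is_empty_by_length(const char *s) { return strlen(s) == 0; } /* clear; O(|s|) if not optimised */
	int is_empty_by_first(const char *s)  { return *s == '\0'; }     /* O(1), but a C-specific idiom   */
	/* NOTE: if (s == "") compares POINTERS, not contents: almost never what you want. */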
The strlen(s) == 0 test is, for example, much less stupid than writing
	if (s == "")

Let's see what is good about this code:
- it is correct (not a NULL test, not a "" test, but a length test)
- it is clear (much clearer to people not immersed in C than "!*s")
Doesn't look stupid to me yet.  What's bad about it?
- it takes O(|s|) time.
BUT
- standard optimisations, like common subexpression elimination, can
  be applied, so that any number of calls to strlen(s) without any
  intervening potential change to s are no more expensive than a
  single call
- in many applications, all or a large part of s will be processed if
  it is not empty, so the cost of sub() will be O(|s|) anyway
so it's only "stupid" if it occurs in a loop and you know your
compiler _doesn't_ move loop invariants out.

I would go so far as to say that

	int count_blanks(char const * const s)
	{
	    int i, r;

	    r = 0;
	    for (i = 0; i < strlen(s); i++)
	        if (s[i] == ' ')
	            r++;
	    return r;
	}

is a _good_ function.  The ONLY strlen() optimisation that is needed
here is one that is completely general: strlen() is a pure function
and its argument does not change, so the strlen(s) computation can be
hoisted out of the loop.  Since the optimisation in question is one
which has been in use since the very early Fortran days, it can hardly
be called "left-field".  A compiler that _doesn't_ do this is indeed a
poor compiler, these days.

Here I am actually arguing against my own inclination, because I
strongly dislike seeing strlen() used this way, and will take
advantage of any excuse to "fix" it.  I certainly tell students why
not to do this.  But I cannot in good conscience call clear, correct
code "stupid".

>> Scuse please?  WHAT reality?  The reality in this case is that you
>> can in fact in just about any reasonable programming language sort
>> data WITHOUT moving it.

>Wait, hold the phone!  "Sorts the data without moving it"?  What,
>is APL's sorting algorithm O(1)?

No, of course not.  I didn't say that.  I said that in just about any
reasonable programming language [I did not say APL] you can sort data
WITHOUT moving it [I did not say O(1)].  My claim is that YOU CAN SORT
THE DATA WITHOUT MOVING *THE DATA*.  I don't see how any sane reader
could translate that into O(1), unless perhaps said reader was under
the delusion that sorting involves nothing but movement!

>Yes, it may not actually get
>sorted until it gets printed, but that's irrelevent to the fact
>that it eventually gets sorted.

Printing is a complete irrelevancy, dragged in solely by Behrendsen.
It is another of the straw men I could well do without.

Sorting is not about movement.  It is about *determining a
permutation*.  Depending on the type of the keys, it may do divisions
(as in bucket sort and radix sort) or comparisons (as in merge sort).
When you are sorting large or variable-length records, it is of
practical importance that you can sort the data by returning a
permutation vector WITHOUT MOVING THE ORIGINAL DATA.  This is not an
academic point, because the permutation vector may be 10s or 100s of
times smaller than the original data.  My reference to the APL
grade-up primitive made all this clear: grade-up returns a permutation
vector.  You can use a permutation vector to access the original
elements in ascending order, descending order, or by binary search, or
anything you want, without having to pay the price of moving large
records.
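For the sceptical, here is the permutation-vector idea spelled out in
C.  This is only a sketch: the record type and data are invented for
the example, and the file-scope variable is there only because qsort()
gives the comparator no context argument.

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	struct rec { char name[64]; /* imagine many more large fields here */ };

	static const struct rec *records;       /* seen by the comparator */

	static int cmp_by_name(const void *a, const void *b)
	{
	    size_t i = *(const size_t *)a;
	    size_t j = *(const size_t *)b;
	    return strcmp(records[i].name, records[j].name);
	}

	int main(void)
	{
	    struct rec data[] = { {"banana"}, {"apple"}, {"cherry"} };
	    size_t n = sizeof data / sizeof data[0];
	    size_t perm[sizeof data / sizeof data[0]];
	    size_t k;

	    for (k = 0; k < n; k++)
	        perm[k] = k;                    /* identity permutation */
	    records = data;
	    qsort(perm, n, sizeof perm[0], cmp_by_name);  /* only indices move */
	    for (k = 0; k < n; k++)             /* records in ascending order */
	        printf("%s\n", data[perm[k]].name);
	    return 0;
	}

The large records never move; qsort() shuffles three machine words,
and the final loop visits the records in sorted order through the
permutation vector.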
One of the things which made the Bentley & McIlroy "engineered"
version of qsort() complicated was having to work around the fact that
qsort() is set up to sort large fixed-length records; writing the code
to take advantage of the case where the elements turn out to fit into
machine registers is tricky, and in fact not portable.

>Indeed, and I think APL is kind of a neat language to learn.  But
>it hides so much about the internals of the computer, that I think
>it would give a student too many false impressions about how
>things really work.

Of course APL hides the internals of the computer.  That's what a high
level language is FOR!  I've done enough programming in ESPOL (B6700
only) and BLISS-10 (DEC-10 only) to be grateful for languages that
hide machine details.  I have manuals for
- System/360
- Z80 (somewhere)
- 8051
- 8086
- 80960
- Intergraph Clipper
- Motorola 68000
- Motorola 88000
- SPARC
- PA-RISC
- (part of) MIPS 3000 (in Hennessy & someone)
and have studied them.  I have not written assembly code for the Z80,
8051, 80960, or PA-RISC, though I have debugged other people's code
for the Z80 and 8051.  I have written assembler for the others, and
for the DEC-10, PDP-11, VAX, Pyramid, IBM RT-PC, and NS32532 (oh yes,
and the Computer Automation Alpha LSI 2 and the PRIME 400).  I know
more about the internals of computers than I really want to, and the
better a job the language I'm using does of hiding them, the better I
like it.  (There is a *lot* to like about Oberon, but I cannot like
Wirth's reversion to C-style integral types.)

>I actually quite agree with this; it's how you get the "high level
>thinking" that I think is an issue.

There is an Edinburgh paper from the 70s that I want to quote from.
It came from the EMAS project.  The EMAS operating system was designed
and written at Edinburgh, in an Algol-like language called IMP 77.
They observed that the operating system modules written by people
skilled in assembly language (people who even went to the extreme of
checking the assembly code produced by the compiler to make sure it
was efficient) tended to be
- bigger,
- less readable, and
- SLOWER
than those produced by people who had a "high level language"
perspective.  I'll try to remember to bring the paper in tomorrow so I
can quote it exactly.

>Even in a mathematical proof,
>you are talking about a sequence of micro-steps.  Yes, most
>proofs are built of other proofs, but I think this is more of
>a "proof macro language" than a "high-level mathematical
>language" (whatever that means).

I don't quite see proofs by duality or symmetry as simple macro
substitution.  I think your example here actually gives more comfort
to the anti-assembly camp than to your own.  I once took a
masters-level course on the Spinor Calculus.  The lecturer (who came
from the Sorbonne) wanted to make sure we understood the whole thing
thoroughly, so he started with metamathematics and constructed
everything from the ground up.  We were nearly half way through the
course before we got to the rational numbers.  We ended up with only
two lectures that were actually about spinors, and I _still_ don't
know what they are.  (I've started reading Penrose & Rindler to find
out.)  In that case at least, over-emphasis on "fundamentals" was a
major impediment to the high level understanding I really wanted.

For what it's worth, students here *do* get taught about computer
architecture, and they *do* get taught a bit of 68000 assembly code,
and they *do* work with little single-board machines so they can try
their 68000 code out.
But this is understood as a computer architecture course, not as a
data structures and algorithms course.  (The course is not confined to
the 68000, but that's all _they_ have to program at that level.)
-- 
Australian citizen since 14 August 1996.  *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.