From: "Robert I. Eachus"
Subject: Re: Compiler for Z80/6510
Date: 1999/12/01
Organization: The MITRE Corporation
Newsgroups: comp.lang.ada

Robert Dewar wrote:

> In practice, really skilled assembly language programmers (there
> are very few around, especially these days), can always outrun
> any compiler given enough time and effort.

Once, a long time ago, I taught the APL section of a Computing Languages course right after the PL/I section. (As I said, this was a long time ago.) The students were using an IBM 3033U mainframe. (As I said...) The last assignment for the PL/I segment was to write a program to generate magic squares in PL/I.

In the first APL session (before the assignment was due), I put a $20 bill on the lectern and said I'd give it to anyone whose submitted PL/I program was faster than the following APL code. (APL one-liner omitted to spare eyesight.) I then proceeded to explain some of the reasons why an interpreted language could be a reasonable choice for many applications.

One student, who probably didn't really need the course, asked after class how fast the APL version ran. I handed him a listing showing another one-liner which timed the first program, and the time: 87 ms.

About a week later, he ambushed me again after class and showed me what he had done. First, he took the very clever algorithm from the APL program and coded it in PL/I. It took over a second. Next he wrote a PL/I null program, which still took 99 ms to run. (He said: "So you made a pretty safe bet." Me: "Did you think I didn't know that?") Next he created an APL workspace that closed itself immediately: 19 ms. Hmmm. So he rewrote the code in IBM 370 assembler, and did a very good job: 233 ms. Hmmm. Hmmm!

What was going on? I explained that the IBM 3033 had a built-in special accelerator for APL code: instructions that could not be generated by the assembler. This set included some vector operations which were very fast when just moving bits around; arithmetic was still limited by the ALU bottleneck. Since the magic square algorithm used only array permutations, it was very, very fast.
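(For the curious: the flavor of such a permutation-only construction, though emphatically not the omitted APL one-liner, can be sketched in a few lines of Ada. This is the classic De la Loubere rule for odd orders, written as a closed-form shuffle of the index grid; take it as an illustration of the idea, nothing more.)

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Magic is
      N : constant := 3;  --  any odd order works for this rule
      M : array (0 .. N - 1, 0 .. N - 1) of Integer;
   begin
      --  Each (I, J) is sent to a unique value in 1 .. N*N; every
      --  row, column, and diagonal then sums to N * (N*N + 1) / 2.
      for I in 0 .. N - 1 loop
         for J in 0 .. N - 1 loop
            M (I, J) := N * ((I + J + 1 + N / 2) mod N)
                        + ((I + 2 * J + 1) mod N) + 1;
         end loop;
      end loop;
      --  For N = 3 this prints 8 1 6 / 3 5 7 / 4 9 2 (all sums 15).
      for I in 0 .. N - 1 loop
         for J in 0 .. N - 1 loop
            Put (Integer'Image (M (I, J)));
         end loop;
         New_Line;
      end loop;
   end Magic;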
Now this is a pretty special case, but in general modern processors have hundreds of special-purpose operations which are not available from the assembler, because the assembler limits itself to the set of instructions specified for all implementations of the architecture. For example, on Sun SPARCs it was at first the case that you could use (integer) multiply and divide instructions which weren't available in compiler-generated code but could be generated by the assembler. Then it became possible to provide (in Ada terms) code inserts that generated these instructions. Then the compilers started generating them. Next the OS provided routines which trapped the instructions if they were not provided by the particular chip you were running on, and so on. But about when the Version 8 SPARC reference came out, things proceeded in exactly the reverse order with some of the new instructions: the assembler was the last to support them. (I could go into details, but there were serious timing issues for these instructions, and it was thought that they would only be used in compiler run-times for special purposes.) So for a while, I had some Ada code for the SPARC that ran faster than the tightest assembler I could write.

Currently we are in this regime again. The Intel Pentium II and III and the AMD Athlon have different graphics support enhancements. We could argue about which is better, but using special libraries and run-times, in some cases specific to both the chip AND the graphics card, you can generate code which is significantly faster than any Pentium assembly code. Of course, to get these advantages, not only do you have to have processor-specific knowledge, but you often have to be able to program in machine language, and that is very definitely a dying art.

-- 
Robert I. Eachus

with Standard_Disclaimer;
use Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...