From: robert_dewar@my-dejanews.com
Subject: Re: Ada vs C++ vs Java
Date: 1999/01/21
Message-ID: <787f4b$jl9$1@nnrp1.dejanews.com>
References: <369C1F31.AE5AF7EF@concentric.net> <369DDDC3.FDE09999@sea.ericsson.se> <369e309a.32671759@news.demon.co.uk> <77ledn$eu7$1@remarQ.com> <77pnqc$cgi$1@newnews.global.net.uk> <8p64spq5lo5.fsf@Eng.Sun.COM> <782r25$k18$1@nnrp1.dejanews.com>
Organization: Deja News - The Leader in Internet Discussion
Newsgroups: comp.lang.ada,comp.vxworks,comp.realtime

In article ,

> A data point is a data point, not a universal law.
> I stand by what I said. The K&R White Book was the sole
> C manual for many years, so it'll have to do. ANSI C is
> not quite the same language, and came many years later.

That's not the point. Your "data point" is invalid because it was an
apples-and-oranges comparison. You cannot compare an informal
description like the K&R white book with an ANSI standard. It is not
good enough to say "well, I know I was comparing X and Y, and they are
not comparable, but that's all I could find to compare"; the data point
is still entirely invalid.

> > As for the size of the compilers, you were at that time
> > not looking at modern optimizing compilers. If you
> > repeat this experiment with modern optimizing
> > compilers, you will find that for all the languages,
> > the majority of the complexity and weight of the
> > compiler is in the optimizer.
>
> Is the optimiser common to both languages? Back in the
> 1980s, I couldn't isolate the parts of the two compilers,
> and nothing was shared anyway.

Again, the fact that you couldn't do the experiment properly does not
make it valid! Of course you could have isolated the parts of the
compilers back then; you just did not know how to. As for the optimizer
being common to both languages, yes, that is typically the case now:
back-end optimization is language independent in most compilers. In any
case, even if the code isn't shared, both compilers have large chunks
of language-independent optimization circuitry.

> So, tell us, what are the weights of the various front
> ends, and the optimiser or optimisers? Is there still a
> pure "C" front end, or has it been subsumed into the C++
> front end now?

This varies, of course, from one compiler to another; I already gave
some idea for the gcc compiler. There are some cases of separate C
front ends (e.g. GNU C), and some cases of integrated front ends. But
even confining your measurements to a particular front end, or set of
front ends, may say more about quality of implementation than about the
language itself. For example, compare a C front end and an Ada front
end: if you are trying to extract accurate aliasing information (gcc
does not!), this is far harder in C, and takes more apparatus than in
Ada.
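
To make the aliasing point concrete, here is a small sketch (the type
and subprogram names are invented for the example). Because P and Q
have distinct access types designating distinct types, an Ada front end
may assume the two stores cannot interfere; a C front end handed two
bare pointers has to work much harder to establish the same thing.

   procedure Alias_Demo is
      type Int_Access   is access Integer;
      type Float_Access is access Float;

      IP : Int_Access   := new Integer'(0);
      FP : Float_Access := new Float'(0.0);

      procedure Update (P : Int_Access; Q : Float_Access) is
      begin
         P.all := 10;
         Q.all := 2.5;        --  cannot affect P.all: distinct designated types
         P.all := P.all + 1;  --  so the old value of P.all may stay in a register
      end Update;
   begin
      Update (IP, FP);
   end Alias_Demo;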
Also, there is pretty much a free choice of where to do certain global
optimizations: they can be done early in the front end, or later in the
back end. You need to know the structure of a compiler really well to
get any kind of meaningful measurements. Your attempts to "weigh"
compilers cannot possibly give anything but vague, misleading results.
You can "stand by" these meaningless results if you like, but that does
not make them meaningful.

<>

You seem to confuse languages and their implementations here (and
elsewhere in the post). Yes, there was one particular implementation of
C++ that preprocessed into C (just as there is at least one
implementation of Ada 95 that preprocesses into C). That approach is
rarely used for serious C++ work these days, since there are many C++
compilers around that generate efficient object code directly (the same
can be said of Ada 95, of course).

> Yep. One would assume that all these complexities are
> more or less proportional to each other. In particular,
> complex syntax has got to cause complex compiler code.

No, not at all. That is an easy mistake for someone who does not know
compiler internals, but syntax is always a trivial part of any
compiler. And it is NOT AT ALL the case that these complexities are
more or less proportional; that was the whole point of my post (and I
gave examples, please reread).

Here is another example; I can give literally hundreds of similar
ones. A language is easier to describe, define, and use if its features
are orthogonal. This is a well-known principle, first clearly
enunciated by the Algol 68 design. Let's take a particular example. In
Ada 95 (the same applies to any similar Algol-style language), we have
records with fields of whatever type we like. We have array types, and
it is nice for these to be dynamically sizable. These decisions, in an
orthogonal design, mean that records can have fields whose type is a
variable-sized array (sketched below).

This combination causes absolutely NO increase in complexity of
description or use. Indeed, a restriction that made this particular
combination illegal would be a non-orthogonal "odd" rule, and would
make the language more complex from these points of view. However, in
the implementation domain, dynamic arrays are easy to implement, and
ordinary records without dynamic array components are easy to
implement, but the combination is annoying, and definitely increases
the complexity of an implementation.

Let's take another example, along a different dimension: goto
statements certainly do not significantly complicate the informal
description or use of a language like C. Typically they don't much
affect implementation either, though they do make certain optimization
techniques harder. But when it comes to formal description, gotos are a
real menace. In a denotational semantics definition, a goto is often
far more trouble to define than, say, an IF or LOOP statement. Have a
look at the original SETL sources for Ada to see this complication
effect (the SETL implementation of Ada was essentially a formal
definition that executed, so it was more affected by this consideration
than a normal implementation would be).

I can rattle on like this for a long time; there are such trade-offs
between nearly every aspect of complexity. During the Ada 95 design, we
often had arguments about complexity, and when I first proposed this
more elaborate notion of complexity it was quite helpful to realize
that very often the reason we disagreed on an issue was that we were
arguing along different dimensions.
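
To make the record example above concrete, the declaration in question
looks roughly like this (the names are invented for illustration):

   procedure Ortho_Demo is
      --  The array component's size is fixed, per object, by the
      --  discriminant: trivial to describe and to use, but the
      --  implementation must now cope with objects of different sizes
      --  in layout, assignment, and parameter passing.
      type Buffer (Length : Natural) is record
         Count : Natural := 0;
         Data  : String (1 .. Length);
      end record;

      Small : Buffer (Length => 16);
      Large : Buffer (Length => 4_096);
   begin
      Small.Data := (others => ' ');
      Large.Data (1) := '*';
   end Ortho_Demo;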
For example, the design team often concentrated on making things simple
for the Ada programmer, sometimes at the expense of other aspects of
complexity. What seemed at first like a disagreement ("This is far too
complex", "No it isn't", "Yes it is") turned out to be an argument
about trade-offs between different goals, and once we understood this
clearly, it was far easier to have a clear discussion and come to the
right conclusion.

One more example, from a recent thread: the ALL keyword for general
access types in Ada 95 adds no significant implementation complexity,
or complexity in the formal description. However, it definitely adds
complexity for the user, as noted in previous posts to this group.

> Have you suitable numerical metrics for the other
> components of complexity?

I don't know that anyone has tried to do this; it would be difficult.
Certainly this decomposition of the notion of complexity makes the
problem more tractable, but it is still extremely difficult. But we
don't have to have numerical metrics for this to be a useful principle
for discussion (indeed, bogus numerical metrics, such as your attempts
to weigh compilers, can obfuscate clear discussion -- better no metrics
than bogus ones).

> > > Assembly language is simpler than any high-order
> > > language, but it's lots more work to code in
> > > assembly.
> >
> > Now let me guess. The last time you looked at machine
> > language was in the 80's, right? Yes, in those days,
> > the semantics of machine language was pretty simple.
>
> No, 1998. We do read and sometimes write PowerPC
> assembly code. Many 1980s machines had more complex
> instruction sets than the PowerPC.

I *strongly* disagree. If you are just focusing on the functionality of
the instruction set, sure, but that is trivial in all cases. The
complex part of any instruction set is the execution efficiency
semantics.

> > I am afraid that things have changed. At this stage the
> > full execution semantics of a modern chip with
> > extensive instruction-level parallelism is remarkably
> > complex along ALL the dimensions I mention above. With
> > a chip like the Pentium II, if you include efficiency
> > issues, which are indeed not fully documented publicly,
> > let alone formally specified, you have something far
> > MORE complicated than any of the languages we are
> > talking about here.
>
> All true, but what has all this to do with the original
> question, the relative complexity of C, C++, Ada83, and
> Ada95?

Not sure. Why not ask Joe Gwinn; it was he who gave assembly language
as an example of a simple language :-)

> > >> Yet, people still use assembly.
> >
> > Well barely ...
>
> We try to avoid it, but don't always succeed. Another
> variant is to code in a high-order language (HOL),
> inspect the generated assembly, paraphrase the HOL source
> trying to improve the assembly, iteratively.

That is exactly what is extremely difficult to do. To know how to
optimize the code, you have to fully understand the scheduling and
other instruction-level-parallelism aspects of the code. We quite often
get comments on the generated code from GNAT that show that people do
not understand such things. A little example: many people would expect
that in

   if (A > 4) AND [THEN] (B > 4) then

the addition of THEN would speed things up in the case where A is
indeed greater than 4.
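
Written out in full (the enclosing procedure and the counter are
invented for the example), the two forms being compared are:

   procedure Short_Circuit_Demo (A, B : Integer) is
      Hits : Natural := 0;
   begin
      --  Plain AND: both comparisons are always evaluated, so the test
      --  can often be compiled as straight-line code with at most one
      --  conditional jump (or none at all on a predicated machine).
      if (A > 4) and (B > 4) then
         Hits := Hits + 1;
      end if;

      --  AND THEN: B > 4 is evaluated only when A > 4 is True; honoring
      --  that ordering is naturally compiled as two conditional jumps.
      if (A > 4) and then (B > 4) then
         Hits := Hits + 1;
      end if;
   end Short_Circuit_Demo;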
That expectation is of course quite likely to be false on modern
machines, where jumps can be expensive; indeed, an important
optimization on predicated machines like the Merced is to eliminate
short-circuiting, and in general to convert if's into straight-line
code without jumps.

> This can be very effective, but it does give a false
> impression that the code is in a HOL. It isn't really,
> because a new compiler will force a repeat of the
> tuning steps just described.

Sounds like a highly undesirable practice to me. I would recommend
instead that you put your energy into learning more about how to use
and choose compilers effectively. With modern machines, you are more
likely to create a mess by mucking with the generated assembly code.
This may have been a reasonable approach in the 70's, but not today!