From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 115aec,d275ffeffdf83655
X-Google-Attributes: gid115aec,public
X-Google-Thread: 146b77,d275ffeffdf83655
X-Google-Attributes: gid146b77,public
X-Google-Thread: 103376,d275ffeffdf83655
X-Google-Attributes: gid103376,public
From: gwinn@ma.ultranet.com (Joe Gwinn)
Subject: Re: Ada vs C++ vs Java
Date: 1999/01/22
Message-ID: 
X-Deja-AN: 435542048
References: <369C1F31.AE5AF7EF@concentric.net> <369DDDC3.FDE09999@sea.ericsson.se> <369e309a.32671759@news.demon.co.uk> <77ledn$eu7$1@remarQ.com> <77pnqc$cgi$1@newnews.global.net.uk> <8p64spq5lo5.fsf@Eng.Sun.COM> <782r25$k18$1@nnrp1.dejanews.com> <787f4b$jl9$1@nnrp1.dejanews.com>
X-Ultra-Time: 22 Jan 1999 05:11:10 GMT
X-Complaints-To: abuse@ultra.net
Organization: Gwinn Instruments
Newsgroups: comp.lang.ada,comp.vxworks,comp.realtime
Date: 1999-01-22T00:00:00+00:00
List-Id: 

In article <787f4b$jl9$1@nnrp1.dejanews.com>, robert_dewar@my-dejanews.com
wrote:

> In article
> ,
>
> > A data point is a data point, not a universal law.
> > I stand by what I said.  The K&R White Book was the sole
> > C manual for many years, so it'll have to do.  ANSI C is
> > not quite the same language, and came many years later.
>
> That's not the point. Your "data point" is invalid because
> it was an apples and oranges comparison. You cannot compare
> an informal description like the K&R white book with an
> ANSI standard. It is not good enough to say, well I know
> I was comparing X and Y, and they are not comparable but
> that's all I could find to compare, the data point is
> still entirely invalid.

Had a formal C language reference been available back then, I would have
used it.  You keep trying to make it more than I ever claimed for it, an
exercise in knocking strawmen down.  I think I accurately described the
data point, and leave it to the readers to draw whatever conclusions
they wish.

> > > As for the size of the compilers, you were at that time
> > > not looking at modern optimizing compilers. If you
> > > repeat this experiment with modern optimizing
> > > compilers, you will find that for all the languages,
> > > the majority of the complexity,
> > > and weight of the compiler is in the optimizer.
>
> > Is the optimiser common to both languages?  Back in the
> > 1980s, I couldn't isolate the parts of the two compilers,
> > and nothing was shared anyway.
>
> Again, the fact that you couldn't do the experiment
> properly does not make it valid! Of course you could have
> isolated the parts of the compiler back then, you just did
> not know how to.

You keep trying to make it more than I ever claimed for it.  I think I
accurately described the data point, and leave it to the readers to draw
whatever conclusions they wish.

> As for the optimizer being common to both languages, yes
> that's typically the case now, that back end optimization
> is language independent in most compilers. In any case
> even if the code isn't common, both compilers have large
> chunks of language independent optimization circuitry.

This was my impression, and you are certainly well-positioned to know.

> > So, tell us, what are the weights of the various front
> > ends, and the optimiser or optimisers?  Is there still a
> > pure "C" front end, or has it been subsumed into the C++
> > front end now?
>
> This varies of course from one compiler to another, I
> already gave some idea for the gcc compiler.
>
> There are of course some cases of separate C front ends
> (e.g. GNU C), and some cases of integrated front ends.
>
> But even confining your measurements to a particular front
> end, or set of front ends, may say more about quality of
> implementation than the language itself. For example,
> comparing a C front end and an Ada front end. If you are
> trying to extract accurate aliasing information (gcc does
> not!) then this is far harder in C, and would take more
> apparatus than in Ada.
>
> Also, there is pretty much a free choice of where to do
> certain global optimizations, they can be done early in
> the front end, or later in the backend. You need to really
> know the structure of a compiler well to get any kind of
> meaningful measurements.
>
> Your attempts to "weigh" compilers cannot possibly give
> anything but vague misleading results. You can "stand by"
> these meaningless results if you like, but that does not
> make them meaningful.

Well, I was hoping for another, admittedly crude, measure.  As discussed
below, it may well be the only numerical metric available, for all its
sins.  You have GNU C, C++, and GNAT at your fingertips.  What do the
various components weigh?

> <>
>
> You seem to confuse languages and their implementations
> here (and elsewhere in the post). Yes, there was one
> particular implementation of C++ that preprocessed into
> C (just as there is at least one implementation of Ada 95
> that preprocesses into C). It is rarely used for serious
> C++ work these days, since there are many C++ compilers
> around that generate efficient object code directly (the
> same can be said of Ada 95 of course).

I defer to your expertise here.

> > Yep.  One would assume that all these complexities are
> > more or less proportional to each other.  In particular,
> > complex syntax has got to cause complex compiler code.
>
> No, not at all. An easy mistake for someone who does not
> know compiler details, but syntax is always a trivial part
> of any compiler.

I knew this would draw a jeremiad.  I agree that in theory these two
complexities are not strictly proportional.  However, in practice one
can generally find rough but serviceable proportionalities.  More
complexity generally requires more code to implement, especially on the
scale of an entire compiler.  That's why bigger problems take longer to
code.

> And it is NOT AT ALL the case that these complexities are
> more or less proportional, that was the whole point of my
> post (and I gave examples, please reread). Here is another
> example, I can give literally hundreds of similar ones.

[big snip]

> I can rattle on like this for a long time, there are such
> trade offs between nearly every aspect of complexity.
> During the Ada 95 design, we often had arguments about
> complexity, and it was quite helpful when I first proposed
> this more elaborate notion of complexity to realize that
> very often the reason that we disagreed on the issue
> was because we were arguing different dimensions.

All very complex.  I've been in the same kinds of arguments about
operating system design, in POSIX standards meetings, so I can
commiserate.

But let's get down to brass tacks, using personal experience and
whatever expertise and instincts we can muster to sort some languages
into a rough complexity ranking.  Everybody gets to vote.  Do you agree
or disagree, and why, in 25 words or less?
Note that we are not debating whether complexity is good or bad; we are
just estimating it.  Complexity and capability tend to come hand in
hand, along with assumptions about how the world works, or at least
should work.  Reasonable people can differ.

First, by pairs:

1.  Fortran 77 versus K&R C?  I would say that f77 was more complex than
K&R C.

2.  K&R C versus ANSI C?  ANSI C must be more complex, as it evolved
from K&R, but not greatly so.

3.  Ada83 versus K&R C?  Ada83 is more complex than K&R C by a
substantial factor, on all metrics and by universal experience.
Actually, C was often derided as being the moral equivalent of
assembler, and not entirely without justification.

4.  Ada83 versus Ada95?  Ada95, the language, is more complex (bigger)
than Ada83, to support all the new features, like object orientation
(OO).

5.  ANSI C versus C++ (any flavor)?  C++ is more complex, by a large
factor.  I doubt that there is much argument on this.  It's the OO that
did it.

6.  Ada95 versus C++?  I have heard arguments in both directions, but
nothing really convincing (absent metrics).  It may be that each camp
accuses the other of being the larger.  They therefore seem to be about
the same size.

All in one ordered list, least complex first:

1.  K&R C
2.  ANSI C
3.  Fortran 77
4.  Ada83
5.  C++ and Ada95

Remember, we are doing these rankings based on experience and even
instinct, not theory.

> > Have you suitable numerical metrics for the other
> > components of complexity?
>
> I don't know that anyone has tried to do this, it would
> be difficult. Certainly this decomposition of the notion
> of complexity makes the problem more tractable, but still
> extremely difficult. But we don't have to have numerical
> metrics for this to be a useful principle for discussion
> (indeed bogus numerical metrics, such as your attempts to
> weigh compilers, can obfuscate clear discussion -- better
> no metrics than bogus ones).

I don't know of any other metrics that work very well either.  It's a
bit like the debate about the usefulness of source lines of code (SLOC)
as a metric of the human labor required to write software.  Many
complexity measures have been tried over the years, but we always seem
to come back to SLOCs, because the more complex metrics just don't seem
to predict any better than simply counting the lines.

But I cannot agree that rough metrics are useless; we must use what's
available, and understand the limitations of the metrics we do use, as
with any tool.

> > > > Assembly language is simpler than any high-order
> > > > language, but it's lots more work to code in
> > > > assembly.
> > >
> > > Now let me guess. The last time you looked at machine
> > > language was in the 80's, right? Yes, in those days,
> > > the semantics of machine language was pretty simple.
> >
> > No, 1998.  We do read and sometimes write PowerPC
> > assembly code.  Many 1980s machines had more complex
> > instruction sets than the PowerPC.
>
> I *strongly* disagree. If you are just focussing on the
> functionality of the instruction set, sure, but this is
> trivial in all cases. The complex part of any instruction
> set is the execution efficiency semantics.

Huh?  I was talking about reading and writing assembly code for the
machine.  Many of the older machines had really strange architectures,
driven by the cost of hardware.  As semiconductors got cheaper and
better, the machine architectures got cleaner and simpler, and better
suited to supporting software.  Orthogonality arrived.

> > > I am afraid that things have changed.  At this stage the
> > > full execution semantics of a modern chip with
> > > extensive instruction-level parallelism is remarkably
> > > complex along ALL the dimensions I mention above. A
> > > chip like the Pentium II, if you include efficiency
> > > issues, which are indeed not fully documented publicly,
> > > let alone formally specified, you have something far
> > > MORE complicated than any of the languages we are
> > > talking about here.
> >
> > All true, but what has all this to do with the original
> > question, the relative complexity of C, C++, Ada83, and
> > Ada95?
>
> Not sure. Why not ask Joe Gwinn, it was he who gave
> assembly language as an example of a simple language :-)

Assemblers are generally simpler to write than HOL compilers, even for
the strange hardware architectures of yore, because an assembler does
much less for you than a compiler does.  Assembly language was also
simpler, although the writer had to understand more of the hardware to
write the code.

> > > >> Yet, people still use assembly.
> > >
> > > Well barely ...
> >
> > We try to avoid it, but don't always succeed.  Another
> > variant is to code in a high-order language (HOL),
> > inspect the generated assembly, paraphrase the HOL source
> > trying to improve the assembly, iteratively.
>
> That is exactly what is extremely difficult to do. To know
> how to optimize the code, you have to fully understand the
> scheduling and other instruction parallelism aspects of the
> code. We quite often get comments on the generated code
> from GNAT that show that people do not understand such
> things. A little example, many people would expect in
>
> if (A > 4) AND [THEN] (B > 4) then
>
> that the addition of THEN would speed things up in the
> case where A is indeed greater than 4. This is of course
> quite likely to be false on modern machines where jumps
> can be expensive, and indeed an important optimization
> on predicated machines like the Merced is to eliminate
> short circuiting, and in general to convert if's to
> straight line code without jumps.

Yep, all true, except the part about understanding the "scheduling and
other instruction parallelism".  Actually, we just measure the
performance of some code variations.  I didn't claim it was pretty, or
that we do it anywhere but in a few critical hot spots.

> > This can be very effective, but it does give a false
> > impression that the code is in a HOL. It isn't really,
> > because a new compiler will force a repeat of the
> > tuning steps just described.
>
> Sounds like a highly undesirable practice to me. I would
> recommend instead that you put your energy into learning
> more about how to use and choose compilers effectively.

We don't always have our choice of compilers, and there are lots of
non-technical issues in their selection as well, but we still have to
get the job done.

> With modern machines, you are more likely to create a mess
> by mucking with the generated assembly code. This may have
> been a reasonable approach in the 70's but not today!

What I described is changing the HOL source so that the compiler
generates the desired assembly code.  The generated assembly code itself
is never modified.

Joe Gwinn
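
P.S.  To make the short-circuit point concrete, here is the sort of
crude measurement we run on code variations.  It is only a sketch, in C
rather than Ada; the array sizes, the test data, and the function names
(count_short_circuit, count_branch_free) are invented for illustration,
and real numbers would have to come from the actual hot spot, on the
actual compiler and CPU.

/*
 * Sketch: timing a short-circuit test (&&) against a branch-free
 * test (&), in the spirit of the "AND [THEN]" example above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    1000000   /* elements per array (arbitrary) */
#define REPS 100       /* repetitions, so clock() has something to measure */

/* Short-circuit form: b[i] is examined only when a[i] > 4,
 * at the price of a conditional branch. */
static long count_short_circuit(const int *a, const int *b, size_t n)
{
    long hits = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 4 && b[i] > 4)
            hits++;
    return hits;
}

/* Branch-free form: both comparisons are always evaluated and combined
 * with a bitwise AND, which a compiler can often turn into straight-line
 * code with no jump. */
static long count_branch_free(const int *a, const int *b, size_t n)
{
    long hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (a[i] > 4) & (b[i] > 4);
    return hits;
}

int main(void)
{
    int *a = malloc(N * sizeof *a);
    int *b = malloc(N * sizeof *b);
    if (a == NULL || b == NULL)
        return 1;

    srand(1);
    for (size_t i = 0; i < N; i++) {
        a[i] = rand() % 10;
        b[i] = rand() % 10;
    }

    long h1 = 0, h2 = 0;

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++) {
        a[r % N] ^= 1;   /* tiny perturbation so the compiler cannot
                            hoist the call out of the repetition loop */
        h1 += count_short_circuit(a, b, N);
    }
    clock_t t1 = clock();
    for (int r = 0; r < REPS; r++) {
        a[r % N] ^= 1;
        h2 += count_branch_free(a, b, N);
    }
    clock_t t2 = clock();

    /* The hit counts are printed so the work cannot be optimized away. */
    printf("short-circuit: %ld hits, %.3f s\n",
           h1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("branch-free:   %ld hits, %.3f s\n",
           h2, (double)(t2 - t1) / CLOCKS_PER_SEC);

    free(a);
    free(b);
    return 0;
}

Compiling both loops with optimization and reading the generated
assembly (for example, gcc -S -O2) shows whether the compiler emitted a
conditional branch or straight-line code, which usually tells you more
than the raw timings do.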