From: Ken Garlington
Subject: ACVC and Compiler Quality (was: Ada Core Technologies blah blah blah)
Date: 1996/04/10
Organization: Lockheed Martin Tactical Aircraft Systems
Newsgroups: comp.lang.ada
Message-ID: <316B9234.35AA@lfwc.lockheed.com>

(I don't know if this got posted before, so I'm re-posting...)

Robert Dewar wrote:
>
> The point is not that the ACVC suite does not measure product quality,
> it is that it does not guarantee product quality, because it only
> addresses certain quality aspects.

OK, let's say that ACVC measures some aspects of product quality, and that
I misunderstood the statement in the old AJPO guide about not using ACVC
to determine "suitability for any particular purpose," or words to that
effect.

Is what it measures meaningful to the end user? In other words, does the
presence of the ACVC have a _measurable_ effect on the portability of code
between compilers? How is this _measured_?

It's clear you can respond by saying, "Well, without the ACVC, a
reasonable person would conclude that these attributes would be worse."
However, you've implied that the ACVC suite _measures_ relevant aspects of
product quality. From this, one could assume that changes in the ACVC test
suite, say from 1.x to 2.x, might change this measure. How did it change,
or how is it expected to change, such measures as portability, etc.?

> You can still have a 100% conformant compiler with no extensions that
> is of low quality because it has unacceptable capacity limits, or
> compile/runtime performance for example.

I'm not sure what a "100% conformant compiler" has to do with this thread,
since we've already agreed there's no way to know if a compiler truly
conforms to the language standard. If by "conformant" you mean "passes all
ACVC tests," then you can also add to your list problems due to compilers
not generating correct code, or allowing illegal constructs (I've found
"relaxed" rules in two validated compilers so far), or providing
non-standard (but "allowable," as far as I know) extensions to the
language (some of which are now in the Ada standard), or even providing
arguably unallowable extensions, as in the TLD case. Given all this, what
is the expected benefit of the ACVC to the end user?

Let me say it another way. If I worked on a non-DoD project, and a vendor
offered me a discount if they didn't have to run the ACVC on future
versions of their compiler, under what circumstances should I take it?

> P.S. It is perfectly possible to trace ACVC tests back to the test
> objectives in the implementors guide.

Again, a difference of philosophy. The benefit is accrued in my domain
when the traceability matrix is generated during development. Is this what
you are saying happened? Or are you saying something similar to: "We
didn't actually test the product, but it is perfectly possible to do so."
(As to whether the test objectives can be traced back to the ARM, see
below.)

> I do not send along all bug reports for many reasons.
> Some involve proprietary code, and the bug cannot easily be abstracted
> into a simple example, some depend on implementation specific issues.
> For example, the fact that some Ada program triggers an obscure bug in
> the assembler for some obscure machine does not necessarily translate
> into a useful test. Some are quite GNAT specific. Some are untestable
> issues (nice error messages for instance), etc.

Do you send all bug reports that don't meet these (reasonable) exclusions?
Do other vendors? Should they? If so, what document implies that they
should?

> Obtain and read the ACVC validation procedures. Obtain and read some
> sample VSR's. Obtain and read John Goodenough's paper. etc.

1) ACVC validation procedures

OK, let me recheck the AdaIC server and see what I find:

  (ACVC) VERSION 2.0.1 USER'S GUIDE

  "1.3 ACVC Purpose

  "The purpose of the ACVC is to check whether an Ada compilation system
  is a conforming implementation, i.e., whether it produces an acceptable
  result for every applicable test. A fundamental goal of validation is
  to promote Ada software portability by ensuring consistent processing
  of Ada language features as prescribed by [Ada95]."

This is a fundamental goal; however, nowhere have I been able to find how
the effectiveness of the ACVC in meeting this goal has been measured. Note
also that the purpose, essentially, is to pass the tests!

  "ACVC tests use language features in accord with contexts and idioms
  normally found in production software."

Presumably, there is some mechanism used to compare the ACVC tests to the
current use of Ada in production systems. Nowhere can I find how this is
done.

The rest of this section painstakingly points out what the ACVC tests
cannot do, which is probably a good idea, since this area is where
supporters of the ACVC seem to feel most secure in discussing the ACVC.

So, how are the tests designed? The following section would seem to be the
right place for the answer:

  "3.6 General Standards

  Tests for 9XBasic and for Ada95 were developed to a general set of
  standards. To promote a variety of code styles and usage idioms in the
  tests, standards were not necessarily rigorously enforced but were used
  as guidelines for test writers."

Well, that doesn't sound very encouraging for a standard. In fact, the
section quickly moves into fine naming conventions, etc. I do discover
that:

  "In 9XBasic, tests use as few Ada features as necessary to write a
  self-checking executable test that can be read and modified. Tests for
  Ada95 exhibit a usage oriented style, employing a rich assortment and
  interaction of features and exemplifying the kind of code styles and
  idioms that compilers may encounter in practice."

Presumably, this emphasis on a "user-oriented" style (which I've heard
repeatedly stressed as a new thing for Ada 95 tests) will accrue some
benefit. This implies that (1) some deficiency was measured before, and
(2) there is some effort to take Ada 95 measures to see if the deficiency
is corrected. What are these deficiencies, and these measures? Not in this
document.

How about:

  THE ADA COMPILER VALIDATION CAPABILITY (ACVC) VERSION 2.0.1
  TEST OBJECTIVES DOCUMENT

This immediately goes into brief paragraphs for each test, e.g.

  "B360001 Check that, within the definition of a nonlimited composite
  type or a limited composite type that..."

Presumably, the writer of this test contributed to a bi-directional
traceability matrix, but it must have been at the bottom of this document,
since my browser died midway through the download.
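(As a point of reference for anyone who hasn't looked at the suite: a
"self-checking executable test" is, as I understand it, roughly shaped
like the sketch below. This is a made-up illustration only, not an actual
ACVC test; it assumes the suite's usual Report support package
(Test/Failed/Result), and the test name, objective wording, and types are
all invented.)

  -- Hypothetical sketch of an ACVC-style self-checking test.
  -- Assumes the ACVC "Report" support package; all identifiers and the
  -- objective text below are invented for illustration.
  with Report;
  procedure CX_Example is
     type Int_Pair is record
        A, B : Integer := 0;   -- components with default expressions
     end record;
     P : Int_Pair;             -- should be default-initialized
  begin
     Report.Test ("CX_EXAMPLE",
                  "Check that default expressions of record components " &
                  "are evaluated when an object is declared");
     if P.A /= 0 or P.B /= 0 then
        Report.Failed ("Default initialization not performed");
     end if;
     Report.Result;
  end CX_Example;

(The interesting question, of course, is not what one such test looks
like, but how anyone measures whether the collection of them buys the
portability the User's Guide promises.)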
Such a traceability matrix would probably help reduce the redundancy found
in the ACVC 1.x tests, as mentioned in the User's Guide.

Both of these documents are March 1996 documents, and are referenced in
other documents which I read previously, such as the Ada 9X Transition
Planning Guide and the Ada 95 Adoption Handbook. I couldn't find the
answer to my questions in those documents, either.

2) VSRs - Read one. Not a plan. Doesn't answer any of the questions I've
raised.

3) John Goodenough's paper - I found several references to the Quality and
Style guide on AdaIC when I searched on "goodenough". Was this what you
had in mind, or was there some other paper? If so, how would the average
end user get it? Furthermore, if this paper is not incorporated into the
official AJPO documentation, how useful is it?

> The Ada compiler vendor market certainly does NOT look like a safety
> critical market (even by Ken's standards, which seem a bit weak to me
> since they seem very testing oriented, and I would never consider
> testing alone to be sufficient for safety critical applications).

Me neither. Why would you think otherwise? What's more, the Ada community
(well, OK, those members in this thread) doesn't even seem willing to
accept 50% of the weakened standards I've raised, so what difference does
it make where I set the bar?

> It is definitely the fact that today we do NOT have the technology
> to develop compilers for modern languages that meet the criteria
> of safety critical programs.

What about the criteria for SEI Level II? Level III? ISO 9000? _Any_
explicit criteria related to end-use quality, other than "to check whether
an Ada compilation system ... produces an acceptable result for every
applicable test"?

> This is why all safety critical programs are tested, validated and
> certified at the object code level (I will certainly not fly a plane
> where the flight software took on faith the 100% accuracy of all
> generated code).

Of course, since testing can't prove 100% accuracy... :) Did you
understand the Musa reference? Should I go into more detail about what I
mean? As an aside, would you fly on an aircraft where formal methods were
not used in the development of the safety-critical software?

> Is it possible to develop such technology? Yes, I think so, but it
> involves some substantial work at both the engineering level and
> research level. The NYU group is actually very interested in this
> area. We recently made a proposal to NSF, not funded :-( precisely
> to do work in this area.

Why should it have been funded, given that you claim below that Ada
compiler quality is adequate? If you can answer that, I suspect you will
end up agreeing with Mr. McCabe and myself!

> If anyone is interested, they can contact me and I will send a copy of
> the proposal. We are currently exploring other funding sources for this
> work.

I would be very interested. Please send me a copy. If you don't have my
address, let me know.

> Could we do more now with current technology? Possibly.

Yes! The part of the thread _I_ want to talk about! :)

> We could do for example full coverage and path testing for compilers.
> There are however several problems with this.
>
> It would be VERY expensive....
>
> It would tend to make it much harder to fix problems and improve the
> compiler....

These first two I cannot argue, although I can point out that automation
can at least reduce the impact. On the other hand, it can be argued that
going from an SEI I to an SEI III process will also have these effects, at
least initially.
In particular, it may not be a bad idea to slow down the rate of change of
a software item, if that item is experiencing significant regressions with
each release. Also, you may find that the initial up-front cost is offset
by fewer bug fixes downstream (and less time running that enormous
regression test suite :)

Are you saying that, in order to keep Ada toolset prices low, you (and the
Ada community in general) are unable to invest in such technologies? That
Ada toolsets should be thought of as a commodity product, where you make
your money by volume selling of products with small profit margins?

> It would divert energy from other important areas. ...
>
> What we want to aim at is a situation where problems of compiler bugs
> and conformance are no more of a problem in the total development
> process than other factors, and taken as a whole, that these bugs and
> other problems do not significantly impact project success or
> timescales.

THAT'S IT! THAT'S WHAT I WANT!

Now that we agree as to where "we" (the Ada community, hopefully) want to
aim, I'll re-ask some questions:

1. Does the ACVC help in this aim? If so, how can we tell? If not,
   should it?

2. What other activities should be happening to meet this aim?

3. Where is the plan to make these other activities happen?

(I think it's in Humphrey's book:

 - If you don't have a destination in mind, any road is a good one, BUT

 - If you don't have a road available, no destination is reachable!)

> I can certainly understand Ken's frustration, but these are tough
> problems. I am sure that when uninformed members of the public, or even
> simply technical people not familiar with avionics, look at yet another
> report of a military aircraft crash, and ask "Why can't those guys
> build planes that don't crash?" Certainly congress likes to ask such
> questions :-)

This is an outstanding example. In Dr. Leveson's book "Safeware" (which is
missing from my bookcase, or I'd quote it directly), she talks about the
exemplary safety record in military aviation in recent years. However,
this wasn't always the case. In the past, the record was dismal. It got so
bad that, in desperation, the military put together a _plan_. They
developed a safety standard, made safety a top priority with commanders,
etc., and turned it around. Whenever a plane does crash, there is an
investigation; the root cause is determined and made public to help avoid
repetition.

The same is true in the engineering world. As we've gone into the business
of computer-based flight controls (analog, and later digital), we've put
together industry-wide standards on the development and evaluation of
these systems. When there is an incident, the root cause is established
and published, and the standards are updated. And, as a result, the
reliability of those systems has continued to improve, in a measurable
sense.

This is also true for software in general. I've mentioned SEI and ISO 9000
already. You can argue about the effectiveness of these, but they exist.
There were issues about software quality, and the industry put together a
plan to respond.

And now, you're hearing complaints from some users of Ada tools that the
quality overall - not just that of a single vendor - is not as good as it
should be. Generally, the response has _not_ been, "Yes, here's what we
need to do." It's been:

> Of course, they simply don't understand the engineering problems and
> realities involved. Our technology is not perfect, and this is true of
> many areas.

Well, OK, but so what?
You've already made the heinous mistake of admitting that more can be
done, so why isn't it? Why don't Ada vendors use modern software
engineering and management techniques? Or do they? And how would anyone
know?

> About the best we can hope for is to ensure that compiler technology is
> not the weak link in the chain, and that when a plane does crash, it is
> not because of a bug in an Ada compiler!

I am here to tell you that, today, I believe Ada compiler technology _is_
the weak link after requirements definition. I see too many papers
(including mine) that say, "Ada is great, but the compiler was buggy" to
believe otherwise. We can (and do) patch that weak link with object code
analysis and other techniques. I am worried (and Musa predicts) that,
particularly as Ada compilers grow more complex, our patch may not hold.

By the way, anyone who tries to use this claim as an argument not to use
Ada is misrepresenting my position entirely. I'm talking about improving a
good product, not denouncing a poor one.

Note that you don't have to be building safety-critical software for this
to be an important issue. How many converts to Ada have we lost due to the
quality of compilers? How many schedules have been impacted? How much
money has been spent? I can spin anecdotes on this, but I would think that
someone, somewhere, should actually be finding out the answers to these
questions. More importantly, they should be working on _improving_ the
answers to these questions.

If Ada is intended for systems that need to be more reliable than average,
then Ada compilers should be more reliable than the average language
compiler. Not 100% reliable, and perhaps not even at the reliability of a
safety-critical system. Just better than average. This means two things:
(1) you have to continuously measure the reliability, and (2) you have to
have an explicit plan to make that measure improve.

I was encouraged by the TRI-Ada '95 presentation on the Verdix Ada vs. C++
compiler error rates. I would have been more impressed by a presentation
showing a steadily decreasing error rate measured across several vendor
products. (Like _that's_ ever going to happen, based on this thread!)

> In practice I think we do a pretty good job. I am not aware of any
> major failure of safety-critical software that can be traced to a
> compiler bug.

I don't think we have enough operational hours to know. I also don't know
if there is a reporting mechanism available to find out if/when this does
occur. Most importantly, I don't want to wait for the software equivalent
of an O-ring failure to find out that we should have been addressing this
issue. (I keep looking at those curves in Musa's book, and crossing my
fingers...)

> Could we do better? Will we do better in the future?
> I would certainly hope that the answer is yes and yes.

Sorry, I work for an SEI III contractor. We are not permitted to "hope"
for a goal. ;)
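(A footnote on the Musa reference, for readers without the book: the
curves I keep mentioning come from software reliability growth models. As
a rough illustration only - my notation, not a quotation from Musa - the
basic execution-time model takes the failure intensity after tau hours of
execution to be

    \lambda(\tau) = \lambda_0 \, \exp\!\left(-\frac{\lambda_0}{\nu_0}\,\tau\right)

where \lambda_0 is the initial failure intensity and \nu_0 is the total
number of failures expected over the life of the product. The point of
items (1) and (2) above is simply that nobody can plot such a curve for a
compiler, or plan to bend it downward, unless the failure data is being
collected and published in the first place.)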