From: dewar@cs.nyu.edu (Robert Dewar)
Subject: Re: Ada Core Technologies and Ada95 Standards
Date: 1996/04/10
Organization: Courant Institute of Mathematical Sciences
Newsgroups: comp.lang.ada
References: <00001a73+00002c20@msn.com> <828038680.5631@assen.demon.co.uk>
 <828127251.85@assen.demon.co.uk> <315FD5C9.342F@lfwc.lockheed.com>
 <3160EFBF.BF9@lfwc.lockheed.com>

Ken said (double >> are mine):

>> Ken Garlington asks why it is infeasible for a compiler vendor to deliver
>> the source code to the AVF for analysis.

>Actually, I didn't ask this, but we can talk about it if you like...

There must be two Ken Garlingtons around; the other one said in a previous
post:

  "I could think of several ways to include other types of testing in an
  ACVC, e.g. a requirement to deliver of source code and supporting data to
  an AVF for analysis."

(this is an exact cut-and-pasted quote :-)

>What I actually asked was, "Is there some way to modify the scope of the ACVC
>process to improve compiler quality across all vendors? Or, is there something
>outside the scope of the ACVC that could be done to improve compiler quality
>across all vendors?"

>Your answer: No, because we'd have to make an _investment_ to improve
>compiler quality. To get to 100% quality (whatever that means) would take
>too much money (and is technically infeasible). Therefore, no investment
>should be made.

You miss the point entirely (this seems to be happening consistently, so I
must not be clear -- hence my attempt to clarify!). Of COURSE more
investment is desirable to improve quality. ACT is a very focused company;
our only business is improving the quality of GNAT! The issue is whether
spending more resources on ACVC testing is the way to do it. I very much
doubt it.

One thing we have not discussed is the ways in which the ACVC can, if not
carefully handled, actually *decrease* compiler quality. Right now the
rules of the Ada game are that you MUST pass the ACVC tests, above anything
else, and in particular above any other testing or quality control
procedures that may make sense. Passing the ACVC suite is not a trivial
exercise, and I think all vendors would agree that they have had to devote
considerable resources to this task. These are resources not available for
other quality-improving tasks. This means that we have to be very careful
not to divert too many resources and reach a point of diminishing returns.

For example: suppose we spent ten times as much on the ACVC suite and had
ten times the number of tests (that's really the only way I can see to
spend such an investment, since the current tests are as effective as the
contractor (SAIC) and the review team know how to make them). There is no
question that conformance would be improved, but I think that the resulting
diversion of resources would actually reach the point of decreasing
returns.

As Ken has pointed out, there are many other activities besides ACVC
testing that can contribute to quality:

   Process control (with ISO 9000 or SEI audits)
   Systematic white box testing (e.g. path testing)
   Stress testing (the NPL tool)
   Performance testing (e.g. with the ACES and PIWG tools)
   Systematic regression testing
   Purpose-built test suites for particular compilers

All these steps may be extremely useful, and all of them compete with the
ACVC process for resources.
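Since "conformance testing" keeps coming up, it may help to be concrete
about what an ACVC-style test looks at. Here is a miniature in that
spirit; I am making it up on the spot (real ACVC tests use a standard
reporting harness, and this is not one of them):

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Check_Range is
      subtype Small is Integer range 1 .. 10;
      X : Small := 10;
   begin
      X := X + 1;   --  the RM requires Constraint_Error here
      Put_Line ("FAILED: no exception raised");
   exception
      when Constraint_Error =>
         Put_Line ("PASSED");
      when others =>
         Put_Line ("FAILED: wrong exception raised");
   end Check_Range;

A test like this is pure black box: it checks one behavior the RM
requires, and says nothing about compile speed, quality of generated code,
error messages, or capacity. Keep that in mind for what follows.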
If you are acquiring an Ada compiler, you certainly do more in your
evaluation than assure yourself that it is validated. You may well, if your
experience shows this is worthwhile, require additional testing, or, for
example, require that the compiler vendor have ISO 9000 certification.
Certainly IBM did quality audits on Ada vendors' internal process control
for the NASA space station project, and this seems entirely appropriate to
me.

I think part of the confusion here is that Ken (and John) use the ACVC as a
kind of code word for "all the things that might be done to ensure
quality", but in fact the ACVC is, and has always been, defined to be
restricted to black box conformance testing. Perhaps what has gone wrong
here is that Ken and other Ada users have been led to think that the ACVC
was intended to encapsulate the entire question of quality assessment of
compilers; that was never its intention.

Let's go back to the ISO 9000 question for a moment. Suppose your
assessment shows that it is essential for a compiler vendor to have ISO
9000 certification (some do; Alsys obtained this certification, and what
was involved was basically a lot of paperwork, writing down in formal and
precise form the procedures that had always been followed). Then it is
entirely reasonable for you to require this as part of your procurement
process. Naturally you won't want to do so unless your careful analysis
shows that it really does contribute to quality, because otherwise you will
be requiring vendors to divert resources; but if your analysis shows it's a
good idea, and a good way to spend resources, then fine. BUT! It is not
appropriate for NIST conformance testing to include this criterion, because
the Ada standard does not (and could not) say anything about process: it is
a language standard, and hence only defines the semantics of the language.
So you cannot look to the ACVC for help here.

Similarly, you may determine that the performance of generated code is
critical, and consequently place a lot of weight on the ACES test results
(if your analysis shows that the ACES accurately captures qualities that
are important to you). But this cannot be part of NIST conformance testing,
since the Ada standard does not (and could not) say anything about
performance.

Choosing among available compilers is not an easy task. I think that,
particularly early on, procurement officers hoped that the ACVC let them
off the hook -- "I'll stick to validated compilers and I will be guaranteed
reasonable tools." Unfortunately, it is not so simple, and the evaluation
process for tools is much more complex. ACVC testing helps as a kind of
first-cut qualification, but that is all it is, and all it pretends to be.
Thorough evaluation has to go well beyond the "Is_Validated" predicate.

>The cost of IIV&V? I don't know the exact F-22 figure at the moment, but
>it's probably significantly less than 5% of the development cost. IIV&V
>is done on far more than a mere 500K SLOCs on F-22. I recommend AFSC
>Pamphlet 800-5, which helps estimate such costs, and also explains IIV&V.
>Based on your discussion below, I'm guessing you're not familiar with this
>process.

That *is* a surprise. I admit I am much more familiar with MoD regulations
in the safety-critical area, and with typical British and European
procedures. These tend to be much more oriented to formal specifications
and formal proof, and so certification tends to account for much more of
the cost, as far as I can gather.

>"What statistics there are show that path testing catches approximately
>half of all bugs caught during unit testing or approximately 35% of all
>bugs.... When path testing is combined with other methods, such as limit
>checks on loops, the percentage of bugs caught rises to 50% to 60% in
>unit testing."
> - Beizer, Software Testing Techniques, 1983.

>So, if path testing were required for Ada vendors - either at the vendor's
>site, or at an AVF - this would be the expected benefit over not doing it.

First of all, I know of no data suggesting that Beizer's results extend to
the compiler domain, so that would have to be investigated.
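Second, consider what the requirement would actually mean. Path testing is
white box: the test set is derived from the structure of the code itself,
so that every feasible path through each unit gets executed. A trivial
made-up example (mine, not Beizer's, and certainly not GNAT source):

   function Clip (X, Lo, Hi : Integer) return Integer is
   begin
      if X < Lo then
         return Lo;
      elsif X > Hi then
         return Hi;
      else
         return X;
      end if;
   end Clip;

   --  Path coverage needs at least three tests, one per path:
   --     Clip (-5, 0, 10) = 0    (first branch)
   --     Clip (15, 0, 10) = 10   (second branch)
   --     Clip ( 5, 0, 10) = 5    (fall-through)
   --  A black box test is free to choose any inputs at all, and may
   --  well leave one or more of these paths unexecuted.

Now scale that discipline up to a complete compiler, and two problems
appear, one of cost and one of feasibility.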
The danger here is that you enormously increase the cost of ACVC testing.
There would be two ways of implementing what you suggest:

1. Require witness testing of the path testing. This seems entirely
   infeasible. Right now, witness testing of the absolutely fixed set of
   tests costs on the order of $20K, and witness testing for full path
   coverage would require a huge amount of analysis and a lot of very
   specialized expertise.

2. Require a DOC-like document to be signed saying that full path testing
   has been done.

Either approach would in addition require the technical work of the full
path testing itself.

One very significant problem would be the issue of deactivated code. GNAT,
for example, is programmed in a very defensive style. It is full of tests
which are probably unnecessary, but either the proof that they are
unnecessary is too difficult, or the tests are useful as a defense against
problems. Of course all code runs into the issue of deactivated code, but I
suspect that systematic testing of a complete compiler would show the
problem to be intractable for compilers, or at least very difficult.
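To illustrate the style I mean (a made-up fragment, with invented names,
not actual GNAT source):

   procedure Defensive_Demo is
      type Node_Kind is (Op_Add, Op_Sub, Op_Mul);

      procedure Fold (K : Node_Kind) is
      begin
         case K is
            when Op_Add | Op_Sub =>
               null;  --  normal processing would go here
            when others =>
               --  We believe callers never pass any other kind, but
               --  the proof is hard, so the defensive check stays in.
               raise Program_Error;
         end case;
      end Fold;

   begin
      Fold (Op_Add);
      Fold (Op_Sub);
   end Defensive_Demo;

If our belief is right, no test can ever execute the raise statement, so
100% path coverage is unattainable; and if we delete such checks to make
the coverage numbers come out, we have made the compiler less robust, not
more.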
In any case, the issue is whether you would increase quality or decrease
quality by this means. It seems to me that it would put too much emphasis
on conformance, and too little on performance. Note also that in the case
of optimization algorithms, proving that they do not violate the language
rules is significant, but if that is all you do, you miss the point! You
want to concentrate at least some effort on showing that the optimizations
actually work.

Looking at our customer base, the problems people have (that are due to
GNAT, rather than to customer code) fall into several categories. I think
the list would be similar for any compiler:

1. Safe compiler bugs (the compiler bombs or gives an incorrect message)
2. Unsafe compiler bugs (the compiler generates wrong code)
3. Performance is inadequate
4. Optional features defined in the RM are not implemented
5. Features not defined in the RM are needed (bindings, libraries,
   preprocessors, other tool interfaces)
6. Error messages are confusing
7. Implementation dependent decisions are different from other compilers'
8. Capacity limitations

Quality improvement for GNAT involves addressing all of these areas. The
ACVC tests concentrate entirely on points 1 and 2. Certainly these are
important areas, especially 2, and the few times we have run into code
generation problems, we have treated them as being of the highest priority.

Your suggestion of path testing still adds only to points 1 and 2, and I
fear in fact that your entire emphasis is on points 1 and 2. The trouble is
that for most of our customers, points 3-8 are also important. For example,
many of our customers are porting large applications from other compilers
to GNAT, and in these porting efforts point 7 is by far the most
significant in our experience.
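Point 7 is worth a concrete illustration, because it surprises people who
expect the standard to pin everything down. In the made-up miniature below,
every possible output is fully conforming:

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Port_Trap is
   begin
      --  The RM requires only that the predefined Integer cover at
      --  least a 16-bit range, so 16-bit and 32-bit Integers (and
      --  others) all conform.
      Put_Line ("Integer'Size =" & Integer'Image (Integer'Size));
      --  The same goes for the evaluation order of expressions with
      --  side effects, record layout in the absence of representation
      --  clauses, and so on: code that quietly assumes one compiler's
      --  choices is perfectly legal Ada, and still painful to port.
   end Port_Trap;

No amount of conformance testing can make two compilers agree on such
points; the RM deliberately leaves them open.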
>Interesting. So, once you reach 500K SLOCs, you can no longer perform
>adequate testing of software. What a relief! Now, if the F-22 fails to
>satisfy the customer, I have an ironclad alibi! :)

Again, the issue is that the domains are very different. Your assumption
that F-22 techniques and observations carry over to compilers is no more
valid than it would be for me to assume that you could easily build a
retargetable flight software program that would run on the F-22 or the 777
with minimal work :-)

>Are you saying that we're wasting money re-running ACVC tests on changed
>products? Maybe we could use that money to do process audits! See, that's
>exactly the kind of thinking I'm looking for here. Good idea!

Sorry, I never said that. I think it is definitely very useful to rerun the
ACVC suite whenever any change is made to the compiler. We depend on this
as one (among several) of our continuing internal quality audits.

>We don't have a formal specification of the F-22 software, either.
>Can you come to our first flight readiness review, and explain to the
>pilots why we're not able to give them any confidence in the performance
>of the system because we're missing a formal specification?

I do find that surprising. Talking to the folks doing similar development
in England, they seem to be very much more focused on formal
specifications.

Of course, one important thing to remember is that the requirement on your
F-22 software is ultimately not that it be 100% correct, but that in
practice it be 100% reliable, and these are rather different criteria.
Clearly the 100% correctness criterion would require a formal
specification, and could not be demonstrated by testing; the 100%
reliability criterion is quite different.

>> Ken, in your message, you again refer to users expecting the ACVC suite
>> to guarantee conformance to the standard.

>I did? Must have been my evil twin.

The same one who talked about sending code to the AVF, no doubt :-) :-)

>What I actually asked was, "Is there some way to modify the scope of the
>ACVC process to improve compiler quality across all vendors? Or, is there
>something outside the scope of the ACVC that could be done to improve
>compiler quality across all vendors?"

Not clear. The danger, as I note above, is that if you move in the wrong
direction here, you can easily damage compiler quality.

>> P.S. If you would like to send a check for $25 million to ACT, I think
>> I can promise that 5 years from now we will have a compiler that is
>> much closer to conforming to the standard (of course I can also promise
>> this if you *don't* send us the $25 million :-)

>Interesting. Your process for improving the quality of your product is
>unrelated to the available resources? Wish _we_ had a system like that.
>(Or maybe I don't.)

You missed the point of my :-) We expect GNAT to succeed, and we intend to
invest substantial resources in improving its quality even if we don't get
your $25 million. There is no doubt that your $25 million check would have
a positive impact on GNAT, but we can manage without it :-)

>I notice you use the word "conformance" rather than "quality". Are these
>synonyms, to you? They aren't to me. I suspect they aren't to Mr. McCabe,
>or most other vendors.

No, of course they are not synonyms! That's the whole point. The ACVC
measures conformance, which is just one aspect of quality. What did I ever
say that made you think I regard them as synonymous? The whole point of my
comments is that they are NOT synonymous. In practice, NIST testing using
the ACVC suite can only measure conformance; and, as you point out
repeatedly and as I agree repeatedly, that is NOT the one and only measure
of quality, even if the ACVC could assure 100% conformance.