From: dewar@cs.nyu.edu (Robert Dewar)
Subject: Re: Ada Core Technologies and Ada95 Standards
Date: 1996/04/10
Organization: Courant Institute of Mathematical Sciences
Newsgroups: comp.lang.ada
References: <00001a73+00002c20@msn.com> <828038680.5631@assen.demon.co.uk>
 <828127251.85@assen.demon.co.uk> <315FD5C9.342F@lfwc.lockheed.com>
 <3160EFBF.BF9@lfwc.lockheed.com>

Ken said (double >> are mine):

>> Ken Garlington asks why it is infeasible for a compiler vendor to deliver
>> the source code to the AVF for analysis.

>Actually, I didn't ask this, but we can talk about it if you like...

There must be two Ken Garlingtons around; the other one said in a previous
post:

  "I could think of several ways to include other types of testing in an
  ACVC, e.g. a requirement to deliver of source code and supporting data to
  an AVF for analysis."

(this is an exact cut-and-pasted quote :-)

>What I actually asked was, "Is there some way to modify the scope of the ACVC
>process to improve compiler quality across all vendors? Or, is there something
>outside the scope of the ACVC that could be done to improve compiler quality
>across all vendors?"

>Your answer: No, because we'd have to make an _investment_ to improve
>compiler quality. To get to 100% quality (whatever that means) would take
>too much money (and is technically infeasible). Therefore, no investment
>should be made.

You miss the point entirely (this seems to be happening consistently, so I
must not be clear -- hence my attempt to clarify!). Of COURSE more
investment is desirable to improve quality. ACT is a very focused company;
our only business is improving the quality of GNAT! The issue is whether
spending more resources on ACVC testing is the way to do it. I very much
doubt it.

One thing we have not discussed is the ways in which the ACVC can, if not
carefully handled, actually *decrease* compiler quality. Right now the
rules of the Ada game are that you MUST pass the ACVC tests, above anything
else, and in particular above any other testing or quality control
procedures that may make sense. Passing the ACVC suite is not a trivial
exercise, and I think all vendors would agree that they have had to devote
considerable resources to this task. These are resources not available for
other quality-improving tasks. This means that we have to be very careful
not to divert too many resources and reach a point of diminishing returns.

For example: suppose we spent ten times as much on the ACVC suite and had
ten times the number of tests (that's really the only way I can see to
spend such an investment, since the current tests are as effective as the
contractor (SAIC) and the review team know how to make them). There is no
question that conformance would be improved, but I think that the resulting
diversion of resources would actually reach the point of decreasing
returns.

As Ken has pointed out, there are many other activities besides ACVC
testing that can contribute to quality:

   Process control (with ISO 9000 or SEI audits)
   Systematic white box testing (e.g. path testing)
   Stress testing (the NPL tool)
   Performance testing (e.g. with the ACES and PIWG tools)
   Systematic regression testing
   Purpose-built test suites for particular compilers

All these steps may be extremely useful, and all of them compete with the
ACVC process for resources.
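Since "conformance testing" keeps coming up, it may help to be concrete
about what an ACVC-style test looks at. Here is a miniature in that
spirit; I am making it up on the spot (real ACVC tests use a standard
reporting harness, and this is not one of them):

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Check_Range is
      subtype Small is Integer range 1 .. 10;
      X : Small := 10;
   begin
      X := X + 1;   --  the RM requires Constraint_Error here
      Put_Line ("FAILED: no exception raised");
   exception
      when Constraint_Error =>
         Put_Line ("PASSED");
      when others =>
         Put_Line ("FAILED: wrong exception raised");
   end Check_Range;

A test like this is pure black box: it checks one behavior the RM
requires, and says nothing about compile speed, quality of generated code,
error messages, or capacity. Keep that in mind for what follows.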
If you are acquiring an Ada compiler, you certainly do more in your
evaluation than assure yourself that it is validated. You may well, if your
experience shows this is worthwhile, require additional testing, or, for
example, require that the compiler vendor have ISO 9000 certification.
Certainly IBM did quality audits on Ada vendors' internal process control
for the NASA space station project, and this seems entirely appropriate to
me.

I think part of the confusion here is that Ken (and John) use the ACVC as a
kind of code word for "all the things that might be done to ensure
quality", but in fact the ACVC is, and has always been, defined to be
restricted to black box conformance testing. Perhaps what has gone wrong
here is that Ken and other Ada users have been led to think that the ACVC
was intended to encapsulate the entire question of quality assessment of
compilers; that was never its intention.

Let's go back to the ISO 9000 question for a moment. Suppose your
assessment shows that it is essential for a compiler vendor to have ISO
9000 certification (some do; Alsys obtained this certification, and what
was involved was basically a lot of paperwork, writing down in formal and
precise form the procedures that had always been followed). Then it is
entirely reasonable for you to require this as part of your procurement
process. Naturally you won't want to do so unless your careful analysis
shows that it really does contribute to quality, because otherwise you will
be requiring vendors to divert resources; but if your analysis shows it's a
good idea, and a good way to spend resources, then fine. BUT! It is not
appropriate for NIST conformance testing to include this criterion, because
the Ada standard does not (and could not) say anything about process: it is
a language standard, and hence only defines the semantics of the language.
So you cannot look to the ACVC for help here.

Similarly, you may determine that the performance of generated code is
critical, and consequently place a lot of weight on the ACES test results
(if your analysis shows that the ACES accurately captures qualities that
are important to you). But this cannot be part of NIST conformance testing,
since the Ada standard does not (and could not) say anything about
performance.

Choosing among available compilers is not an easy task. I think that,
particularly early on, procurement officers hoped that the ACVC let them
off the hook -- "I'll stick to validated compilers and I will be guaranteed
reasonable tools." Unfortunately, it is not so simple, and the evaluation
process for tools is much more complex. ACVC testing helps as a kind of
first-cut qualification, but that is all it is, and all it pretends to be.
Thorough evaluation has to go well beyond the "Is_Validated" predicate.

>The cost of IIV&V? I don't know the exact F-22 figure at the moment, but
>it's probably significantly less than 5% of the development cost. IIV&V
>is done on far more than a mere 500K SLOCs on F-22. I recommend AFSC
>Pamphlet 800-5, which helps estimate such costs, and also explains IIV&V.
>Based on your discussion below, I'm guessing you're not familiar with this
>process.

That *is* a surprise. I admit I am much more familiar with MoD regulations
in the safety-critical area, and with typical British and European
procedures. These tend to be much more oriented to formal specifications
and formal proof, and so certification tends to account for much more of
the cost, as far as I can gather.

>"What statistics there are show that path testing catches approximately
>half of all bugs caught during unit testing or approximately 35% of all
>bugs.... When path testing is combined with other methods, such as limit
>checks on loops, the percentage of bugs caught rises to 50% to 60% in
>unit testing."
> - Beizer, Software Testing Techniques, 1983.

>So, if path testing were required for Ada vendors - either at the vendor's
>site, or at an AVF - this would be the expected benefit over not doing it.

First of all, I know of no data suggesting that Beizer's results extend to
the compiler domain, so that would have to be investigated.
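Second, consider what the requirement would actually mean. Path testing is
white box: the test set is derived from the structure of the code itself,
so that every feasible path through each unit gets executed. A trivial
made-up example (mine, not Beizer's, and certainly not GNAT source):

   function Clip (X, Lo, Hi : Integer) return Integer is
   begin
      if X < Lo then
         return Lo;
      elsif X > Hi then
         return Hi;
      else
         return X;
      end if;
   end Clip;

   --  Path coverage needs at least three tests, one per path:
   --     Clip (-5, 0, 10) = 0    (first branch)
   --     Clip (15, 0, 10) = 10   (second branch)
   --     Clip ( 5, 0, 10) = 5    (fall-through)
   --  A black box test is free to choose any inputs at all, and may
   --  well leave one or more of these paths unexecuted.

Now scale that discipline up to a complete compiler, and two problems
appear, one of cost and one of feasibility.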
The danger here is that you enormously increase the cost of ACVC testing.
There would be two ways of implementing what you suggest:

1. Require witness testing of the path testing. This seems entirely
   infeasible. Right now, witness testing of the absolutely fixed set of
   tests costs on the order of $20K, and witness testing for full path
   coverage would require a huge amount of analysis and a lot of very
   specialized expertise.

2. Require a DOC-like document to be signed saying that full path testing
   has been done.

Either approach would in addition require the technical work of the full
path testing itself.

One very significant problem would be the issue of deactivated code. GNAT,
for example, is programmed in a very defensive style. It is full of tests
which are probably unnecessary, but either the proof that they are
unnecessary is too difficult, or the tests are useful as a defense against
problems. Of course all code runs into the issue of deactivated code, but I
suspect that systematic testing of a complete compiler would show the
problem to be intractable for compilers, or at least very difficult.
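To illustrate the style I mean (a made-up fragment, with invented names,
not actual GNAT source):

   procedure Defensive_Demo is
      type Node_Kind is (Op_Add, Op_Sub, Op_Mul);

      procedure Fold (K : Node_Kind) is
      begin
         case K is
            when Op_Add | Op_Sub =>
               null;  --  normal processing would go here
            when others =>
               --  We believe callers never pass any other kind, but
               --  the proof is hard, so the defensive check stays in.
               raise Program_Error;
         end case;
      end Fold;

   begin
      Fold (Op_Add);
      Fold (Op_Sub);
   end Defensive_Demo;

If our belief is right, no test can ever execute the raise statement, so
100% path coverage is unattainable; and if we delete such checks to make
the coverage numbers come out, we have made the compiler less robust, not
more.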
In any case, the issue is whether you would increase quality or decrease
quality by this means. It seems to me that it would put too much emphasis
on conformance, and too little on performance. Note also that in the case
of optimization algorithms, proving that they do not violate the language
rules is significant, but if that is all you do, you miss the point! You
want to concentrate at least some effort on showing that the optimizations
actually work.

Looking at our customer base, the problems people have (that are due to
GNAT, rather than to customer code) fall into several categories. I think
the list would be similar for any compiler:

1. Safe compiler bugs (the compiler bombs or gives an incorrect message)
2. Unsafe compiler bugs (the compiler generates wrong code)
3. Performance is inadequate
4. Optional features defined in the RM are not implemented
5. Features not defined in the RM are needed (bindings, libraries,
   preprocessors, other tool interfaces)
6. Error messages are confusing
7. Implementation dependent decisions are different from other compilers'
8. Capacity limitations

Quality improvement for GNAT involves addressing all of these areas. The
ACVC tests concentrate entirely on points 1 and 2. Certainly these are
important areas, especially 2, and the few times we have run into code
generation problems, we have treated them as being of the highest priority.

Your suggestion of path testing still adds only to points 1 and 2, and I
fear in fact that your entire emphasis is on points 1 and 2. The trouble is
that for most of our customers, points 3-8 are also important. For example,
many of our customers are porting large applications from other compilers
to GNAT, and in these porting efforts point 7 is by far the most
significant in our experience.
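Point 7 is worth a concrete illustration, because it surprises people who
expect the standard to pin everything down. In the made-up miniature below,
every possible output is fully conforming:

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Port_Trap is
   begin
      --  The RM requires only that the predefined Integer cover at
      --  least a 16-bit range, so 16-bit and 32-bit Integers (and
      --  others) all conform.
      Put_Line ("Integer'Size =" & Integer'Image (Integer'Size));
      --  The same goes for the evaluation order of expressions with
      --  side effects, record layout in the absence of representation
      --  clauses, and so on: code that quietly assumes one compiler's
      --  choices is perfectly legal Ada, and still painful to port.
   end Port_Trap;

No amount of conformance testing can make two compilers agree on such
points; the RM deliberately leaves them open.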
>Interesting. So, once you reach 500K SLOCs, you can no longer perform
>adequate testing of software. What a relief! Now, if the F-22 fails to
>satisfy the customer, I have an ironclad alibi! :)

Again, the issue is that the domains are very different. Your assumption
that F-22 techniques and observations carry over to compilers is no more
valid than it would be for me to assume that you could easily build a
retargetable flight software program that would run on the F-22 or the 777
with minimal work :-)

>Are you saying that we're wasting money re-running ACVC tests on changed
>products? Maybe we could use that money to do process audits! See, that's
>exactly the kind of thinking I'm looking for here. Good idea!

Sorry, I never said that. I think it is definitely very useful to rerun the
ACVC suite whenever any change is made to the compiler. We depend on this
as one (among several) of our continuing internal quality audits.

>We don't have a formal specification of the F-22 software, either.
>Can you come to our first flight readiness review, and explain to the
>pilots why we're not able to give them any confidence in the performance
>of the system because we're missing a formal specification?

I do find that surprising. Talking to the folks doing similar development
in England, they seem to be very much more focused on formal
specifications.

Of course, one important thing to remember is that the requirement on your
F-22 software is ultimately not that it be 100% correct, but that in
practice it be 100% reliable, and these are rather different criteria.
Clearly the 100% correctness criterion would require a formal
specification, and could not be demonstrated by testing; the 100%
reliability criterion is quite different.

>> Ken, in your message, you again refer to users expecting the ACVC suite
>> to guarantee conformance to the standard.

>I did? Must have been my evil twin.

The same one who talked about sending code to the AVF, no doubt :-) :-)

>What I actually asked was, "Is there some way to modify the scope of the
>ACVC process to improve compiler quality across all vendors? Or, is there
>something outside the scope of the ACVC that could be done to improve
>compiler quality across all vendors?"

Not clear. The danger, as I note above, is that if you move in the wrong
direction here, you can easily damage compiler quality.

>> P.S. If you would like to send a check for $25 million to ACT, I think
>> I can promise that 5 years from now we will have a compiler that is
>> much closer to conforming to the standard (of course I can also promise
>> this if you *don't* send us the $25 million :-)

>Interesting. Your process for improving the quality of your product is
>unrelated to the available resources? Wish _we_ had a system like that.
>(Or maybe I don't.)

You missed the point of my :-) We expect GNAT to succeed, and we intend to
invest substantial resources in improving its quality even if we don't get
your $25 million. There is no doubt that your $25 million check would have
a positive impact on GNAT, but we can manage without it :-)

>I notice you use the word "conformance" rather than "quality". Are these
>synonyms, to you? They aren't to me. I suspect they aren't to Mr. McCabe,
>or most other vendors.

No, of course they are not synonyms! That's the whole point. The ACVC
measures conformance, which is just one aspect of quality. What did I ever
say that made you think I regard them as synonymous? The whole point of my
comments is that they are NOT synonymous. In practice, NIST testing using
the ACVC suite can only measure conformance; and, as you point out
repeatedly and as I agree repeatedly, that is NOT the one and only measure
of quality, even if the ACVC could assure 100% conformance.