From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Formal Subprogram Access
Date: Thu, 15 Feb 2018 17:03:12 -0600
Organization: JSA Research & Innovation

I wrote:
...
> One of the big things learned from these automatic grading tools is that
> it is really easy for junk results to creep into typical ACATS grading
> setups (which usually depend on comparing against known-good results). I
> found 4 ACATS tests that were marked as passing for Janus/Ada that
> actually failed. Two of those actually reflected compiler bugs introduced
> in recent years (both easy to fix, thank goodness), one was a batch file
> problem, and one probably was just left off of the to-do list (but of
> course if it isn't on the to-do list, it isn't very likely to ever be
> worked on). Thus I'm not too surprised to find similar things for GNAT.

I should clarify a bit about this. Just because the automated tools report GNAT as failing on some particular target doesn't mean that GNAT actually would fail a formal conformity assessment. There are a number of other factors in play.

First, the fully automated testing tool can only handle "usual" tests; those that require special handling have to be run specially. For my GNAT tools, this includes things like the tests that include foreign language code (the ACATS grading tools ignore such code as it isn't relevant to grading - the only thing that matters about it is that it was compiled, and it isn't worth anyone's time to try to automate detection of non-Ada compilers and non-Ada source code). It also includes all of the Annex E tests that require actual partitioning (I have no personal interest in figuring out how to configure those).

For a formal conformity assessment, many of these special-handling tests would be run using custom scripts (rather than the scripts automatically generated by the tools); the results of running those scripts could then be graded with the usual grading tool. This would handle cases like those above, as well as any other tests that need special options to be processed correctly.

Moreover, an implementer doing a formal test would have the opportunity to challenge the grading of any test: to explain why it should be considered "Passed", to suggest that it be run specially, or even to argue that it does not appropriately reflect the rules of Ada. These test disputes would be discussed with the ACAL (the formal tester) and possibly with the ACAA Technical Agent (that's me).
This process can result in modified grading requirements for that implementer or for all ACATS users, or even the permanent removal of a test from the test suite.

Additionally, the ACATS grading tool enforces a rather strict view of the ACATS grading standards. It's quite likely that a B-Test that it reports as failed actually would be graded as passed by a human, because the error message is "close enough" to the required location. Moreover, a human can read the contents of an error message, while the grading tool makes no attempt to do that. (I've spent some time improving existing ACATS tests so that the grading tools are more likely to be able to grade them successfully, but doing that for the entire test suite is not a good use of limited resources.)

To summarize, just because an automatic test run grades some tests as failed doesn't necessarily mean that those tests would be graded as failed in a formal conformity assessment. More simply, some failures in an automatic test run don't mean that the compiler can't pass conformity assessment.

Randy.

P.S. It should be noted that I did most of this GNAT tools work on my own time, and not in an official capacity as ACAA Technical Agent. If I had done it officially, I wouldn't be allowed to talk about it (which would have defeated the purpose of building the tools).
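
As a concrete illustration of the "close enough" point above, here is a minimal, hypothetical sketch of that kind of error-location check. It is not the actual ACATS grading tool logic; the procedure name, the parameter names, and the two-line tolerance are all assumptions made purely for illustration.

--  Hypothetical sketch only: treat a reported error as matching the
--  expected error location if it is within a small tolerance.  The
--  names and the tolerance here are invented; the real grading rules
--  are stricter and more detailed.

with Ada.Text_IO; use Ada.Text_IO;

procedure Close_Enough_Sketch is

   Tolerance : constant := 2;  --  assumed slack, in source lines

   --  True if an error reported at Reported_Line is acceptably close
   --  to the line flagged as the expected error location.
   function Close_Enough
     (Expected_Line, Reported_Line : Positive) return Boolean is
   begin
      return abs (Reported_Line - Expected_Line) <= Tolerance;
   end Close_Enough;

begin
   --  Example: the test expects an error at line 57, and the compiler
   --  reports one at line 58.  A human grader would likely accept it;
   --  a strict automated rule might not.
   if Close_Enough (Expected_Line => 57, Reported_Line => 58) then
      Put_Line ("graded as PASSED (error close enough)");
   else
      Put_Line ("graded as FAILED (error too far from expected line)");
   end if;
end Close_Enough_Sketch;

A human grader, of course, applies judgment rather than a fixed tolerance, and can also read the text of the message itself, which no such mechanical check does.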