From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: Formal Subprogram Access
Date: Thu, 15 Feb 2018 17:03:12 -0600
Organization: JSA Research & Innovation

I wrote:
...
> One of the big things learned from these automatic grading tools is that
> it is really easy for junk results to creep into typical ACATS grading
> setups (which usually depend on comparing against known-good results). I
> found 4 ACATS tests that were marked as passing for Janus/Ada that
> actually failed. Two of those actually reflected compiler bugs introduced
> in recent years (both easy to fix, thank goodness), one was a batch file
> problem, and one probably was just left off of the to-do list (but of
> course if it isn't on the to-do list, it isn't very likely to ever be
> worked on). Thus I'm not too surprised to find similar things for GNAT.

I should clarify a bit about this. Just because the automated tools report GNAT as failing on some particular target doesn't mean that GNAT actually would fail a formal conformity assessment. There are a number of other factors in play.

First, the fully automated testing tool can only handle "usual" tests; those that require special handling have to be run specially. For my GNAT tools, this includes things like the tests that include foreign language code (the ACATS grading tools ignore such code as it isn't relevant to grading - the only thing that matters about it is that it was compiled, and it isn't worth anyone's time to try to automate detection of non-Ada compilers and non-Ada source code). It also includes all of the Annex E tests that require actual partitioning (I have no personal interest in figuring out how to configure those).

For a formal conformity assessment, many of these special-handling tests would be run using custom scripts (rather than the scripts automatically generated by the tools); the results of running those scripts could then be graded with the usual grading tool. This would handle cases like those above, as well as any other tests that need special options to be processed correctly.

Moreover, an implementer doing a formal test would have the opportunity to challenge the grading of any test: to explain why it should be considered "Passed", to suggest that it be run specially, or even to argue that it does not appropriately reflect the rules of Ada. These test disputes would be discussed with the ACAL (the formal tester) and possibly with the ACAA Technical Agent (that's me).
This process can result in modified grading requirements for that implementer or for all ACATS users, or even the permanent removal of a test from the test suite.

Additionally, the ACATS grading tool enforces a rather strict view of the ACATS grading standards. It's quite likely that a B-Test that it reports as failed actually would be graded as passed by a human, because the error message is "close enough" to the required location. Moreover, a human can read the contents of an error message, while the grading tool makes no attempt to do that. (I've spent some time improving existing ACATS tests so that the grading tools are more likely to be able to grade them successfully, but doing that for the entire test suite is not a good use of limited resources.)

To summarize, just because an automatic test run grades some tests as failed doesn't necessarily mean that those tests would be graded as failed in a formal conformity assessment. More simply, some failures in an automatic test run don't mean that the compiler can't pass conformity assessment.

Randy.

P.S. It should be noted that I did most of this GNAT tools work on my own time, and not in an official capacity as ACAA Technical Agent. If I had done it officially, I wouldn't be allowed to talk about it (which would have defeated the purpose of building the tools).
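
As a concrete illustration of the "close enough" point above, here is a minimal, hypothetical sketch of that kind of error-location check. It is not the actual ACATS grading tool logic; the procedure name, the parameter names, and the two-line tolerance are all assumptions made purely for illustration.

--  Hypothetical sketch only: treat a reported error as matching the
--  expected error location if it is within a small tolerance.  The
--  names and the tolerance here are invented; the real grading rules
--  are stricter and more detailed.

with Ada.Text_IO; use Ada.Text_IO;

procedure Close_Enough_Sketch is

   Tolerance : constant := 2;  --  assumed slack, in source lines

   --  True if an error reported at Reported_Line is acceptably close
   --  to the line flagged as the expected error location.
   function Close_Enough
     (Expected_Line, Reported_Line : Positive) return Boolean is
   begin
      return abs (Reported_Line - Expected_Line) <= Tolerance;
   end Close_Enough;

begin
   --  Example: the test expects an error at line 57, and the compiler
   --  reports one at line 58.  A human grader would likely accept it;
   --  a strict automated rule might not.
   if Close_Enough (Expected_Line => 57, Reported_Line => 58) then
      Put_Line ("graded as PASSED (error close enough)");
   else
      Put_Line ("graded as FAILED (error too far from expected line)");
   end if;
end Close_Enough_Sketch;

A human grader, of course, applies judgment rather than a fixed tolerance, and can also read the text of the message itself, which no such mechanical check does.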