From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,ca9eef4d5e2078ea X-Google-Attributes: gid103376,public From: dewar@merv.cs.nyu.edu (Robert Dewar) Subject: Re: Beware: Rep spec on an enumeration type causes code explosion Date: 1997/12/06 Message-ID: X-Deja-AN: 295723012 References: X-Complaints-To: usenet@news.nyu.edu X-Trace: news.nyu.edu 881404663 9448 (None) 128.122.140.58 Organization: New York University Newsgroups: comp.lang.ada Date: 1997-12-06T00:00:00+00:00 List-Id: Joe said <<<> There are two issues here. One is the use of a confirming enumeration clause that specifies zero origin, and consecutive integers for the representation. This should have ZERO effect on the generated code. If it does not, then this is a clear inefficiency on the part of the implementation that should be fixed. Certainly such a declaration in GNAT has no effect whatsoever on the generated code. On the other hand, it is totally unnecessary to give such a declaration in Ada 95, since a compiler is *required* to use a zero based representation. You can most certainly "trust" this. In fact I would be more inclined to trust that a compiler would implement this correctly than that it would get enumeration representation clauses right! The relevant statement in the RM (easily found) is: Implementation Requirements 8 For nonboolean enumeration types, if the coding is not specified for the type, then for each value of the type, the internal code shall be equal to its position number. That's clear enough I think. So if you were putting in representation clauses of this confirming type, they are obfuscatory and unnecessary in any case, and should be removed whether or not they trigger compiler madness. As for enumeration types with holes, that's a different matter. There is a fundamental extra processing requirement there for conversion from position to representation values and back again for such opertions as array indexing, and loop control. An Ada programmer should be completely aware that if you use such an enumeration type for any operation involving access to the pos value (e.g. array indexing, loops, succ, pred), then you will pay a price. If you do NOT use any of these operations, then the use of such enumeration types should not introduce any significant overhead, although there may be some space for tables. Measuring the expansion of a 29 line program may be completely bogus here, because you may be looking at the base cost of the tables involved. In the case of GNAT, all code for such enumeration types with holes is generated by the compiler without reference to the runtime (this means that such types are fully supported by the new GNORT tecnology, which is GNAT with No RunTime, a version of GNAT for use in critical embedded applications which require certification). A very useful feature in GNAT is that you can understand a lot about expansion of your code by using the -gnatdg switch, which reconstructs the expanded tree into Ada source. Let's look at it in action for "holey" enumeration types: Here is a little source program: procedure h is type Enum is (A,B,C); for Enum use (1,5,10); begin for J in Enum loop null; end; end; And the -gnatdg output (after correcting the silly syntax error of the missing loop on the end loop :-) is: procedure h is type enum is (a, b, c); for enum use (1, 5, 10); null; enumA : constant array (natural range 0 .. 2) of enum := (a, b, c); function _rep_to_pos (A : enum; F : boolean) return integer is begin case unsigned!(A) is when 1 => return 0; when 5 => return 1; when 10 => return 2; when others => if F then [program_error] else return -1; end if; end case; end _rep_to_pos; ] L_1 : label begin L_1 : for jP in natural range 0 .. 2 loop declare j : constant enum := enumA (jP); begin null; end; end loop L_1; return; end h; You see here that a table is constructed for going from the pos value to the rep value, and a function using a case statement for going in the reverse direction. Obviously pos-to-rep is going to be much more efficient than rep-to-pos, especially in the very sparse case. The loop code generated by GNAT here is pretty nice, since as you see it keeps a shadow of the pos value, and then computes the loop index by using the fast pos-to-rep code. But you can't count on a compiler doing that. The naive code would involve conversions in both directions on each cycle of the loop. For a more expensive use of this feature, let's look at arrays: procedure h is type Enum is (A,B,C); for Enum use (1,5,10); type x is array (Enum) of Integer; v : x; e : enum := B; begin v(e) := 2; end; There is actually a very important implementation dependent decision to be made here, for which the RM gives no hint of a desirable decision, or even a recognition that there *is* an implementation dependence here. And that is whether to index the array by pos or rep values. To index by pos values means that the array type x has size 12 bytes (assuming 4 byte integers), and to index by rep values, means that it has size 40 bytes (i.e. room for 10 integers with holes). At first in GNAT, we chose the representation with holes, which is much more efficient for indexing, but wastes space. However, we found that Verdix had chosen to use the compact pos based representation, and so we changed GNAT to match, since several of our large customers had legacy Verdix code that depended on this assumption. This means that indexing such an array requires the more expensive rep-to-pos conversion. Let's look at the expanded code (the tables are the same of course) th system.unsigned_types; procedure h is type enum is (a, b, c); for enum use (1, 5, 10); type x is array (a .. c) of integer; null; enumA : constant array (natural range 0 .. 2) of enum := (a, b, c); function _rep_to_pos (A : enum; F : boolean) return integer is begin case unsigned!(A) is when 1 => return 0; when 5 => return 1; when 10 => return 2; when others => if F then [program_error] else return -1; end if; end case; end _rep_to_pos; ] freeze TxB [] freeze x [] v : x; e : enum := b; begin xP!(v) (natural(_rep_to_pos (e, true))) := 2; return; end h; and there it is! The _rep_to_pos call that is needed to do the conversion. The xP! here has to do with the fact that the underlying array at the implementation level is different from the one in the source (since its index type is integer range 0 .. 2), the ! is an unchecked conversion from the source type to the implementation type. Like unchecked conversions in general it does not generate code, it is required for type correctness of the expanded tree. The true parameter in the call tells the rep_to_pos routine what to do with a bad value that falls in a hole. Normally this raises program error, but there is one exception, namely the 'Valid attribute. We could generate two tables, to avoid the extra parameter in the normal case. That's a trade off between time and space, and we decided that it is better to have only one copy of that large case statement. The -gnatdg switch is available on all versions of GNAT. The output is Ada source with a few peculiar decorations (like the ! for unchecked conversion above). For details on these decorations, you can look at the text of the sprint.ads unit in the GNAT source distribution. The -gnatdg switch was put in for compiler debugging, but it has proved to be a very useful feature for programmers, to give them a better idea of the consequences of the code they right, and to understand the code that GNAT generates. We are thinking of elevating this switch to full first-class citizenship instead of leaving it as a debugging switch. Robert Dewar Ada Core Technologies