From mboxrd@z Thu Jan  1 00:00:00 1970
From: Natasha Kerensikova <lithiumcat@gmail.com>
Newsgroups: comp.lang.ada
Subject: My first compiler bug: work around or redesign?
Date: Fri, 23 Mar 2012 16:29:06 +0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <slrnjmp96d.1lme.lithiumcat@sigil.instinctive.eu>

Hello,

I happen to have encountered my very first compiler bug, or at least
something that claims to be one in the following message:

+===========================GNAT BUG DETECTED==============================+
| 4.6.2 20111026 (release) -=> GNAT AUX [FreeBSD64] (x86_64-aux-freebsd9.0) GCC error:|
| in gnat_to_gnu_entity, at ada/gcc-interface/decl.c:4134                  |
| Error detected at dressup-parsers-markdown.adb:184:4 [dressup-parsers-markdown.adb:1651:7 [markdown.adb:32:4]]|
| Please submit a bug report; see http://gcc.gnu.org/bugs.html.            |
| Use a subject line meaningful to you and us to track the bug.            |
| Include the entire contents of this bug box in the report.               |
| Include the exact gcc or gnatmake command that you entered.              |
| Also include sources listed below in gnatchop format                     |
| (concatenated together with no headers between files).                   |
+==========================================================================+

So my first question is, would anyone be kind enough to try and
reproduce that bug?  The files involved are available individually at
http://fossil.instinctive.eu/dressup/dir?ci=bf54531f8dcf7117
or as a tarball at
http://fossil.instinctive.eu/dressup/tarball/Dressup-bf54531f8dcf7117.tar.gz?uuid=bf54531f8dcf71174ccb486812b886701222c342

The reason is that I'm using GNAT AUX, derived from GCC 4.6.2, so it's
both unofficial and old (IIRC tasking in FreeBSD 9 is preventing the
next one from being available). It's difficult to set up a new build
environment, so I would like to be sure it's really a bug before setting
out to make a minimal test case and reporting it.

I guess the problem somehow involves generics:
Dressup.Parsers is a generic package; so Dressup.Parsers.Markdown is
generic too, despite adding no further formal parameter.
markdown.adb:32:4 is the instantiation of Dressup.Parsers.Markdown (off
an instance of Dressup.Parsers instantiated on the line before).
dressup-parsers-markdown.adb:1651:7 is an instantiation of a subprogram
whose specification is at dressup-parsers-markdown.adb:184:4.

Could the issue be caused by having a generic instance inside a generic
instance inside a generic instance? Or is GNAT supposed to handle such
a level of nesting well?


All this led me to question my approach and design and programming
practices, so that if I have to rewrite something to work around the
compiler bug, I can rewrite better.

So my first "best practices" question is about using generic subprograms
confined inside a package body. Here is a brutally-simplified version of
what is reported in the compiler bug message:

package Stuff is
   procedure Ordered_List (<some set of parameters>);
   procedure Unordered_List (<the same set of parameters>);
end Stuff;

package body Stuff is
   generic
      with function Prefix (Line : String) return Natural;
   procedure Generic_List (<same set of parameters as previously>);

   function Ordered_Prefix (Line : String) return Natural;
   function Unordered_Prefix (Line : String) return Natural;

   -- subprogram bodies here

   procedure Ordered_List_Instance is new Generic_List (Ordered_Prefix);

   procedure Ordered_List (<same set of parameters as in the spec>)
      renames Ordered_List_Instance;

   procedure Unordered_List_Instance is new Generic_List (Unordered_Prefix);

   procedure Unordered_List (<same set of parameters as in the spec>)
      renames Unordered_List_Instance;

end Stuff;

The rationale here is that Ordered_List and Unordered_List are meant to
be completely independent, so they are presented in the specification as
being completely unrelated.

However, at implementation level, it turns out that they are very
similar: only the prefix recognition changes, and further processing is
perfectly identical. So instead of copy-and-pasting code, I would write a
generic that handles all the common aspects, using a formal function for
the prefix part.
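
To make the pattern concrete, here is a minimal sketch that should
compile on its own. The (Line : String) profile, the toy prefix rules
and the Text_IO output are placeholders invented for this example, not
the actual Dressup code:

```ada
with Ada.Text_IO;

procedure List_Demo is

   package Stuff is
      --  Hypothetical profile, chosen only to keep the example short.
      procedure Ordered_List (Line : String);
      procedure Unordered_List (Line : String);
   end Stuff;

   package body Stuff is

      generic
         with function Prefix (Line : String) return Natural;
      procedure Generic_List (Line : String);

      --  Common processing: strip the prefix reported by Prefix.
      procedure Generic_List (Line : String) is
         Skip : constant Natural := Prefix (Line);
      begin
         if Skip > 0 then
            Ada.Text_IO.Put_Line
              ("item: " & Line (Line'First + Skip .. Line'Last));
         else
            Ada.Text_IO.Put_Line ("not a list item");
         end if;
      end Generic_List;

      --  Length of a leading "<digit>. " prefix, or 0 (toy rule).
      function Ordered_Prefix (Line : String) return Natural is
      begin
         if Line'Length >= 3
           and then Line (Line'First) in '0' .. '9'
           and then Line (Line'First + 1 .. Line'First + 2) = ". "
         then
            return 3;
         end if;
         return 0;
      end Ordered_Prefix;

      --  Length of a leading "* " prefix, or 0 (toy rule).
      function Unordered_Prefix (Line : String) return Natural is
      begin
         if Line'Length >= 2
           and then Line (Line'First .. Line'First + 1) = "* "
         then
            return 2;
         end if;
         return 0;
      end Unordered_Prefix;

      --  Instances, then renamings-as-body completing the public specs.
      procedure Ordered_List_Instance is
        new Generic_List (Ordered_Prefix);
      procedure Ordered_List (Line : String)
        renames Ordered_List_Instance;

      procedure Unordered_List_Instance is
        new Generic_List (Unordered_Prefix);
      procedure Unordered_List (Line : String)
        renames Unordered_List_Instance;

   end Stuff;

begin
   Stuff.Ordered_List ("1. first");
   Stuff.Unordered_List ("* second");
end List_Demo;
```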

Is there something wrong with that approach?
Are there any caveats that I missed?
Are there advantages in avoiding the generics in that situation, for
example using a non-generic common subprogram that takes an extra
access-to-subprogram parameter?
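
For comparison, the access-to-subprogram variant I have in mind would
look roughly like the following sketch. The (Line : String) profile,
the names Prefix_Function and Common_List, and the toy prefix rules are
made up for illustration:

```ada
with Ada.Text_IO;

procedure Access_Demo is

   package Stuff is
      --  Hypothetical profile, chosen only to keep the example short.
      procedure Ordered_List (Line : String);
      procedure Unordered_List (Line : String);
   end Stuff;

   package body Stuff is

      type Prefix_Function is
        access function (Line : String) return Natural;

      --  Length of a leading "<digit>. " prefix, or 0 (toy rule).
      function Ordered_Prefix (Line : String) return Natural is
      begin
         if Line'Length >= 3
           and then Line (Line'First) in '0' .. '9'
           and then Line (Line'First + 1 .. Line'First + 2) = ". "
         then
            return 3;
         end if;
         return 0;
      end Ordered_Prefix;

      --  Length of a leading "* " prefix, or 0 (toy rule).
      function Unordered_Prefix (Line : String) return Natural is
      begin
         if Line'Length >= 2
           and then Line (Line'First .. Line'First + 1) = "* "
         then
            return 2;
         end if;
         return 0;
      end Unordered_Prefix;

      --  Non-generic common part: the prefix test is a run-time value.
      procedure Common_List
        (Line : String; Prefix : not null Prefix_Function)
      is
         Skip : constant Natural := Prefix.all (Line);
      begin
         if Skip > 0 then
            Ada.Text_IO.Put_Line
              ("item: " & Line (Line'First + Skip .. Line'Last));
         else
            Ada.Text_IO.Put_Line ("not a list item");
         end if;
      end Common_List;

      procedure Ordered_List (Line : String) is
      begin
         Common_List (Line, Ordered_Prefix'Access);
      end Ordered_List;

      procedure Unordered_List (Line : String) is
      begin
         Common_List (Line, Unordered_Prefix'Access);
      end Unordered_List;

   end Stuff;

begin
   Stuff.Ordered_List ("1. first");
   Stuff.Unordered_List ("* second");
end Access_Demo;
```

Here the public procedures get ordinary bodies, so no renaming is
needed, at the cost of an indirect call for each prefix test.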

And as a tangential question, could anyone explain to me why the
"renames" are required? How come a generic instantiation cannot provide
a body for a publicly-specified subprogram?


And the last part of the message here is about the general design of the
library. I have ended up using a lot of generics and access to
subprograms, but no tagged types (actually some types are tagged, but
only for future expansions, none of the code written here uses any
tagged type feature).

I would understand anyone skipping that part of the discussion, but any
constructive comment will be appreciated (though not necessarily acted
upon).

The initial problem I set out to solve was converting markdown into
HTML, but with enough modularity so that I can convert markdown into PDF
without changing the "markdown" part (that I call "parser", I hope I got
the word right), or convert creole into HTML without changing the "HTML"
part (that I call "renderer"). And as an extra requirement, I want
features of a parser to be easily and individually turned off (e.g.
removing the raw HTML inclusion in markdown for untrusted sources, or
removing the "wiki link" feature of creole where it is used outside of a
wiki).

In my previous iteration of markdown-to-HTML code (in C), I found that
a usable description of a renderer is a bunch of callbacks that operate
on the same shared state.

So for my Ada library, I decided to describe a renderer as a state
object and a set of accesses to procedures. The idea is that each
procedure renders a particular element (e.g. for an ordered list, the
HTML callback would output "<ol>", the contents and "</ol>").
Language elements without a renderer callback are considered
disabled.

That way, the renderer does not need to know anything about the parser,
and the parser only handles callbacks and an opaque state object, so it
is also independent from any particular renderer. Only the client has to
care about both the particular renderer and the particular parser in use.
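
As a rough illustration of that split (not the actual Dressup code),
the sketch below uses a generic package that only sees an opaque State
and a callback profile. The names Parsers, Renderer, Emit_Paragraph and
HTML_State, and the (S, Contents) profile, are assumptions made up for
the example:

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Callback_Demo is

   package SU renames Ada.Strings.Unbounded;

   --  Parser side: knows only an opaque State and the callback profile.
   generic
      type State is limited private;
   package Parsers is
      type Element_Renderer is
        access procedure (S : in out State; Contents : String);

      type Renderer is record
         Paragraph    : Element_Renderer;  --  null means "disabled"
         Ordered_List : Element_Renderer;
      end record;

      --  Stand-in for the parser having recognized a paragraph.
      procedure Emit_Paragraph
        (R : Renderer; S : in out State; Contents : String);
   end Parsers;

   package body Parsers is
      procedure Emit_Paragraph
        (R : Renderer; S : in out State; Contents : String) is
      begin
         if R.Paragraph /= null then
            R.Paragraph.all (S, Contents);
         end if;
      end Emit_Paragraph;
   end Parsers;

   --  Renderer side: knows nothing about the parser.
   type HTML_State is record
      Output : SU.Unbounded_String;
   end record;

   procedure HTML_Paragraph
     (S : in out HTML_State; Contents : String) is
   begin
      SU.Append (S.Output, "<p>" & Contents & "</p>");
   end HTML_Paragraph;

   --  Client side: the only place where both meet.
   package HTML_Parsers is new Parsers (HTML_State);

   R : constant HTML_Parsers.Renderer :=
     (Paragraph => HTML_Paragraph'Access, Ordered_List => null);
   S : HTML_State;

begin
   HTML_Parsers.Emit_Paragraph (R, S, "hello");
   Ada.Text_IO.Put_Line (SU.To_String (S.Output));
end Callback_Demo;
```

Since every callback in a given Renderer record takes the same State
type, they cannot be mixed with callbacks written for another state
type without a compile-time error.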

I went for access to subprogram rather than a tagged type for element
renderers to ensure that all callbacks share the same state. In the
tagged type version, each element renderer object would have its own
state (presumably referring to some shared state like the output
string or stream), so there would be no compile-time guarantee that
all element renderers indeed belong to the same renderer. A client who
mistakenly mixes callbacks, ending up with one set of callbacks
referring to one state and another set referring to another, would have
no indication of their mistake before seeing garbage at run time.

Moreover, using dynamic dispatching on a tagged type instead of access
to subprogram would mean storing objects of a class-wide type somewhere,
i.e. of an indefinite type. So it would mean extra complications like
holder objects, which make the program harder to read and to understand.
These drawbacks without benefit (unless I'm missing something) were
enough for me to rule out the option.

I then proceeded to write the (X)HTML renderer. While thinking about the
implementation, I realized that I would only need to append string
fragments. So I wrote it as a generic package, with an Accumulator
formal type and an Append formal procedure. Again it looked much simpler
than an approach based on tagged types (and interfaces), but this time
on the client side rather than on the library side: Unbounded_String is
bundled with an Append procedure that fits perfectly, and streams might
be usable out of the box if String'Write can be used directly for
Append. With interfaces, the client would have to maintain a wrapper
around Unbounded_String or streams or whatever accumulator they re-use,
and it feels to me like unnecessary clutter. Moreover it seems possible
and relatively simple to instantiate the generic markdown renderer with
an interface type, while the advantages of the generic version seem out
of reach of a version based on interfaces.
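
A minimal sketch of what I mean on the client side, assuming a
hypothetical generic package HTML_Renderers with a single Paragraph
procedure (the real renderer has a different and larger interface):

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Accumulator_Demo is

   --  Library side: only needs "something that text can be appended to".
   generic
      type Accumulator is limited private;
      with procedure Append (To : in out Accumulator; Text : String);
   package HTML_Renderers is
      procedure Paragraph
        (Output : in out Accumulator; Contents : String);
   end HTML_Renderers;

   package body HTML_Renderers is
      procedure Paragraph
        (Output : in out Accumulator; Contents : String) is
      begin
         Append (Output, "<p>");
         Append (Output, Contents);
         Append (Output, "</p>");
      end Paragraph;
   end HTML_Renderers;

   --  Client side: Unbounded_String and its Append fit without a wrapper.
   package To_Unbounded is new HTML_Renderers
     (Accumulator => Ada.Strings.Unbounded.Unbounded_String,
      Append      => Ada.Strings.Unbounded.Append);

   Buffer : Ada.Strings.Unbounded.Unbounded_String;

begin
   To_Unbounded.Paragraph (Buffer, "hello");
   Ada.Text_IO.Put_Line (Ada.Strings.Unbounded.To_String (Buffer));
end Accumulator_Demo;
```

The instantiation picks the overloading of Append that takes a String
as its second parameter, so the client writes no glue code at all.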

With this representation of renderers, I started shaping the parser with
a generic ancestor package Dressup.Parsers, that only defines the type
Element_Renderer used for the callbacks.

The extra genericity and accesses to subprograms of
Dressup.Parsers.Lexers follow the same rationale as for renderers.

I think that covers all the debatable choices, though if you feel like
discussing another one, feel free to do so.


Thanks a lot in advance for your helpful insights,
Natasha