Announcement: GNAT ported to LLVM

comp.lang.ada

* Announcement: GNAT ported to LLVM
From: baldrick @ 2008-03-23 22:05 UTC


Hi, this is to let people know that the recently released
LLVM 2.2 compiler toolkit contains experimental support for
Ada through the llvm-gcc-4.2 compiler.  Currently the only
platform it works on is Linux running on 32-bit Intel x86.
This is because that's what I run, and I'm the only one who's
been working on this.  I would appreciate help from other Ada
guys, both for porting to new platforms and adding support for
missing features, not to mention testing and bug fixing!

LLVM (http://llvm.org/) is a set of compiler libraries and tools
for optimization and static and just-in-time code generation.
Personally I find LLVM a lot of fun, and pleasant to work with
due to the good design and clean implementation.  I hope you
will too!  llvm-gcc is gcc with the gcc optimizers replaced by
LLVM's; llvm-gcc-4.2 is the version of llvm-gcc based on gcc-4.2.

The way llvm-gcc works (this is transparent to users) is that the
gcc-4.2 GNAT front-end converts Ada into "gimple", gcc's internal
language independent representation.  The gimple is then turned
into LLVM's internal form, referred to as IR.  This is then run
through LLVM's optimizers, followed by LLVM's code generators
which squirt it out as assembler or object code.  In practice
you can use llvm-gcc as a drop in replacement for gcc.  However
the use of LLVM opens up other possibilities too.

For example, it is possible to have llvm-gcc squirt out LLVM IR
rather than object code (by using -emit-llvm on the command line).
It is possible to link the LLVM IR for different compilation units
together and reoptimize them.  In other words you can do link-time
optimization.  This is all language independent, so if part of your
program is written in Ada and other parts in C/C++/Fortran etc, you
can link them all together and mutually optimize them, resulting in
C routines being inlined into Ada etc.

The compiler works quite well, but it is still experimental.  All
of the ACATS testsuite passes except for c380004 and c393010.  Since
c380004 also fails with gcc-4.2, that makes c393010 the only failure
due to the use of the LLVM infrastructure (the problem comes from
the GNAT front-end which produces a bogus type declaration that the
gimple -> LLVM convertor rejects).  On the other hand, many of the
tests in the GNAT testsuite fail.  The release notes give some more
details of what works and what doesn't:
        http://llvm.org/releases/2.2/docs/ReleaseNotes.html

The precompiled llvm-gcc-4.2 shipped with the LLVM 2.2 release was
built without support for Ada, so you will need to build the compiler
yourself.  You can find instructions at
        http://llvm.org/docs/GCCFEBuildInstrs.html

Please report bugs and problems to the LLVM mailing lists, or using
http://llvm.org/bugs/.  One nice thing about LLVM is that people are
responsive and quickly fix bugs (often by the next day).

The LLVM IR is easy to read (with a bit of practice), and since it
contains the entire LLVM state you get to see exactly what has
happened to your program.  This might be useful for static analysis;
it is certainly useful for understanding how the various Ada
constructs are implemented.  To give you a taste for what it looks
like, here is an example showing what a simple Ada program gets
turned into.  Here is the Ada:

with Ada.Text_IO;
procedure Hello is
begin
   Ada.Text_IO.Put_Line ("Hello world!");
end;

Here's the result of compiling it:

$ gcc -S -O2 -emit-llvm -o - hello.adb
...
        %struct.string___XUB = type { i32, i32 }
...
@.str = internal constant [12 x i8] c"Hello world!"             ; <[12 x i8]*> [#uses=1]
@C.168.1155 = internal constant %struct.string___XUB { i32 1, i32 12 }          ; <%struct.string___XUB*> [#uses=1]

define void @_ada_hello() {
entry:
        tail call void @ada__text_io__put_line__2( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0), %struct.string___XUB* @C.168.1155 )
        ret void
}

declare void @ada__text_io__put_line__2(i8*, %struct.string___XUB*)


I've dropped the declarations of some uninteresting types and other
info, thus the ...  Note that passing -S -emit-llvm results in LLVM
assembler being output (the human readable version of LLVM IR); using
-c -emit-llvm would result in the compact binary form of LLVM IR, known
as bitcode.  Passing -o - causes the assembler to be dumped to the
terminal.

Here you can see:

(1) The declaration of Ada.Text_IO.Put_Line:
        declare void @ada__text_io__put_line__2(i8*, %struct.string___XUB*)
The name ada__text_io__put_line__2 is that generated by GNAT for this
routine.  The function returns no value ("void") and has two arguments:
a pointer to an i8 (an i8 is an 8 bit integer, in this case a character)
and a pointer to a %struct.string___XUB, which is a record type.  The
declaration of the type is
        %struct.string___XUB = type { i32, i32 }
which is a record containing two 32 bit integers.  These are the lower
and upper bounds for the string.  Thus a call to Ada.Text_IO.Put_Line
in fact passes two arguments, a pointer to the string contents and a
pointer to the string bounds.
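
To make (1) concrete, here is a minimal Python sketch of the two ideas
in it: the link name, and the string being passed as two separate parts.
It is only a model, not anything GNAT or LLVM actually runs.  The names
StringBounds, gnat_link_name and put_line_model are made up for
illustration, and the naming rule shown (lower-case the qualified name,
turn each dot into a double underscore, append a double underscore and a
number for an overloaded subprogram) is a simplification that happens to
reproduce the one symbol in the example; the real encoding has more cases.

```python
import ctypes


class StringBounds(ctypes.Structure):
    """Model of %struct.string___XUB = type { i32, i32 }."""

    _fields_ = [
        ("first", ctypes.c_int32),  # lower bound of the string
        ("last", ctypes.c_int32),   # upper bound of the string
    ]


def gnat_link_name(qualified_name, overload=None):
    """Simplified guess at the link name of an Ada subprogram."""
    name = qualified_name.lower().replace(".", "__")
    if overload is not None:
        name += "__%d" % overload
    return name


def put_line_model(contents, bounds):
    """Rebuild the string from its two parts, as the callee has to.

    `contents` stands in for the i8* argument.  In the real call the
    pointer carries no length, so the length is taken from the bounds
    record only.
    """
    length = max(bounds.last - bounds.first + 1, 0)
    return contents[:length].decode("latin-1")


if __name__ == "__main__":
    bounds = StringBounds(1, 12)
    print(gnat_link_name("Ada.Text_IO.Put_Line", 2))
    print(put_line_model(b"Hello world!", bounds))
```

The point of the model is that the pointer alone does not say where the
string ends: the constant @.str holds exactly 12 bytes with no
terminator, so the length has to come from the bounds record.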

(2) The code defining Hello (_ada_hello).  There is one basic block,
the entry block marked "entry:".  It contains two instructions: a call
and a return instruction.  The call
        tail call void @ada__text_io__put_line__2( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0), %struct.string___XUB* @C.168.1155 )
is marked as a "tail call".  If you don't know what that means, don't
worry about it.  The call is to the function @ada__text_io__put_line__2,
see (1) above.  The first parameter is an i8*, a pointer to an 8 bit
integer, and has the value
        getelementptr ([12 x i8]* @.str, i32 0, i32 0)
What is this?  First off, @.str is the string constant
        @.str = internal constant [12 x i8] c"Hello world!"             ; <[12 x i8]*> [#uses=1]
This is an internal constant, meaning that it is not visible outside
this compilation unit.  It has type [12 x i8], which is an array of 12
i8's.  It has the value "Hello world!", which is indeed 12 characters
long.  There is a comment on the end of the line (starting with ";")
pointing out the type of @.str, which is [12 x i8]*, a pointer to an
array of 12 characters, and the fact that @.str is only used in one
place.  The getelementptr instruction is explained in the LLVM docs,
see http://llvm.org/docs/LangRef.html and also
http://llvm.org/docs/GetElementPtr.html
Here it just converts @.str from a [12 x i8]* into an i8* before
passing it to @ada__text_io__put_line__2.  In short: a pointer to the
H in "Hello world!" is passed as the first parameter of the call.
The second parameter is a pointer to a %struct.string___XUB, a record
holding the lower and upper bounds for the string.  The value passed is
@C.168.1155, which is the constant declared as:
        @C.168.1155 = internal constant %struct.string___XUB { i32 1, i32 12 }          ; <%struct.string___XUB*> [#uses=1]
This is a constant record containing the values 1 (the lower bound)
and 12 (the upper bound).
The return instruction "ret void" completes execution of the function,
and returns control to the caller.  The "void" indicates that this
routine does not actually return anything.
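
For the getelementptr in (2), a small Python sketch of the address
arithmetic may help.  This is a toy model of the two-index case only,
with made-up names (gep_byte_offset, ARRAY_LEN, ELEM_SIZE); it is not
how LLVM implements the operation.  The first index steps over whole
[12 x i8] arrays starting at @.str, and the second index steps over the
i8 elements inside the selected array.

```python
ARRAY_LEN = 12  # number of elements in [12 x i8]
ELEM_SIZE = 1   # an i8 occupies one byte


def gep_byte_offset(array_index, element_index):
    """Byte offset from @.str selected by the two getelementptr indices."""
    return (array_index * ARRAY_LEN + element_index) * ELEM_SIZE


if __name__ == "__main__":
    data = b"Hello world!"
    offset = gep_byte_offset(0, 0)
    print(offset, chr(data[offset]))
```

With both indices zero the offset is zero, so the address is unchanged
and only the pointer type differs: a [12 x i8]* goes in and an i8* comes
out.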

I hope you have fun playing with LLVM!

Duncan.



* Re: Announcement: GNAT ported to LLVM
From: Samuel Tardieu @ 2008-03-24  9:25 UTC


>>>>> "Duncan" == Duncan Sands <baldrick@free.fr> writes:

Duncan> Hi, this is to let people know that the recently released LLVM
Duncan> 2.2 compiler toolkit contains experimental support for Ada
Duncan> through the llvm-gcc-4.2 compiler.

Thanks Duncan, this is an outstanding contribution to the Ada
community. Given that LLVM is already ahead of GCC in terms of code
generation quality (sometimes, starting from zero and choosing another
path is a competitive advantage), this looks very promising.

Note that the GCC folks are happy with LLVM adding some competition in
the open-source compilers arena. This is very motivating.

The difficult task, as you already know, will be to keep the Ada
front-ends in both compilers in sync. I wish you good luck with that!

  Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/



* Re: Announcement: GNAT ported to LLVM
From: baldrick @ 2008-03-24 18:09 UTC


Hi Sam,

> Thanks Duncan, this is an outstanding contribution to the Ada
> community. Given that LLVM is already ahead of GCC in terms of code
> generation quality (sometimes, starting from zero and choosing another
> path is a competitive advantage), this looks very promising.

I'm glad you appreciate my work!  That said, in my experience gcc-4.2
produces slightly faster code for Ada than llvm-gcc-4.2 does.  Given
that LLVM manages to produce code that comes close to gcc while being
much simpler than gcc and easier to improve, I expect it will overtake
gcc soon.  In fact I haven't even started working on Ada specific
optimizer improvements yet: I've been concentrating on correctness.

> The difficult task, as you already know, will be to keep the Ada
> front-ends in both compilers in sync. I wish you good luck with that!

It's not yet clear to me whether I should backport the gcc-4.3 Ada
front-end to llvm-gcc-4.2, or start working on llvm-gcc-4.3.  For the
moment I'm just working on improving the correctness and robustness
of llvm-gcc-4.2.



* Re: Announcement: GNAT ported to LLVM
From: Gene @ 2008-03-27  0:41 UTC


On Mar 24, 2:09 pm, baldrick <baldr...@free.fr> wrote:
> Hi Sam,
>
> > Thanks Duncan, this is an outstanding contribution to the Ada
> > community. Given that LLVM is already ahead of GCC in terms of code
> > generation quality (sometimes, starting from zero and choosing another
> > path is a competitive advantage), this looks very promising.
>
> I'm glad you appreciate my work!  That said, in my experience gcc-4.2
> produces slightly faster code for Ada than llvm-gcc-4.2 does.  Given
> that LLVM manages to produce code that comes close to gcc while being
> much simpler than gcc and easier to improve, I expect it will overtake
> gcc soon.  In fact I haven't even started working on Ada specific
> optimizer improvements yet: I've been concentrating on correctness.
>
> > The difficult task, as you already know, will be to keep the Ada
> > front-ends in both compilers in sync. I wish you good luck with that!
>
> It's not yet clear to me whether I should backport the gcc-4.3 Ada
> front-end to llvm-gcc-4.2, or start working on llvm-gcc-4.3.  For the
> moment I'm just working on improving the correctness and robustness
> of llvm-gcc-4.2.

This is wonderful, Duncan.  I agree that this is a huge deal for Ada.
I only learned about LLVM a few months ago.  When I did, I filed an
Ada LLVM compiler in my drawer of Utopian ideas.  Thanks for making it
come true!

I assume that due to the link-time optimization capability that
inlining among packages will be handled naturally.  GNAT-gcc can't do
that, right?  This alone ought to be a big deal as accessor/setter
conventions are leading to programs filled with tiny procedures and
functions.



* Re: Announcement: GNAT ported to LLVM
From: baldrick @ 2008-03-27  8:27 UTC


Hi Gene,

> I assume that due to the link-time optimization capability that
> inlining among packages will be handled naturally.

that's correct: you can compile each package to bitcode, then at
link-time they will all be mutually optimized, including inlining
into each other.  You can also compile the entire runtime to bitcode
and have that be mutually optimized with your code too.  I didn't
turn this on by default because currently link-time-optimization is
not transparent: you have to explicitly call some LLVM tools at link
time.  There's a plan to teach llvm-gcc to do this automagically when
you use it to do linking.

> GNAT-gcc can't do that, right?

It can to some extent: if you use -O2 -gnatn then it will peek inside
the bodies of packages you are using to try to inline functions.  That
functionality becomes a lot less useful now though.

> This alone ought to be a big deal as accessor/setter
> conventions are leading to programs filled with tiny procedures and
> functions.

Very true, and I guess that's why ACT implemented -gnatn.



* Re: Announcement: GNAT ported to LLVM
From: Alex R. Mosteo @ 2008-03-27 12:43 UTC


baldrick wrote:

> Hi Gene,
> 
>> I assume that due to the link-time optimization capability that
>> inlining among packages will be handled naturally.
> 
> that's correct: you can compile each package to bitcode, then at
> link-time they will all be mutually optimized, including inlining
> into each other.  You can also compile the entire runtime to bitcode
> and have that be mutually optimized with your code too.  I didn't
> turn this on by default because currently link-time-optimization is
> not transparent: you have to explicitly call some LLVM tools at link
> time.  There's a plan to teach llvm-gcc to do this automagically when
> you use it to do linking.
> 
>> GNAT-gcc can't do that, right?
> 
> It can to some extent: if you use -O2 -gnatn then it will peek inside
> the bodies of packages you are using to try to inline functions.  That
> functionality becomes a lot less useful now though.
> 
>> This alone ought to be a big deal as accessor/setter
>> conventions are leading to programs filled with tiny procedures and
>> functions.
> 
> Very true, and I guess that's why ACT implemented -gnatn.

I thought that -gnatN was even more aggressive than -gnatn. That said, in my
experience, while -gnatn rarely causes bugs to arise, -gnatN causes lots of
spurious errors to be reported, forcing me to turn it off for many files. Worse
still, the failure may be triggered by some spec, but it is the file with-ing it
that must be disabled...

In practice I've settled into -O3 -gnatn for optimized builds.



* Re: Announcement: GNAT ported to LLVM
From: baldrick @ 2008-03-27 15:22 UTC


> I thought that -gnatN was even more aggressive than -gnatn.

My understanding is that -gnatN was essentially a workaround for
various historical limitations of the gcc inliner, most of which
no longer exist.  I've been told by ACT that -gnatN is deprecated
on the x86.  Personally I never bother with it.



* Re: Announcement: GNAT ported to LLVM
From: Alex R. Mosteo @ 2008-03-27 17:25 UTC


baldrick wrote:

>> I thought that -gnatN was even more aggressive than -gnatn.
> 
> My understanding is that -gnatN was essentially a workaround for
> various historical limitations of the gcc inliner, most of which
> no longer exist.  I've been told by ACT that -gnatN is deprecated
> on the x86.  Personally I never bother with it.

Good to know.



* Re: Announcement: GNAT ported to LLVM
From: baldrick @ 2008-04-11 13:37 UTC


I made a mistake in the instructions for building llvm-gcc with Ada
support (http://llvm.org/docs/GCCFEBuildInstrs.html): you need to use
the 2005 GNAT GPL edition if you are building the LLVM 2.2 release,
not the 2006 edition as originally stated [*].  I've tweaked the
version of the Ada front-end in the LLVM subversion repository so that
it can be built using the 2005, 2006 and 2007 GNAT GPL editions, as
well as with gcc-4.2.  It fails to build with gcc-4.3, but I can try
to fix that too if anyone wants it.

Duncan.

[*] llvm-gcc is based on gcc-4.2, which also can't be built with
the 2006 GNAT GPL edition.


