comp.lang.ada
* Profiling Ada binaries
@ 2016-07-22 13:24 Markus Schöpflin
  2016-07-22 14:59 ` Alejandro R. Mosteo
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Schöpflin @ 2016-07-22 13:24 UTC (permalink / raw)


Dear list,

I realize that the following question is rather vague, but I hope that perhaps 
someone can give me some advice on how to proceed, or maybe has already 
experienced similar issues.

I am trying to profile code generated by GNAT using gprof. For this I add -pg
to the compilation and link switches, turn down optimization to -O1, and
disable inlining of functions that are only called once
(-fno-inline-functions-called-once).
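
Roughly, the build and report steps look like this (just a sketch; "main"
stands in for our actual main unit):

    gnatmake -O1 -pg -fno-inline-functions-called-once main.adb -largs -pg
    ./main                   # writes gmon.out into the current directory
    gprof ./main gmon.out > profile.txt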

The report gprof generates from the profiling data gathered while running
the executable is rather strange: it suggests that, for example, the
elaboration functions of compilation units are called countless times, which
almost certainly isn't true.

To illustrate, here are the first few lines of a flat profile:

Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
   4.66      4.91     4.91      429     0.01     0.01  foo1
   4.41      9.55     4.64 162635540     0.00     0.00  bar___elabs
   3.99     13.76     4.21   515639     0.00     0.00  foo2

As one can see, the second line suggests that the elaboration for the spec of 
the package bar is called 162635540 times. Looking at the call graph of 
bar___elabs shows:

                 0.00    0.00       1/162635540     adainit [92]
                 0.00    0.00      33/162635540 
standard_math__float_32_elementary__arccosX [2249]
                 0.00    0.00    1461/162635540 
standard_math__float_32_elementary__arcsinX [1099]
                 0.00    0.00  108848/162635540 
standard_math__float_64_elementary__arctanX [1573]
                 0.00    0.00  109380/162635540 
standard_math__float_64_elementary__sinX [1572]
                 0.00    0.00  163937/162635540 
standard_math__float_64_elementary__cosX [687]
                 0.01    0.00  515639/162635540 
standard_math__float_32_elementary__OexponX [609]
                 0.03    0.00 1029874/162635540 
standard_math__float_32_elementary__tanX [346]
                 0.04    0.00 1358089/162635540 
standard_math__float_64_elementary__sqrtX [454]
                 0.04    0.00 1543778/162635540 
standard_math__float_32_elementary__logX [558]
                 0.50    0.00 17564360/162635540 
standard_math__float_32_elementary__expX [93]
                 0.56    0.00 19681964/162635540 
standard_math__float_32_elementary__arctanX [56]
                 0.79    0.00 27790226/162635540 
standard_math__float_32_elementary__cosX [51]
                 0.82    0.00 28730603/162635540 
standard_math__float_32_elementary__sinX [49]
                 1.83    0.00 64037347/162635540 
standard_math__float_32_elementary__sqrtX [45]
[33]     4.4    4.64    0.00 162635540         bar___elabs [33]

There is one legitimate call from adainit; all other reported calls are most
likely incorrect. The many calls to the standard math functions are probably
real; they just don't call bar___elabs.

Did anybody encounter such anomalies before? Is there a way to get correct 
profiling information for Ada binaries generated by Gnat? Different compiler 
flags maybe? Or using a different profiling tool altogether?

TIA,
Markus



* Re: Profiling Ada binaries
  2016-07-22 13:24 Profiling Ada binaries Markus Schöpflin
@ 2016-07-22 14:59 ` Alejandro R. Mosteo
  2016-07-22 15:05   ` Alejandro R. Mosteo
  2016-07-25  6:57   ` Markus Schöpflin
  0 siblings, 2 replies; 15+ messages in thread
From: Alejandro R. Mosteo @ 2016-07-22 14:59 UTC (permalink / raw)


On 22/07/16 15:24, Markus Schöpflin wrote:
> Dear list,
>
> (...)
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>   4.66      4.91     4.91      429     0.01     0.01  foo1
>   4.41      9.55     4.64 162635540     0.00     0.00  bar___elabs
>   3.99     13.76     4.21   515639     0.00     0.00  foo2
>
> As one can see, the second line suggests that the elaboration for the
> spec of the package bar is called 162635540 times. Looking at the call
> graph of bar___elabs shows:
>
>                 0.00    0.00       1/162635540     adainit [92]
>                 0.00    0.00      33/162635540
> standard_math__float_32_elementary__arccosX [2249]
>                 0.00    0.00    1461/162635540
> standard_math__float_32_elementary__arcsinX [1099]
>                 0.00    0.00  108848/162635540
> standard_math__float_64_elementary__arctanX [1573]
>                 0.00    0.00  109380/162635540
> standard_math__float_64_elementary__sinX [1572]
>                 0.00    0.00  163937/162635540
> standard_math__float_64_elementary__cosX [687]
>                 0.01    0.00  515639/162635540
> standard_math__float_32_elementary__OexponX [609]
>                 0.03    0.00 1029874/162635540
> standard_math__float_32_elementary__tanX [346]
>                 0.04    0.00 1358089/162635540
> standard_math__float_64_elementary__sqrtX [454]
>                 0.04    0.00 1543778/162635540
> standard_math__float_32_elementary__logX [558]
>                 0.50    0.00 17564360/162635540
> standard_math__float_32_elementary__expX [93]
>                 0.56    0.00 19681964/162635540
> standard_math__float_32_elementary__arctanX [56]
>                 0.79    0.00 27790226/162635540
> standard_math__float_32_elementary__cosX [51]
>                 0.82    0.00 28730603/162635540
> standard_math__float_32_elementary__sinX [49]
>                 1.83    0.00 64037347/162635540
> standard_math__float_32_elementary__sqrtX [45]
> [33]     4.4    4.64    0.00 162635540         bar___elabs [33]
>
> There is one legitimate call from adainit; all other reported calls are
> most likely incorrect. The many calls to the standard math functions are
> probably real; they just don't call bar___elabs.

It's been a long time since I last used gprof, so with the caveat that I
may be totally off the mark: perhaps you're reading the detail in
reverse? That is, from the single elaboration call you're calling
something that in turn calls the math functions.

IIRC these tools tend to show things top to bottom, with the time spent
in inner calls summed up as you unwind the call stack. That way you know
what takes all the time (in this case, elaboration), with the smaller
slices of the pie shown within it.

But then what I don't see in these calls is whatever invokes the math
operations from the elaboration code. Perhaps seeing your procedure would
help.

Alex.

>
> Did anybody encounter such anomalies before? Is there a way to get
> correct profiling information for Ada binaries generated by Gnat?
> Different compiler flags maybe? Or using a different profiling tool
> altogether?
>
> TIA,
> Markus


* Re: Profiling Ada binaries
  2016-07-22 14:59 ` Alejandro R. Mosteo
@ 2016-07-22 15:05   ` Alejandro R. Mosteo
  2016-07-25  7:01     ` Markus Schöpflin
  2016-07-25  6:57   ` Markus Schöpflin
  1 sibling, 1 reply; 15+ messages in thread
From: Alejandro R. Mosteo @ 2016-07-22 15:05 UTC (permalink / raw)


On 22/07/16 16:59, Alejandro R. Mosteo wrote:
> On 22/07/16 15:24, Markus Schöpflin wrote:
>> Dear list,
>>
>> (...)

Incidentally, you might want to try with callgrind / kcachegrind to 
compare results. IIRC gprof is less accurate given that it instruments 
code with some granularity while valgrind does not have that limitation.

Alex.

>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>   4.66      4.91     4.91      429     0.01     0.01  foo1
>>   4.41      9.55     4.64 162635540     0.00     0.00  bar___elabs
>>   3.99     13.76     4.21   515639     0.00     0.00  foo2
>>
>> As one can see, the second line suggests that the elaboration for the
>> spec of the package bar is called 162635540 times. Looking at the call
>> graph of bar___elabs shows:
>>
>>                 0.00    0.00       1/162635540     adainit [92]
>>                 0.00    0.00      33/162635540
>> standard_math__float_32_elementary__arccosX [2249]
>>                 0.00    0.00    1461/162635540
>> standard_math__float_32_elementary__arcsinX [1099]
>>                 0.00    0.00  108848/162635540
>> standard_math__float_64_elementary__arctanX [1573]
>>                 0.00    0.00  109380/162635540
>> standard_math__float_64_elementary__sinX [1572]
>>                 0.00    0.00  163937/162635540
>> standard_math__float_64_elementary__cosX [687]
>>                 0.01    0.00  515639/162635540
>> standard_math__float_32_elementary__OexponX [609]
>>                 0.03    0.00 1029874/162635540
>> standard_math__float_32_elementary__tanX [346]
>>                 0.04    0.00 1358089/162635540
>> standard_math__float_64_elementary__sqrtX [454]
>>                 0.04    0.00 1543778/162635540
>> standard_math__float_32_elementary__logX [558]
>>                 0.50    0.00 17564360/162635540
>> standard_math__float_32_elementary__expX [93]
>>                 0.56    0.00 19681964/162635540
>> standard_math__float_32_elementary__arctanX [56]
>>                 0.79    0.00 27790226/162635540
>> standard_math__float_32_elementary__cosX [51]
>>                 0.82    0.00 28730603/162635540
>> standard_math__float_32_elementary__sinX [49]
>>                 1.83    0.00 64037347/162635540
>> standard_math__float_32_elementary__sqrtX [45]
>> [33]     4.4    4.64    0.00 162635540         bar___elabs [33]
>>
>> There is one legitimate call from adainit; all other reported calls are
>> most likely incorrect. The many calls to the standard math functions are
>> probably real; they just don't call bar___elabs.
>
> It's been a long time since I last used gprof, so with the caveat that I
> may be totally off the mark: perhaps you're reading the detail in
> reverse? That is, from the single elaboration call you're calling
> something that in turn calls the math functions.
>
> IIRC these tools tend to show things top to bottom, with the time spent
> in inner calls summed up as you unwind the call stack. That way you know
> what takes all the time (in this case, elaboration), with the smaller
> slices of the pie shown within it.
>
> But then what I don't see in these calls is whatever invokes the math
> operations from the elaboration code. Perhaps seeing your procedure would
> help.
>
> Alex.
>
>>
>> Did anybody encounter such anomalies before? Is there a way to get
>> correct profiling information for Ada binaries generated by Gnat?
>> Different compiler flags maybe? Or using a different profiling tool
>> altogether?
>>
>> TIA,
>> Markus
>



* Re: Profiling Ada binaries
  2016-07-22 14:59 ` Alejandro R. Mosteo
  2016-07-22 15:05   ` Alejandro R. Mosteo
@ 2016-07-25  6:57   ` Markus Schöpflin
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Schöpflin @ 2016-07-25  6:57 UTC (permalink / raw)


On 22.07.2016 at 16:59, Alejandro R. Mosteo wrote:
> On 22/07/16 15:24, Markus Schöpflin wrote:
>> Dear list,
>>
>> (...)
>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>   4.66      4.91     4.91      429     0.01     0.01  foo1
>>   4.41      9.55     4.64 162635540     0.00     0.00  bar___elabs
>>   3.99     13.76     4.21   515639     0.00     0.00  foo2
>>

[...]

> It's been a long time since I last used gprof, so with the caveat that I may
> be totally off the mark: perhaps you're reading the detail in reverse? That
> is, from the single elaboration call you're calling something that in turn
> calls the math functions.

Not likely, as the spec of bar just contains a bunch of array and record 
declarations and constants to initialize them with. Something along the lines of:

        type T is array (1 .. 10, 1 .. 10) of Float;
        T_Def : constant T := (others => (others => 0.0));

But interestingly, when looking at the generated object code, the elaboration
code for the spec of bar is adjacent to some helper functions used by those
trigonometric functions. It looks like gprof somehow gets lost when
translating addresses back to symbolic names.
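
One quick way to check that (a sketch; "app" stands for the actual binary)
is to dump the symbol table sorted by address and see which symbols sit
next to the elaboration routine:

    nm --numeric-sort ./app | grep -i -B2 -A2 bar___elabs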

Markus


* Re: Profiling Ada binaries
  2016-07-22 15:05   ` Alejandro R. Mosteo
@ 2016-07-25  7:01     ` Markus Schöpflin
  2016-07-25 16:45       ` rieachus
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Schöpflin @ 2016-07-25  7:01 UTC (permalink / raw)


On 22.07.2016 at 17:05, Alejandro R. Mosteo wrote:

> Incidentally, you might want to try with callgrind / kcachegrind to compare
> results. IIRC gprof is less accurate given that it instruments code with some
> granularity while valgrind does not have that limitation.

We tried that before, but *grind slows the program down too much, so we cannot
profile the affected binary. (It's some kind of soft real-time application, so
we can't just let it sit and run overnight.)

It looks like we are having some success with perf right now. The key seems
to be to add "-fno-inline-functions-called-once" to the compilation flags to
get meaningful call stacks.
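
For the record, the perf side is roughly this ("app" is a placeholder for
the binary; building with -g helps perf resolve symbols):

    perf record -g ./app
    perf report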

Markus


* Re: Profiling Ada binaries
  2016-07-25  7:01     ` Markus Schöpflin
@ 2016-07-25 16:45       ` rieachus
  2016-07-25 17:14         ` Simon Wright
  2016-07-26  8:37         ` Markus Schöpflin
  0 siblings, 2 replies; 15+ messages in thread
From: rieachus @ 2016-07-25 16:45 UTC (permalink / raw)


On Monday, July 25, 2016 at 3:01:07 AM UTC-4, Markus Schöpflin wrote:
> I am trying to profile code generated by GNAT using gprof. For this I add
> -pg to the compilation and link switches, turn down optimization to -O1,
> and disable inlining of functions that are only called once
> (-fno-inline-functions-called-once).

Gee. I would never think to compile the math libraries with -O1.
Seriously, the math libraries are written with ease of understanding in
mind.  You may have thousands of calls in the implementation of one
function, and due to the packages being generic, every one of those calls
will do an elaboration check.  How can that be efficient?  I believe GNAT
has non-generic versions for Short_Float, Float, and Long_Float which use
the hardware built-ins.  But I doubt you would get that automatically
with -O1.

Try compiling everything with -O3 (or whatever you use) then recompile only
the unit you want the tracing for with -O1 and
-fno-inline-functions-called-once.
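
Something along these lines, I think (untested; "main" and "hot_unit" are
placeholders for your actual main program and the unit of interest):

    gnatmake -O3 -pg main -largs -pg
    gnatmake -u -f -O1 -pg -fno-inline-functions-called-once hot_unit.adb
    gnatmake -O3 -pg main -largs -pg   # relink with the recompiled unit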



* Re: Profiling Ada binaries
  2016-07-25 16:45       ` rieachus
@ 2016-07-25 17:14         ` Simon Wright
  2016-07-25 22:05           ` rieachus
  2016-07-26  8:37         ` Markus Schöpflin
  1 sibling, 1 reply; 15+ messages in thread
From: Simon Wright @ 2016-07-25 17:14 UTC (permalink / raw)


rieachus@comcast.net writes:

> Seriously, the math libraries are written with ease of understanding
> in mind.

Like the rest of GNAT, then.


* Re: Profiling Ada binaries
  2016-07-25 17:14         ` Simon Wright
@ 2016-07-25 22:05           ` rieachus
  0 siblings, 0 replies; 15+ messages in thread
From: rieachus @ 2016-07-25 22:05 UTC (permalink / raw)


On Monday, July 25, 2016 at 1:14:06 PM UTC-4, Simon Wright wrote:
> rieachus@comcast.net writes:
> 
> > Seriously, the math libraries are written with ease of understanding
> > in mind.
> 
> Like the rest of GNAT, then.

Actually, I was thinking of the original Generic Elementary Functions
package and the other standard generic math packages.  These were the
result of about a decade of work by the NUMWG, and not just the package
specifications but also the bodies were printed in special issues of Ada
Letters.  Along the way the best algorithms known for many of the functions
(in terms of least-significant-bit -- lsb -- errors) were improved several
times.  And of course, the IEEE floating point standards were being
improved at the same time, by some of the same people.

There is a lot to be said for using the (hardware) built-in GEF, and as I
said, GNAT does let you get at the 32- and 64-bit IEEE floating point
hardware.  You may not know that it is still possible to access the 80-bit
IEEE Extended format in x86 (or should that be x87 ;-) compatible CPUs, but
not in 64-bit mode.  DEC VAXes had their own floating point formats,
including decent 128-bit versions, and so on.  Unless you do things like
Mandelbrot or Julia sets, work with eigenvectors of large matrices, FFTs,
or solve large linear programming problems, double precision is quite
enough.


* Re: Profiling Ada binaries
  2016-07-25 16:45       ` rieachus
  2016-07-25 17:14         ` Simon Wright
@ 2016-07-26  8:37         ` Markus Schöpflin
  2016-08-01 22:40           ` rieachus
  1 sibling, 1 reply; 15+ messages in thread
From: Markus Schöpflin @ 2016-07-26  8:37 UTC (permalink / raw)


On 25.07.2016 at 18:45, rieachus@comcast.net wrote:

> Gee. I would never think to compile the math libraries with -O1.
> Seriously, the math libraries are written with ease of understanding in
> mind.  You may have thousands of calls in the implementation of one
> function, and due to the packages being generic, every one of those calls
> will do an elaboration check.  How can that be efficient?

GNAT by default uses static elaboration. There should be no elaboration checks 
when calling the generic versions. Or am I mistaken here?

 > I believe GNAT
> has non-generic versions for Short_Float, Float, and Long_Float which use
> the hardware built-ins.  But I doubt you would get that automatically with
> -O1.

Even using the non-generic versions I have not been able to get the hardware 
built-ins. The best I can achieve for a call to e.g. cos(X) is:

         call    ada__numerics__long_elementary_functions__cos
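
One way to see that is to compile a small throwaway unit to assembly and
grep the result (a sketch; "trig_test.adb" is just a scratch file name):

    gcc -S -O2 trig_test.adb
    grep call trig_test.s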

> Try compiling everything with -O3 (or whatever you use) then recompile only
> the unit you want the tracing for with -O1 and -fno-inline-functions-called-once.

-O3 is explicitly discouraged by the documentation, so we're normally using
-O2. And to get a general feeling for where the application is burning its
CPU cycles, -O1 seems to be OK, as the execution time is normally dominated
by the choice of algorithms and not by differences in the optimization level.

Markus


* Re: Profiling Ada binaries
  2016-07-26  8:37         ` Markus Schöpflin
@ 2016-08-01 22:40           ` rieachus
  2016-08-01 23:36             ` Jeffrey R. Carter
  2016-08-02  6:39             ` Markus Schöpflin
  0 siblings, 2 replies; 15+ messages in thread
From: rieachus @ 2016-08-01 22:40 UTC (permalink / raw)


On Tuesday, July 26, 2016 at 4:37:31 AM UTC-4, Markus Schöpflin wrote:
 
> GNAT by default uses static elaboration. There should be no elaboration checks 
> when calling the generic versions. Or am I mistaken here?

From the GNAT documentation: Strict conformance to the Ada Reference Manual
can be achieved by adding two compiler options for dynamic checks for
access-before-elaboration on subprogram calls and generic instantiations
(-gnatE) and stack overflow checking (-fstack-check).

>  > I believe GNAT
> > has non-generic versions for Short_Float, Float, and Long_Float which use
> > the hardware built-ins.  But I doubt you would get that automatically with
> > -O1.

> Even using the non-generic versions I have not been able to get the hardware 
> built-ins. The best I can achieve for a call to e.g. cos(X) is: 

>         call    ada__numerics__long_elementary_functions__cos

That's silly.  A project for a rainy afternoon.  Hmm.  May rain today...

Again GNAT docs to the rescue: 15.1 Machine code insertions:

The equivalent can be written for GNAT as:

    Asm ("fsinx %1 %0",
         My_Float'Asm_Output ("=f", result),
         My_Float'Asm_Input  ("f",  angle));

I assume I wrap that in a function body (with System.Machine_Code for Asm):

  function Sin (Angle : in Float) return Float is
     Result : Float;
  begin
     Asm ("fsinx %1 %0",
          Float'Asm_Output ("=f", Result),
          Float'Asm_Input  ("f",  Angle));
     return Result;
  end Sin;

Now all I have to do is put together a set of test cases...
 



* Re: Profiling Ada binaries
  2016-08-01 22:40           ` rieachus
@ 2016-08-01 23:36             ` Jeffrey R. Carter
  2016-08-02  7:00               ` Markus Schöpflin
  2016-08-05  3:18               ` rieachus
  2016-08-02  6:39             ` Markus Schöpflin
  1 sibling, 2 replies; 15+ messages in thread
From: Jeffrey R. Carter @ 2016-08-01 23:36 UTC (permalink / raw)


On 08/01/2016 03:40 PM, rieachus@comcast.net wrote:
> On Tuesday, July 26, 2016 at 4:37:31 AM UTC-4, Markus Schöpflin wrote:
>  
> 
>> Even using the non-generic versions I have not been able to get the hardware 
>> built-ins. The best I can achieve for a call to e.g. cos(X) is: 
> 
>>         call    ada__numerics__long_elementary_functions__cos
> 
> Again GNAT docs to the rescue: 15.1 Machine code insertions:

I think the OP would benefit from knowing why this is necessary.

If you look at the body of Ada.Numerics.Long_Elementary_Functions.Cos, you'll
probably find a call to {something that calls} the built-in function.

If you look at the requirements for the Cos function in Annex A and Annex G (if
implemented, which it is for GNAT), you'll find a number of requirements for
accuracy and special cases. If you look at the definition of the built-in
function, you'll likely find that it doesn't meet all of those requirements. Any
call to Cos has to involve wrapping a call to the built-in function in code to
ensure those requirements are met, so you won't find a call to the built-in
function in the generated code.

Even if the built-in function met all the requirements, the desire for the
compiler to be portable will result in the call to the built-in being squirreled
away, not produced by the code generator.

The general rule, "If you need specific machine code, use a machine-code
insertion," applies here. Of course, the result is non-portable code, while the
call to the language-defined library function is portable.

-- 
Jeff Carter
"Hello! Smelly English K...niggets."
Monty Python & the Holy Grail
08



* Re: Profiling Ada binaries
  2016-08-01 22:40           ` rieachus
  2016-08-01 23:36             ` Jeffrey R. Carter
@ 2016-08-02  6:39             ` Markus Schöpflin
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Schöpflin @ 2016-08-02  6:39 UTC (permalink / raw)


On 02.08.2016 at 00:40, rieachus@comcast.net wrote:
> On Tuesday, July 26, 2016 at 4:37:31 AM UTC-4, Markus Schöpflin wrote:
>
>> GNAT by default uses static elaboration. There should be no elaboration
>> checks when calling the generic versions. Or am I mistaken here?
>
> From the GNAT documentation: Strict conformance to the Ada Reference Manual
> can be achieved by adding two compiler options for dynamic checks for
> access-before-elaboration on subprogram calls and generic instantiations
> (-gnatE) and stack overflow checking (-fstack-check).

You lost me there. I was arguing that I don't need to worry about the
performance impact of dynamic elaboration checks, as GNAT by default uses
static elaboration. Why would I want to turn on strict ARM conformance if I
don't need it in this case?

Markus


* Re: Profiling Ada binaries
  2016-08-01 23:36             ` Jeffrey R. Carter
@ 2016-08-02  7:00               ` Markus Schöpflin
  2016-08-05  3:18               ` rieachus
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Schöpflin @ 2016-08-02  7:00 UTC (permalink / raw)


On 02.08.2016 at 01:36, Jeffrey R. Carter wrote:
> On 08/01/2016 03:40 PM, rieachus@comcast.net wrote:
>> On Tuesday, July 26, 2016 at 4:37:31 AM UTC-4, Markus Schöpflin wrote:
>>
>>> Even using the non-generic versions I have not been able to get the
>>> hardware built-ins. The best I can achieve for a call to e.g. cos(X)
>>> is:
>>
>>>   call    ada__numerics__long_elementary_functions__cos
>>
>> Again GNAT docs to the rescue: 15.1 Machine code insertions:
>
> I think the OP would benefit from knowing why this is necessary.
>
> If you look at the body of Ada.Numerics.Long_Elementary_Functions.Cos,
> you'll probably find a call to {something that calls} the built-in
> function.
>
> If you look at the requirements for the Cos function in Annex A and Annex G
> (if implemented, which it is for GNAT), you'll find a number of
> requirements for accuracy and special cases. If you look at the definition
> of the built-in function, you'll likely find that it doesn't meet all of
> those requirements. Any call to Cos has to involve wrapping a call to the
> built-in function in code to ensure those requirements are met, so you
> won't find a call to the built-in function in the generated code.

I am aware of that. I was trying to state that you don't get the hardware
built-ins in GNAT, despite the claim of the grandparent poster.

We rely on the guarantees given by the Ada LRM in this case. To make matters
worse (performance-wise), we use

   type F is new Float range Float'Range;

as the base float type to make sure that we don't silently run into +/-INF or
other IEEE corner cases. This forces us to use the generic versions in
Ada.Numerics.Generic_Elementary_Functions.

We could of course try

   subtype F is Float range Float'Range;

and use the non-generic versions found in Ada.Numerics.Elementary_Functions,
which might be faster than their generic counterparts.
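
Spelled out, the two variants look roughly like this (a sketch only;
"F_Elementary" is an arbitrary instance name):

    with Ada.Numerics.Generic_Elementary_Functions;

    -- current approach: a distinct type, which forces a generic instantiation
    type F is new Float range Float'Range;
    package F_Elementary is
       new Ada.Numerics.Generic_Elementary_Functions (F);

    -- alternative: a subtype, which can use the predefined instantiation
    -- in Ada.Numerics.Elementary_Functions directly
    subtype F is Float range Float'Range;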

But although we are doing a lot of trigonometric calculations in our code, the
performance hot spots are usually not caused by floating point arithmetic, so
fortunately I don't have to worry about this too much.

> Even if the built-in function met all the requirements, the desire for the
> compiler to be portable will result in the call to the built-in being
> squirrelled away, not produced by the code generator.
>
> The general rule, "If you need specific machine code, use a machine-code
> insertion," applies here. Of course, the result is non-portable code, while
> the call to the language-defined library function is portable.

Actually I don't agree with you here. I don't see a reason why GNAT shouldn't 
be able to generate code which uses the built-in HW primitives, at least when 
using an appropriate optimization level and inlining. The resulting binary 
isn't portable anyway.

Markus


* Re: Profiling Ada binaries
  2016-08-01 23:36             ` Jeffrey R. Carter
  2016-08-02  7:00               ` Markus Schöpflin
@ 2016-08-05  3:18               ` rieachus
  2016-08-05 20:27                 ` Randy Brukardt
  1 sibling, 1 reply; 15+ messages in thread
From: rieachus @ 2016-08-05  3:18 UTC (permalink / raw)


On Monday, August 1, 2016 at 7:36:45 PM UTC-4, Jeffrey R. Carter wrote:

> If you look at the requirements for the Cos function in Annex A and Annex G (if
> implemented, which it is for GNAT), you'll find a number of requirements for
> accuracy and special cases. If you look at the definition of the built-in
> function, you'll likely find that it doesn't meet all of those requirements. Any
> call to Cos has to involve wrapping a call to the built-in function in code to
> ensure those requirements are met, so you won't find a call to the built-in
> function in the generated code.

Yes, any "strict" LRM-matching implementation probably won't use the
built-in functions.  The problem is not the one-parameter (radian) versions,
it is the two-parameter versions, especially the cases where Ada provides
two-parameter versions of the arc (inverse) trig functions.  The problem is
not the special-case values themselves, but testing the two-parameter
versions.  Where "all cases" means just 2^32 tests, it's no big deal, and I
have tests for those.  Two 32-bit parameters is tough, but it can actually
be done for some functions.  Two 64-bit parameters?  Forget about it.  You
have 2^128 cases, which would take about 10^24 CPU years.
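
As a back-of-the-envelope check (assuming on the order of 10^7 test
evaluations per second per core, which is only a rough guess):

    2^128 ~ 3.4E38 cases
    3.4E38 / 1E7 per second ~ 3.4E31 seconds ~ 1E24 CPU years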

Providing a non-strict implementation is much easier.  I don't have to
handle the exact special cases, and while I would test a decent subset of
the full parameter space to compute mean LSB error statistics, I wouldn't
look for special cases.  (Years ago someone -- was it Mike Woodger? --
found FOUR cases where X*X*X*X is not equal to (X*X)*(X*X) for floating
point values.  This is why the strict mode should only be used if you
really need it. ;-)


* Re: Profiling Ada binaries
  2016-08-05  3:18               ` rieachus
@ 2016-08-05 20:27                 ` Randy Brukardt
  0 siblings, 0 replies; 15+ messages in thread
From: Randy Brukardt @ 2016-08-05 20:27 UTC (permalink / raw)


<rieachus@comcast.net> wrote in message 
news:588a93bb-b39a-4196-b6ca-5e673fd256dd@googlegroups.com...
On Monday, August 1, 2016 at 7:36:45 PM UTC-4, Jeffrey R. Carter wrote:

>> If you look at the requirements for the Cos function in Annex A and Annex 
>> G (if
>> implemented, which it is for GNAT), you'll find a number of requirements 
>> for
>> accuracy and special cases. If you look at the definition of the built-in
>> function, you'll likely find that it doesn't meet all of those 
>> requirements. Any
>> call to Cos has to involve wrapping a call to the built-in function in 
>> code to
>> ensure those requirements are met, so you won't find a call to the 
>> built-in
>> function in the generated code.
>
>Yes, any "strict" LRM matching implementation probably won't use the 
>built-in functions.

Can't really, at least for the Intel implementations of Sin, Cos, etc. Those 
don't do argument reduction, which is required to get the appropriate 
accuracy. (I recall reading the Intel processor manual for those 
instructions, and they recommended reducing the arguments before using 
them.) I suppose one could try to do all of that inline, but I can't see the 
reason to bother with special code for that -- the usual inlining mechanisms 
will do that if it makes sense.

                                     Randy.


