comp.lang.ada
* Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking.
@ 2001-06-23  7:34 Charles Darcy
  2001-06-23 10:57 ` Profiling Gnat re. pthread_malloc, pthread_getspecific, system Larry Kilgallen
  2001-06-23 19:26 ` Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking Pat Rogers
  0 siblings, 2 replies; 6+ messages in thread
From: Charles Darcy @ 2001-06-23  7:34 UTC


Hello,

    I've profiled an Ada program, which I converted from C++, so that I
might learn why the Ada program performs worse than the C++ version
(approx. 10 times slower). I'm using GNAT 3.13p and gprof on a Linux
(Mandrake 8.0) machine. I've used inlining, O3 optimisation, and
disabled all run-time checks.

    The profile results (below) seem to indicate that pthread_malloc and
pthread_getspecific consume a large portion of processing time, but
these functions are a mystery to me. Is there any way to reduce the
performance cost of these functions?

    The other expensive subprograms seem related to tasking and
controlled types, neither of which I directly use. I do use the Booch
components (maps, collections), which may make use of controlled types,
but I am not creating and destroying these as part of my main
processing, so I do not see why the system__finalization_* subprograms
are such a factor. Similarly, as I do not use tasks, I don't understand
why the system__tasking_* subprograms consume a significant portion of
processor time.


(Note: I've renamed my subprograms to *my_subprogram*)

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  8.29     79.07    79.07  4840668     0.02     0.04  *my_main_processing_subprogram*
  7.61    151.64    72.57                             pthread_malloc
  6.43    212.98    61.34                             pthread_getspecific
  5.32    263.76    50.78                             system__finalization_implementation__initialize__2
  5.09    312.35    48.59                             _free_internal
  4.20    352.44    40.09                             system__finalization_implementation__finalize_list
  3.62    386.97    34.53                             system__tasking__initialization__set_jmpbuf_address
  2.60    411.73    24.76                             system__tasking__initialization__task_unlock__2
  2.42    434.80    23.07  1786603     0.01     0.01  *my_subprogram*
  2.39    457.63    22.83                             pthread_mutex_lock
  2.37    480.24    22.61  1595032     0.01     0.04  *my_subprogram*
  2.17    500.90    20.66                             system__tasking__initialization__task_lock__2
  2.13    521.24    20.34                             system__soft_links__set_jmpbuf_address_soft
  1.94    539.72    18.48                             pthread_mutex_unlock
  1.87    557.56    17.84 557870939     0.00     0.00  *my_subprogram*
  1.72    573.92    16.36 61165604     0.00     0.00  *my_subprogram*
  1.68    589.94    16.02 71029260     0.00     0.00  *my_subprogram*
  1.66    605.78    15.84                             system__tasking__initialization__get_sec_stack_addr
  1.60    621.03    15.25                             system__finalization_implementation__adjust__2
  1.59    636.16    15.13 12025996     0.00     0.00  *my_subprogram*
  1.42    649.66    13.50                             system__tasking__initialization__get_jmpbuf_address
  1.34    662.40    12.74 84209013     0.00     0.00  *my_subprogram*
  1.23    674.13    11.73 84209013     0.00     0.00  *my_subprogram*
  1.20    685.60    11.47                             ada__strings__unbounded__adjust__2
  1.10    696.06    10.46 84215790     0.00     0.00  *my_subprogram*
  1.08    706.38    10.32                             system__secondary_stack__ss_allocate
  1.07    716.57    10.19                             __gnat_free
  1.07    726.76    10.19                             __gnat_malloc
  1.05    736.77    10.01 84217148     0.00     0.00  *my_subprogram*
  1.02    746.53     9.76 182182420     0.00     0.00  *my_subprogram*
  0.90    755.09     8.56                             system__soft_links__get_jmpbuf_address_soft
  0.86    763.31     8.22 288134396     0.00     0.00  *my_subprogram*
  0.85    771.37     8.06                             system__finalization_implementation__finalize
  0.80    779.02     7.65                             pthread_free
  0.78    786.45     7.43                             ada__strings__search__index__3
  0.74    793.53     7.08                             system__tasking__initialization__get_current_excep
  0.71    800.31     6.78 84209013     0.00           *my_subprogram*
  0.63    806.36     6.05                             malloc



    I'm new to Ada, and afraid I may have inadvertently used the
language in an improper fashion, resulting in poor performance.
My thanks to anyone who can enlighten me as to what might be done to
eliminate any unnecessary performance overhead.


regards,

Charles.





* Re: Profiling Gnat re. pthread_malloc, pthread_getspecific, system
  2001-06-23  7:34 Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking Charles Darcy
@ 2001-06-23 10:57 ` Larry Kilgallen
  2001-06-23 19:26 ` Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking Pat Rogers
  1 sibling, 0 replies; 6+ messages in thread
From: Larry Kilgallen @ 2001-06-23 10:57 UTC


In article <3B344689.8A79ED4F@mullum.com.au>, Charles Darcy <charlie@mullum.com.au> writes:

>     I've profiled an Ada program, which I converted from C++, so that I
> might learn why the Ada program performs worse than the C++ version
> (approx. 10 times slower). I'm using GNAT 3.13p and gprof on a Linux
> (Mandrake 8.0) machine. I've used inlining, O3 optimisation, and
> disabled all run-time checks.
> 
>     The profile results (below) seem to indicate that pthread_malloc and
> pthread_getspecific consume a large portion of processing time, but
> these functions are a mystery to me. Is there any way to reduce the
> performance cost of these functions?

I am not familiar with your analysis tool, but I would suggest
considering whether the C++ version uses the same pthread_ calls
and uses them an equivalent number of times in a run against
some fixed input data.

Judging by the name alone, pthread_malloc is unlikely to be any
more efficient than ordinary malloc, although it might be equal
on your platform.

>     I'm new to Ada, and afraid I may have inadvertently used the
> language in an improper fashion, resulting in poor performance.
> My thanks to anyone who can enlighten me as to what might be done to
> eliminate any unnecessary performance overhead.

That is a possibility.  Someone familiar with Ada might be able to
look at your code and see if it resembles "C++ style" more than
Ada style.

You might also consider whether it is possible to experiment with
your program on a different Ada compiler and operating system. It
could be there is something in your algorithm that happens to hit
a weak point in GNAT.




* Re: Profiling Gnat re. pthread_malloc, pthread_getspecific, system  finalisation & tasking.
  2001-06-23  7:34 Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking Charles Darcy
  2001-06-23 10:57 ` Profiling Gnat re. pthread_malloc, pthread_getspecific, system Larry Kilgallen
@ 2001-06-23 19:26 ` Pat Rogers
  2001-06-23 20:06   ` tmoran
  1 sibling, 1 reply; 6+ messages in thread
From: Pat Rogers @ 2001-06-23 19:26 UTC



"Charles Darcy" <charlie@mullum.com.au> wrote in message
news:3B344689.8A79ED4F@mullum.com.au...
> Hello,
>
>     I've profiled an Ada program, which I converted from C++, so that I
> might learn why the Ada program performs worse than the C++ version
> (approx. 10 times slower). I'm using GNAT 3.13p and gprof on a Linux
> (Mandrake 8.0) machine. I've used inlining, O3 optimisation, and
> disabled all run-time checks.

By "using inlining" do you mean pragma Inline?  Did you also specify -gnatn?

Is this for your own personal curiosity or for work?  If the latter, let me
strongly suggest you get an evaluation license from ACT so that you can do a
proper comparison -- you will indeed be fighting the issues you mention at
the bottom of your post and will need their help in getting meaningful
results (as would be the case for any vendor).

>     The profile results (below) seem to indicate that pthread_malloc and
> pthread_getspecific consume a large portion of processing time, but
> these functions are a mystery to me. Is there any way to reduce the
> performance cost of these functions?
>
>     The other expensive subprograms seem related to tasking and
> controlled types, neither of which I directly use.

Somehow your Ada code does not correspond to your C++ code, regarding
threads.







* Re: Profiling Gnat re. pthread_malloc, pthread_getspecific, system  finalisation & tasking.
  2001-06-23 19:26 ` Profiling Gnat re. pthread_malloc, pthread_getspecific, system finalisation & tasking Pat Rogers
@ 2001-06-23 20:06   ` tmoran
  2001-06-25  5:25     ` Rod Kay
  0 siblings, 1 reply; 6+ messages in thread
From: tmoran @ 2001-06-23 20:06 UTC


> > (Mandrake 8.0) machine. I've used inlining, O3 optimisation, and
> > disabled all run-time checks.
>
> By "using inlining" do you mean pragma Inline?  Did you also specify -gnatn?
  A 10x difference in speed is unlikely to be caused by, or solved by,
any of those things.

> Somehow your Ada code does not correspond to your C++ code, regarding
> threads.
    Likely.  Though there may also be an optimization opportunity in
making it *less* like the C.  I'm thinking in particular of getting
rid of malloc's by using local (stack) declarations instead.  Ada
programs typically use heap allocation a lot less.
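
    To make the stack-versus-heap point concrete, here is a minimal
sketch. The Buffer type and both function names are hypothetical and
do not come from the original program. Sum_On_Heap mirrors the
new/delete habit carried over from C++, while Sum_On_Stack declares an
array whose bounds are only known at run time as an ordinary local
object, so no allocator call is written at all.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Stack_Vs_Heap is

   --  Hypothetical buffer type, used only for illustration.
   type Buffer is array (Positive range <>) of Integer;
   type Buffer_Access is access Buffer;

   procedure Free is
     new Ada.Unchecked_Deallocation (Buffer, Buffer_Access);

   --  C++-style: allocate on the heap, use, then release explicitly.
   function Sum_On_Heap (N : Positive) return Integer is
      Data  : Buffer_Access := new Buffer'(1 .. N => 1);
      Total : Integer := 0;
   begin
      for I in Data'Range loop
         Total := Total + Data (I);
      end loop;
      Free (Data);
      return Total;
   end Sum_On_Heap;

   --  Ada-style: a local object sized at run time, with no allocator
   --  and no explicit deallocation; it disappears on return.
   function Sum_On_Stack (N : Positive) return Integer is
      Data  : constant Buffer (1 .. N) := (others => 1);
      Total : Integer := 0;
   begin
      for I in Data'Range loop
         Total := Total + Data (I);
      end loop;
      return Total;
   end Sum_On_Stack;

begin
   Ada.Text_IO.Put_Line ("heap :" & Integer'Image (Sum_On_Heap (100)));
   Ada.Text_IO.Put_Line ("stack:" & Integer'Image (Sum_On_Stack (100)));
end Stack_Vs_Heap;
```

    Both functions compute the same result; only the second avoids the
malloc/free pair on every call, which is the kind of cost that shows up
as __gnat_malloc and __gnat_free in a profile.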




* Re: Profiling Gnat re. pthread_malloc, pthread_getspecific, system   finalisation & tasking.
  2001-06-23 20:06   ` tmoran
@ 2001-06-25  5:25     ` Rod Kay
  2001-06-25  5:42       ` Charles Darcy
  0 siblings, 1 reply; 6+ messages in thread
From: Rod Kay @ 2001-06-25  5:25 UTC


tmoran@acm.org wrote:
> 
> > > (Mandrake 8.0) machine. I've used inlining, O3 optimisation, and
> > > disabled all run-time checks.
> >
> > By "using inlining" do you mean pragma Inline?  Did you also specify -gnatn?

     I'm using GNAT's pragma Inline_Always, which I believe operates
regardless of the -gnatn switch. (I had trouble linking when I tried
pragma Inline & -gnatn.)
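
     For reference, a minimal sketch of the two pragmas side by side.
Square and Cube are made-up names for illustration. As far as I
understand the GNAT documentation, pragma Inline is a request that is
honoured across units only when -gnatn is given (together with
optimisation), whereas the GNAT-specific pragma Inline_Always asks for
inlining independently of those switches. Treat that as my reading of
the documentation, not a guarantee for every GNAT version.

```ada
with Ada.Text_IO;

procedure Inline_Demo is

   --  Standard Ada: a request to inline. With GNAT, cross-unit
   --  inlining additionally needs -gnatn on the command line.
   function Square (X : Integer) return Integer;
   pragma Inline (Square);

   --  GNAT-specific: inline regardless of -gnatn.
   function Cube (X : Integer) return Integer;
   pragma Inline_Always (Cube);

   function Square (X : Integer) return Integer is
   begin
      return X * X;
   end Square;

   function Cube (X : Integer) return Integer is
   begin
      return X * X * X;
   end Cube;

begin
   Ada.Text_IO.Put_Line (Integer'Image (Square (3) + Cube (2)));
end Inline_Demo;
```

     Both subprograms live in the same unit here, so this only shows
the syntax; the -gnatn difference matters once the inlined subprogram
is declared in another package.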


>   A 10x difference in speed is unlikely to be caused by, or solved by,
> any of those things.

    Yes, as it turns out the problem was unrelated to the optimisation
controls. A critical function dereferenced a forward reference type
(from John G. Volan's forward reference package), resulting in an
instance of one of my types being created and used, in place of the
instance actually referred to. This type contained a Container from the
Booch Components, which is a controlled type. The creation/destruction
overhead, incurred each time the procedure was called, essentially
crippled the program.

    As it turns out the forward reference was unnecessary, a relic from
an earlier circular elaboration problem, and performance improved
markedly when it was removed.
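
    The cost pattern can be reproduced with a small sketch. Holder is a
hypothetical stand-in for my record containing a Booch container, and
the counters exist only to make the hidden calls visible; none of these
names come from the real program. The nested declaration of a
controlled type needs an Ada 2005 or later compiler; with an Ada 95
compiler such as GNAT 3.13p, the Counted package would have to be a
library-level unit instead.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Controlled_Cost is

   --  Hypothetical controlled type that counts its own
   --  initialisations and finalisations.
   package Counted is
      type Holder is new Ada.Finalization.Controlled with record
         Value : Integer := 0;
      end record;

      overriding procedure Initialize (Object : in out Holder);
      overriding procedure Finalize   (Object : in out Holder);

      Initialize_Calls : Natural := 0;
      Finalize_Calls   : Natural := 0;
   end Counted;

   package body Counted is
      overriding procedure Initialize (Object : in out Holder) is
      begin
         Initialize_Calls := Initialize_Calls + 1;
      end Initialize;

      overriding procedure Finalize (Object : in out Holder) is
      begin
         Finalize_Calls := Finalize_Calls + 1;
      end Finalize;
   end Counted;

   Shared : Counted.Holder;  --  created once, before the loops

   --  Creates and destroys a fresh Holder on every call, much as the
   --  stray forward-reference dereference did.
   procedure Slow_Step is
      Temp : Counted.Holder;
   begin
      Temp.Value := Temp.Value + 1;
   end Slow_Step;

   --  Works on the object that already exists.
   procedure Fast_Step (Target : in out Counted.Holder) is
   begin
      Target.Value := Target.Value + 1;
   end Fast_Step;

   procedure Report (Label : String) is
   begin
      Ada.Text_IO.Put_Line
        (Label
         & ": Initialize =" & Natural'Image (Counted.Initialize_Calls)
         & ", Finalize =" & Natural'Image (Counted.Finalize_Calls));
   end Report;

begin
   for I in 1 .. 1_000 loop
      Slow_Step;
   end loop;
   Report ("after Slow_Step loop");

   for I in 1 .. 1_000 loop
      Fast_Step (Shared);
   end loop;
   Report ("after Fast_Step loop");
end Controlled_Cost;
```

    Every Slow_Step call adds one Initialize and one Finalize, while
the Fast_Step loop leaves both counters unchanged. With a real
container inside the record, each of those hidden calls also does heap
work, which matches the system__finalization_implementation__* and
malloc/free entries in the profile.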



> 
> > Somehow your Ada code does not correspond to your C++ code, regarding
> > threads.
>     Likely.  Though there may also be an optimization opportunity in
> making it *less* like the C.  I'm thinking in particular of getting
> rid of malloc's by using local (stack) declarations instead.  Ada
> programs typically use heap allocation a lot less.


    I've found that the AWS comms packages I'm using (in an unrelated
area of the application) make use of Ada's tasking. The AWS packages
are not used in the processing which I'm profiling, but their presence
seems to impose a tasking overhead on all subprograms. When I removed the
AWS packages from the program, the pthread_* and system__tasking__*
subprograms vanished from the profile results.


    The program is now performing only a little below the C++ version,
and with a little more optimisation, I'm hoping to equal or exceed that.
Thanks to all for your time and advice.



regards,

Charles.




* Re: Profiling Gnat re. pthread_malloc, pthread_getspecific, system   finalisation & tasking.
  2001-06-25  5:25     ` Rod Kay
@ 2001-06-25  5:42       ` Charles Darcy
  0 siblings, 0 replies; 6+ messages in thread
From: Charles Darcy @ 2001-06-25  5:42 UTC



    Ooops ... I appear to be having an identity crisis. 

    The prior post was from me (Charles Darcy), and not this 'Rod Kay'
imposter (flatmate).

    Sorry for any confusion. Hopefully, my identity settings are correct
now.



regards,

Charlie.


