comp.lang.ada
* Re: Performance and Thread Safety of the Booch Components
  2000-04-21  0:00 Performance and Thread Safety of the Booch Components Harry Erwin
@ 2000-04-21  0:00 ` Ted Dennison
  2000-04-22  0:00   ` Simon Wright
  0 siblings, 1 reply; 4+ messages in thread
From: Ted Dennison @ 2000-04-21  0:00 UTC (permalink / raw)


Harry Erwin wrote:

> Experience with the STL in C++ has shown that the standard container
> classes and associated algorithms degrade performance severely unless a
> significant level of optimization is turned on. This is in part due to
> the number of temporary objects and function calls that get generated.

...

> The Booch components appear to be generally exception-safe and
> exception-neutral. Are there any known issues that I should be aware of,
> other than one esoteric bug involving copying between queues of
> different types that I've already reported?

I don't think they really have been highly tuned for performance. As an
example, we were doing multiple lookups with a map object in a loop
that gets executed at 60Hz for a simulator. Performance was terrible even on
GreenHills' highest optimization levels, so I looked at it with a
subprogram-level performance analyzer. It turned out that even the *bounded*
maps (which you'd figure wouldn't do dynamic allocation) were dynamically
allocating an iterator on every lookup. I fixed that, then found that it was
still quite slow because the (20 or so byte) result of the
lookup was getting passed as a "return" value through three successive
functions' return calls. That resulted in each object in the map getting
copied 4 times at 60Hz. You'd think there'd be some way for the
compiler to optimize that down to one copy at the assignment on the outside,
but I couldn't find it. So I just wrote a procedure version and that did the
trick.
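
To give an idea of the shape of that change (the names here are
hypothetical, not the actual Booch Maps interface -- just a sketch of a
function-return lookup versus a procedure with an out parameter):

package Lookup_Demo is

   type Item is record
      X, Y, Z : Float;   --  stand-in for the 20-or-so-byte payload
      Id      : Integer;
   end record;

   --  Function form: every wrapper that hands this result back as its
   --  own return value adds another copy of the Item.
   function Value_Of (Index : Positive) return Item;

   --  Procedure form: the result is written straight into the
   --  caller-supplied object, so the intermediate copies go away.
   procedure Get_Value (Index : Positive; Result : out Item);

end Lookup_Demo;

package body Lookup_Demo is

   Table : constant array (1 .. 10) of Item :=
     (others => (X => 0.0, Y => 0.0, Z => 0.0, Id => 0));

   function Value_Of (Index : Positive) return Item is
   begin
      return Table (Index);
   end Value_Of;

   procedure Get_Value (Index : Positive; Result : out Item) is
   begin
      Result := Table (Index);
   end Get_Value;

end Lookup_Demo;

With the function form, a chain like Outer -> Middle -> Value_Of copies
the Item once per return; with Get_Value, each level can pass its Result
parameter straight through.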

I submitted all these changes to Simon Wright for possible incorporation
into the official components, but of course this is only Maps. I'm not
trying to rag on Simon here, or those who came before him. I'm just pointing
out that Booch is a volunteer effort. Improvements in functionality come
when someone with the proper tools takes an interest in it. If you have
interest in high-performance components, and you have access to an execution
profiler, I'd encourage you to use the components. Just be prepared to have
to make a couple of twiddles here and there (and please send them back to
Simon, so the next person won't have to do it too).

--
T.E.D.

Home - mailto:dennison@telepath.com  Work - mailto:dennison@ssd.fsi.com
WWW  - http://www.telepath.com/dennison/Ted/TED.html  ICQ  - 10545591







* Performance and Thread Safety of the Booch Components
@ 2000-04-21  0:00 Harry Erwin
  2000-04-21  0:00 ` Ted Dennison
  0 siblings, 1 reply; 4+ messages in thread
From: Harry Erwin @ 2000-04-21  0:00 UTC (permalink / raw)


Disclaimer: my expertise is much deeper in C++ than in Ada.

Experience with the STL in C++ has shown that the standard container
classes and associated algorithms degrade performance severely unless a
significant level of optimization is turned on. This is in part due to
the number of temporary objects and function calls that get generated.
On the other hand, with optimization, they are about as fast as the best
hand-tuned code. What is the experience with the Booch Components in Ada
95? They look at least as well-designed as the STL. Are they also slow
unless optimization is enabled? If so, what level of optimization is
needed?

The thread-safety of the STL is a problem for strings, which are by
default COW in most compiler implementations. Are there similar issues
for any of the Booch components that I should be aware of?

The Booch components appear to be generally exception-safe and
exception-neutral. Are there any known issues that I should be aware of,
other than one esoteric bug involving copying between queues of
different types that I've already reported?

-- 
Harry Erwin, PhD, <herwin@gmu.edu>, Bat Researcher, Senior SW Analyst 
and Security Engineer, and Adjunct Professor of Computer Science, 
George Mason University.





* Re: Performance and Thread Safety of the Booch Components
  2000-04-21  0:00 ` Ted Dennison
@ 2000-04-22  0:00   ` Simon Wright
  2000-04-24  0:00     ` Ted Dennison
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Wright @ 2000-04-22  0:00 UTC (permalink / raw)


Ted Dennison <dennison@telepath.com> writes:

> Harry Erwin wrote:
> 
> > Experience with the STL in C++ has shown that the standard container
> > classes and associated algorithms degrade performance severely unless a
> > significant level of optimization is turned on. This is in part due to
> > the number of temporary objects and function calls that get generated.
> 
> ...
> 
> > The Booch components appear to be generally exception-safe and
> > exception-neutral. Are there any known issues that I should be aware of,
> > other than one esoteric bug involving copying between queues of
> > different types that I've already reported?
> 
> I don't think they really have been highly tuned for performance. As an
> example, we were doing multiple lookups with a map object in a loop
> that gets executed at 60Hz for a simulator. Performance was terrible even on
> GreenHills' highest optimization levels, so I looked at it with a
> subprogram-level performance analyzer. It turned out that even the *bounded*
> maps (which you'd figure wouldn't do dynamic allocation) were dynamically
> allocating an iterator on every lookup.

Yes, this was a bad move. My excuse is that the performance of "new"
on Linux is pretty good (well, not unacceptable to me as an occasional
user :-) -- it took about 80 µs on my P200+ box to create an Iterator.

Unfortunately, "new" on VxWorks is a lot slower. Most VxWorks users, I
expect, allocate memory at startup and deprecate "new" at other times.

It didn't help Ted that the specialised Iterators in Maps weren't
updated at the same time as I changed the standard passive iterators
to use an existing Iterator object rather than creating their own.

I've invented a new Iterator scheme, appended to this article in
gnatchop-compatible form (I see it uses 'Img, a GNAT-specific
attribute, sorry), which reduces that 80 µs to 4 µs, wow! -- and it
doesn't use "new" at all.

There is a subtle bug in the old scheme (the one in the present
release), where Iterators were assignable but were implemented as a
smart pointer to a shared data structure. The new scheme at least
should allow a proper Adjustment on assignment (but I have yet to
think of an example where that would be useful).
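
A minimal sketch of the hazard (illustrative only, not the code in the
present release): if the Iterator is a controlled wrapper round an
access value and Adjust isn't overridden, assignment just copies the
pointer, so two apparently independent Iterators share one cursor.

with Ada.Finalization;
package Shared_Iterators is

   type Cursor_State is record
      Position : Natural := 0;
   end record;
   type Cursor_Access is access Cursor_State;

   type Iterator is new Ada.Finalization.Controlled with record
      State : Cursor_Access;
   end record;
   --  Without an overriding Adjust that copies State.all (and a
   --  Finalize that reclaims it), "A := B;" leaves A.State and B.State
   --  designating the same Cursor_State, so advancing one iterator
   --  silently moves the other.

end Shared_Iterators;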

I see that the C++ BCs that are my baseline don't implement any sort
of internal locking on containers that are being iterated over. This
has been a worry for me for a long time.

>                                      I fixed that, then found that it was
> still quite slow because the (20 or so byte) result of the
> lookup was getting passed as a "return" value through three successive
> functions' return calls. That resulted in each object in the map getting
> copied 4 times at 60Hz. You'd think there'd be some way for the
> compiler to optimize that down to one copy at the assignment on the outside,
> but I couldn't find it. So I just wrote a procedure version and that did the
> trick.

Perhaps "pragma Inline" might have helped. At the expense of bloat,
perhaps.
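
(For reference -- the pragma just follows the declaration in the spec;
Double here is only a placeholder name:)

package Inline_Demo is
   function Double (N : Integer) return Integer;
   pragma Inline (Double);  --  ask the compiler to expand calls in place
end Inline_Demo;

package body Inline_Demo is
   function Double (N : Integer) return Integer is
   begin
      return N * 2;
   end Double;
end Inline_Demo;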

I certainly intend to take up Ted's suggestion (at least in the
internals of the library). Of course, these optimisations may just
turn out to be pessimisations for other users ...

> I submitted all these changes to Simon Wright for possible incorporation
> into the official components, but of course this is only Maps. I'm not
> trying to rag on Simon here, or those who came before him. I'm just pointing
> out that Booch is a volunteer effort. Improvements in functionality come
> when someone with the proper tools takes an interest in it. If you have
> interest in high-performance components, and you have access to an execution
> profiler, I'd encourage you to use the components. Just be prepared to have
> to make a couple of twiddles here and there (and please send them back to
> Simon, so the next person won't have to do it too).

Seconded!

Code follows:
=======================================================================
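--  Overview of the units that follow:
--    Iteration        : abstract Container and Iterator types; an
--                       Iterator holds only a pointer back to its
--                       Container, so creating one needs no heap.
--    Iteration.Child  : a concrete Container/Iterator pair whose
--                       New_Iterator builds the Iterator as an ordinary
--                       stack object, using Address_To_Access_Conversions
--                       to get an access value for the in-mode parameter.
--    Iteration_Test   : times 100_000 iterator creations three ways --
--                       statically bound, dispatching, and repeated
--                       reassignment of an existing class-wide object.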
with Ada.Finalization;
package Iteration is
  type Container is abstract tagged private;
  type Iterator is abstract tagged private;
  function New_Iterator (For_The_Container : Container) return Iterator'Class
    is abstract;
  procedure Reset (It : in out Iterator) is abstract;
private
  type Container is abstract new Ada.Finalization.Controlled with null record;
  type Container_Ptr is access all Container'Class;
  for Container_Ptr'Storage_Size use 0;
  type Iterator is abstract tagged record
    For_The_Container : Container_Ptr;
  end record;
end Iteration;
package Iteration.Child is
  type Child_Container is new Container with private;
  function New_Iterator
     (For_The_Container : Child_Container) return Iterator'Class;
private
  type Child_Container is new Container with null record;
  type Child_Iterator is new Iterator with null record;
  procedure Reset (It : in out Child_Iterator);
end Iteration.Child;
with Ada.Text_Io;
with System.Address_To_Access_Conversions;
package body Iteration.Child is
  package Allow_Access
  is new System.Address_To_Access_Conversions (Child_Container);
  function New_Iterator
     (For_The_Container : Child_Container) return Iterator'Class is
    Result : Child_Iterator
       := (For_The_Container =>
              Allow_Access.To_Pointer (For_The_Container'Address).all'Access);
  begin
 --     Ada.Text_Io.Put_Line ("Iteration.Child.New_Iterator (Child_Container).");
    Reset (Result);
    return Result;
  end New_Iterator;
  procedure Reset (It : in out Child_Iterator) is
  begin
 --     Ada.Text_Io.Put_Line ("Iteration.Child.Reset (Child_Iterator).");
    null;
  end Reset;
end Iteration.Child;
with Ada.Calendar;
with Ada.Text_Io;
with Iteration.Child;
procedure Iteration_Test is
  C : Iteration.Child.Child_Container;
  Loops : constant Positive := 100_000;
  Start : Ada.Calendar.Time;
  Taken : Duration;
  use Ada.Text_Io;
  use type Ada.Calendar.Time;
begin
  Put_Line ("Timing" & Loops'Img & " simple iterator allocations:");
  Start := Ada.Calendar.Clock;
  for I in 1 .. Loops loop
    declare
      It : Iteration.Iterator'Class := Iteration.Child.New_Iterator (C);
    begin
      null;
    end;
  end loop;
  Taken := (Ada.Calendar.Clock - Start) / Loops;
  Put_Line (".. took" & Duration'Image (Taken) & " sec on average.");

  Put_Line ("Timing" & Loops'Img & " dispatching iterator allocations:");
  Start := Ada.Calendar.Clock;
  for I in 1 .. Loops loop
    declare
      It : Iteration.Iterator'Class
         := Iteration.New_Iterator (Iteration.Container'Class (C));
    begin
      null;
    end;
  end loop;
  Taken := (Ada.Calendar.Clock - Start) / Loops;
  Put_Line (".. took" & Duration'Image (Taken) & " sec on average.");

  Put_Line ("Timing" & Loops'Img & " dispatching iterator reallocations:");
  Start := Ada.Calendar.Clock;
  declare
    It : Iteration.Iterator'Class
       := Iteration.New_Iterator (Iteration.Container'Class (C));
  begin
    for I in 1 .. Loops loop
      It := Iteration.New_Iterator (Iteration.Container'Class (C));
    end loop;
  end;
  Taken := (Ada.Calendar.Clock - Start) / Loops;
  Put_Line (".. took" & Duration'Image (Taken) & " sec on average.");

  Put_Line("done.");
end Iteration_Test;





* Re: Performance and Thread Safety of the Booch Components
  2000-04-22  0:00   ` Simon Wright
@ 2000-04-24  0:00     ` Ted Dennison
  0 siblings, 0 replies; 4+ messages in thread
From: Ted Dennison @ 2000-04-24  0:00 UTC (permalink / raw)


Simon Wright wrote:

> Ted Dennison <dennison@telepath.com> writes:
>
> > subprogram-level performance analyzer. It turned out that even the *bounded*
> > maps (which you'd figure wouldn't do dynamic allocation) were dynamically
> > allocating an iterator on every lookup.
>
> Unfortunately, "new" on VxWorks is a lot slower. Most VxWorks users, I
> expect, allocate memory at startup and deprecate "new" at other times.

The problem isn't just the speed (although in this case that was the main problem).
There's also the unpredictability factor. Certain parts of a simulation rely on
operations occurring at the same time every iteration, otherwise things could start
to jitter or go unstable. If you throw VxWorks heap allocations and deallocations in
front of one of these calculations, what happens to the timing? I could ask Wind River
to be sure, but since they weren't bragging to us about their great real-time
allocation scheme, I can be pretty sure that it's a typical one. That means there's
going to be some heap coalescing algorithm in allocation or deallocation that will
take a varying amount of time based on the configuration the heap happens to be in at
that particular moment.

Now our scheduler does have a certain amount of protection for the most hard-real-time
of its tasks (e.g. the interface with the visual). But still it's best to be safe. That
is the main reason real-time programmers try to avoid runtime heap usage as a general
rule, and that is why we want to use the bounded components, despite the waste of
memory and the attendant cache slowdowns.

So speed alone isn't the issue. I'd much prefer to see *no* dynamic allocation when
I'm dealing with a bounded structure. If there is some, I'd like to see it identified
in the specs for the offending routine so allocation-phobic users know to avoid it
after initialization.
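
One blunt way to make any hidden allocator show up at compile time,
rather than in a profiler run, is the Safety and Security annex
restriction below. It's a configuration pragma, so it applies to the
whole partition -- which is only practical if you can live without
"new" even at startup.

pragma Restrictions (No_Allocators);
--  Any allocator in any unit of the partition, including ones buried
--  inside library components, is then flagged by the compiler.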

> >                                      I fixed that, then found that it was
> > still quite slow because the (20 or so byte) result of the
> > lookup was getting passed as a "return" value through three successive
> > functions' return calls. That resulted in each object in the map getting
> > copied 4 times at 60Hz. You'd think there'd be some way for the
> > compiler to optimize that down to one copy at the assignment on the outside,
> > but I couldn't find it. So I just wrote a procedure version and that did the
> > trick.
>
> Perhaps "pragma Inline" might have helped. At the expense of bloat,
> perhaps.

That's what I thought too. No dice, though... :-(

--
T.E.D.

Home - mailto:dennison@telepath.com  Work - mailto:dennison@ssd.fsi.com
WWW  - http://www.telepath.com/dennison/Ted/TED.html  ICQ  - 10545591






