comp.lang.ada
 help / color / mirror / Atom feed
* Reference counter in smart pointers are not updated properly when used by multiple tasks
@ 2018-02-01  5:52 onox
  2018-02-01  8:39 ` Dmitry A. Kazakov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: onox @ 2018-02-01  5:52 UTC (permalink / raw)


I am currently designing a job processing system for an OpenGL 4.5 render engine. All the 
work like scene culling and matrix transform updates are performed by small jobs which 
can be executed by a small number of workers. A job can have 0 or 1 dependent job (the 
successor). The first jobs that needs to be executed (the leaves in the job graph) are 
enqueued to a single queue. Multiple workers can then try to dequeue jobs. Multiple jobs 
can have the same successor job, so I am using atomics (GCC's __sync_sub_and_fetch_4) to 
decrement a counter and if the counter becomes 0 then the successor job needs to be 
enqueued (so at the start only the leaves of the job graph are enqueued).

A job can create a subgraph and this subgraph is then inserted between the current job 
and its optional dependent job (the successor). In particular there is currently a 
Parallize function which can spawn multiple Parallel_Job'Class jobs each with a different 
range. This means that if you call Parallelize (My_Parallel_Job, 24, 6) (it returns a 
Job_Ptr), the job will (when it gets executed) spawn 4 copies of My_Parallel_Job with the 
ranges 1..6, 7..12, 13..18, 19..24.

So far so good.

Now, when I enqueue a job, I want to know when its whole graph has been executed. To do 
this I have created a synchronized interface called Future. It has a function 
Current_Status which can return the values Waiting, Running, Done, and Failed. It also 
has an entry that blocks until the status becomes Done or Failed. The status is updated 
by a worker.

Furthermore the package (Orka.Jobs.Boss) that manages the workers and the queue has a 
variable that is an array of instances of Future. And it has a variable that is a 
protected type called Manager with an entry Acquire and procedure Release. This protected 
object is used to manage the array of Future objects. The idea is that these 2 operations 
have O(1) time complexity and that there is no unchecked deallocation. Only the jobs 
themselves are used via raw pointers and are freed by the workers after execution.

The pointer to a Future object is put inside a smart pointer. When the smart pointer 
detects that there are 0 references, then the Release procedure described above is called.

The whole job system itself works; the jobs gets executed and in the right sequence. It 
is the managing of the acquired Future object that is troubling: When a job is enqueued 
using the queue from Orka.Jobs.Boss, the Enqueue entry is given a smart pointer. If the 
smart pointer is empty, then we need to acquire a Future object, otherwise the smart 
pointer is put in a record together with the Job_Ptr and enqueued.

So here is the problem: after I have enqueued the first job and blocked on the Future's 
Wait_Until_Done entry, then (if there is only 1 worker) there remains 1 reference to the 
Future object (the smart pointer that I gave to the Enqueue entry). At the end of the 
main procedure of the executable I see the Future object getting released. This is good. 
But if I use 2 or more workers, then often there remain > 1 references to the Future 
object and the Future object doesn't get released. Sometimes there remain < 1 references 
and the Future object gets released too early.

So with 1 worker there is no problem, but with 2 or more workers the smart pointers act 
weird. I am just using GCC's __sync_sub_and_fetch_4 and __sync_add_and_fetch_4. I have 
also tried to use Atomic aspect for the references, but that didn't help.

So the smart pointers are not working properly when there are jobs (paired with the same 
smart pointer) being executed by multiple workers. I'm not sure if this is a bug in my 
own code or whether GNAT is doing something funky in the Finalize procedure.

Can someone provide some insight?

See https://github.com/onox/jobs-test/blob/master/examples/orka_test-test_9_jobs.adb

- Change Number_Of_Workers in src/orka-jobs-boss.ads:47 to 1 (OK) or 2 (buggy)

- Compile with `make`

TL;DR; Reference counter of smart pointers are not properly updated when used by multiple 
tasks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01  5:52 Reference counter in smart pointers are not updated properly when used by multiple tasks onox
@ 2018-02-01  8:39 ` Dmitry A. Kazakov
  2018-02-01 10:01   ` onox
  2018-02-01  8:41 ` Simon Wright
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Dmitry A. Kazakov @ 2018-02-01  8:39 UTC (permalink / raw)


On 01/02/2018 06:52, onox wrote:

> So the smart pointers are not working properly when there are jobs (paired with the same
> smart pointer) being executed by multiple workers.

Depends on the implementation. Surely handling reference counts must be 
done in a task-safe manner, e.g. through a protected action. Pragma 
Atomic is no help here.

> I'm not sure if this is a bug in my
> own code or whether GNAT is doing something funky in the Finalize procedure.

I never had problems with that.

> Can someone provide some insight?

1. I have two implementations, one with heavy logging/tracing the 
actions taken on reference counts.

2. There is gnatmem, which works surprisingly good. Reference-counted 
objects are allocated in the pool, so most of bugs are also memory leaks.

3. Controlled objects are tagged and thus passed by reference. It is 
important to remember when passing controlled handles/smart pointers 
around. A usual schema for dealing with shared linked structures like 
graphs is to clone its parts when updated under concurrent access. The 
case is detected using the reference count value > n, where n is the 
number of "owned" references. That n depends on parameter passing 
mechanism, which is, as I said, by-reference.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01  5:52 Reference counter in smart pointers are not updated properly when used by multiple tasks onox
  2018-02-01  8:39 ` Dmitry A. Kazakov
@ 2018-02-01  8:41 ` Simon Wright
  2018-02-01 14:48 ` Jeffrey R. Carter
  2018-02-01 19:04 ` Robert A Duff
  3 siblings, 0 replies; 12+ messages in thread
From: Simon Wright @ 2018-02-01  8:41 UTC (permalink / raw)


onox <denkpadje@gmail.com> writes:

> So the smart pointers are not working properly when there are jobs
> (paired with the same smart pointer) being executed by multiple
> workers. I'm not sure if this is a bug in my own code or whether GNAT
> is doing something funky in the Finalize procedure.

I had a look on macOS. Some messing around required with build system/os
choice.

GCC 8.0.0 failed with ICE, tried 7.1.0.

Doesn't support CPU affinity: commented out.

Number of workers 1: OK.

Number of workers 2:

Worker #2 finished -- refs:  2
Worker #2 dequeueing...
Releasing slot...
Releasing slot...
Worker #1 : ORKA.JOBS.JOB : raised PROGRAM_ERROR : adjust/finalize raised SYSTEM.ASSERTIONS.ASSERT_FAILURE: failed precondition from orka-futures-slots.ads:60 instantiated at orka_test-package_9_jobs.ads:20
Load address: 0x108284000
Call stack traceback locations:
0x108295c11 0x1082a1cd0 0x7fff7da586bf 0x7fff7da5856b

Worker #2 : ORKA.JOBS.JOB : raised PROGRAM_ERROR : adjust/finalize raised SYSTEM.ASSERTIONS.ASSERT_FAILURE: failed precondition from orka-futures-slots.ads:60 instantiated at orka_test-package_9_jobs.ads:20
Load address: 0x108284000
Call stack traceback locations:
0x108293228 0x1082946a5 0x1082a1cd0 0x7fff7da586bf 0x7fff7da5856b

     Jobs not done: RUNNING
Releasing slot...

Execution terminated by unhandled exception
raised PROGRAM_ERROR : adjust/finalize raised PROGRAM_ERROR: orka-smart_pointers.adb:64 explicit raise
Load address: 0x108284000
Call stack traceback locations:
0x1082a0f56 0x1082a3dec 0x1082a4cea 0x1082a4aeb 0x1082c1fea 0x7fff7da88fcf 0x7fff7da88ee0 0x108286ac7 0x1082c206a

The tracebacks were no good, need to build with MODE=debug; when I do
this, I see your problem with the references,

Worker #2 dequeueing...
     Jobs done: DONE  3
     Time:  0.009000000
references B (expects >= 1):  2
shutting down...
Worker #1 dequeued  0
Worker #1 terminated
Worker #2 dequeued  0
Worker #2 terminated
shutting down?
references C (expects 1):  2
Queue high (expects 0):  0
Queue normal (expects 0):  0
Slots acquired (expects 1):  1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01  8:39 ` Dmitry A. Kazakov
@ 2018-02-01 10:01   ` onox
  2018-02-01 10:28     ` onox
  2018-02-01 10:57     ` Dmitry A. Kazakov
  0 siblings, 2 replies; 12+ messages in thread
From: onox @ 2018-02-01 10:01 UTC (permalink / raw)


On Thursday, February 1, 2018 at 9:39:24 AM UTC+1, Dmitry A. Kazakov wrote:
> On 01/02/2018 06:52, onox wrote:
> 
> > So the smart pointers are not working properly when there are jobs (paired with the same
> > smart pointer) being executed by multiple workers.
> 
> Depends on the implementation. Surely handling reference counts must be 
> done in a task-safe manner, e.g. through a protected action. Pragma 
> Atomic is no help here.

You're right Dimitry, using a protected object instead fixes the problem. But I do not understand why __sync_sub_and_fetch_4 is not sufficient. Could you explain?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 10:01   ` onox
@ 2018-02-01 10:28     ` onox
  2018-02-01 10:51       ` Dmitry A. Kazakov
  2018-02-01 10:57     ` Dmitry A. Kazakov
  1 sibling, 1 reply; 12+ messages in thread
From: onox @ 2018-02-01 10:28 UTC (permalink / raw)


I'm replaced the Atomic.Unsigned_32 with:

   protected type Reference_Counter is
      procedure Increment;
      procedure Decrement (Result : out Natural);
      function Count return Natural;
   private
      References : Natural := 1;
   end Reference_Counter;

   type Reference_Counter_Access is access Reference_Counter;

   type Data_Record is record
      References : Reference_Counter_Access;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 10:28     ` onox
@ 2018-02-01 10:51       ` Dmitry A. Kazakov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry A. Kazakov @ 2018-02-01 10:51 UTC (permalink / raw)


On 01/02/2018 11:28, onox wrote:
> I'm replaced the Atomic.Unsigned_32 with:
> 
>     protected type Reference_Counter is
>        procedure Increment;
>        procedure Decrement (Result : out Natural);
>        function Count return Natural;
>     private
>        References : Natural := 1;
>     end Reference_Counter;
> 
>     type Reference_Counter_Access is access Reference_Counter;
> 
>     type Data_Record is record
>        References : Reference_Counter_Access;

There is System.Atomic_Counters I suppose from GNAT you could use.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 10:01   ` onox
  2018-02-01 10:28     ` onox
@ 2018-02-01 10:57     ` Dmitry A. Kazakov
  1 sibling, 0 replies; 12+ messages in thread
From: Dmitry A. Kazakov @ 2018-02-01 10:57 UTC (permalink / raw)


On 01/02/2018 11:01, onox wrote:
> On Thursday, February 1, 2018 at 9:39:24 AM UTC+1, Dmitry A. Kazakov wrote:
>> On 01/02/2018 06:52, onox wrote:
>>
>>> So the smart pointers are not working properly when there are jobs (paired with the same
>>> smart pointer) being executed by multiple workers.
>>
>> Depends on the implementation. Surely handling reference counts must be
>> done in a task-safe manner, e.g. through a protected action. Pragma
>> Atomic is no help here.
> 
> You're right Dimitry, using a protected object instead fixes the
> problem. But I do not understand why __sync_sub_and_fetch_4 is not
> sufficient.

Subtract-and-fetch should work, if present and returns the resulted 
value, pragma Atomic should not. It is kind of difficult to use GCC 
built-ins.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01  5:52 Reference counter in smart pointers are not updated properly when used by multiple tasks onox
  2018-02-01  8:39 ` Dmitry A. Kazakov
  2018-02-01  8:41 ` Simon Wright
@ 2018-02-01 14:48 ` Jeffrey R. Carter
  2018-02-01 19:37   ` onox
  2018-02-02  0:09   ` Randy Brukardt
  2018-02-01 19:04 ` Robert A Duff
  3 siblings, 2 replies; 12+ messages in thread
From: Jeffrey R. Carter @ 2018-02-01 14:48 UTC (permalink / raw)


The referenced code has a library-level generic package Orka.Smart_Pointers with 
the public declaration

type Free_Ptr is not null access procedure (Value : in out Object_Type);

The body of the generic pkg has

    procedure Set
      (Object : in out Abstract_Pointer;
       Value  : Object_Type;
       Free   : not null access procedure (Value : in out Object_Type))
    is
       Procedure_Free : constant Free_Ptr := Free;
    begin
       if Object.Data /= null then
          --  Decrement old reference count
          Finalize (Object);
       end if;

       Object.Data := new Data_Record'(Object => Value, Free => Procedure_Free, 
References => <>);
    end Set;

    overriding
    procedure Finalize (Object : in out Abstract_Pointer) is
       Result : Natural;
    begin
       if Object.Data /= null then
          if Object.Data.References.Count = 0 then
             --  This exception is raised if the Future slot is released too early
             raise Program_Error;
          end if;
          Object.Data.References.Decrement (Result);
          if Result = 0 then
             Object.Data.Free (Object.Data.Object);
             Free (Object.Data);
          end if;
       end if;

       --  Idempotence: next call to Finalize has no effect
       Object.Data := null;
    end Finalize;

Set's parameter Free is an anonymous access-to-subprogram parameter; one is not 
supposed to be able to assign those and call them later, but that is exactly 
what this code does. Is this code legal? GNAT compiles it fine. If it is legal, 
should it be?

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01  5:52 Reference counter in smart pointers are not updated properly when used by multiple tasks onox
                   ` (2 preceding siblings ...)
  2018-02-01 14:48 ` Jeffrey R. Carter
@ 2018-02-01 19:04 ` Robert A Duff
  3 siblings, 0 replies; 12+ messages in thread
From: Robert A Duff @ 2018-02-01 19:04 UTC (permalink / raw)


onox <denkpadje@gmail.com> writes:

>... so I am using atomics (GCC's
> __sync_sub_and_fetch_4) ...

There is a version of Ada.Strings.Unbounded that
uses atomics for reference counting.  You could look
at that as an example.  Look at a-strunb.ads.
If it has this comment near the top:

--  This package provides an implementation of Ada.Strings.Unbounded that uses
--  reference counts to implement copy on modification (rather than copy on
--  assignment). This is significantly more efficient on many targets.

--  This version is supported on:
--    - all Alpha platforms
--    - all ia64 platforms
--    - all PowerPC platforms
--    - all SPARC V9 platforms
--    - all x86 platforms
--    - all x86_64 platforms

then that's the right version to look at.

- Bob


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 14:48 ` Jeffrey R. Carter
@ 2018-02-01 19:37   ` onox
  2018-02-01 20:12     ` Jeffrey R. Carter
  2018-02-02  0:09   ` Randy Brukardt
  1 sibling, 1 reply; 12+ messages in thread
From: onox @ 2018-02-01 19:37 UTC (permalink / raw)


On Thursday, February 1, 2018 at 3:48:10 PM UTC+1, Jeffrey R. Carter wrote:
> Set's parameter Free is an anonymous access-to-subprogram parameter; one is not 
> supposed to be able to assign those and call them later, but that is exactly 
> what this code does. Is this code legal? GNAT compiles it fine. If it is legal, 
> should it be?

I hope so. When I instantiate the Orka.Smart_Pointers package, I do not know how to free the Object_Type (access to an interface). Only when I call Set I know how to free the object (access to a tagged type implementing the interface).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 19:37   ` onox
@ 2018-02-01 20:12     ` Jeffrey R. Carter
  0 siblings, 0 replies; 12+ messages in thread
From: Jeffrey R. Carter @ 2018-02-01 20:12 UTC (permalink / raw)


On 02/01/2018 08:37 PM, onox wrote:
> On Thursday, February 1, 2018 at 3:48:10 PM UTC+1, Jeffrey R. Carter wrote:
>> Set's parameter Free is an anonymous access-to-subprogram parameter; one is not
>> supposed to be able to assign those and call them later, but that is exactly
>> what this code does. Is this code legal? GNAT compiles it fine. If it is legal,
>> should it be?
> 
> I hope so. When I instantiate the Orka.Smart_Pointers package, I do not know how to free the Object_Type (access to an interface). Only when I call Set I know how to free the object (access to a tagged type implementing the interface).

AIUI, it should be illegal. Because Ada has nested subprograms, it's possible 
that the subprogram won't exist when you call it. To prevent that from happening 
is why named access-to-subprogram types are subject to accessibility checks. 
Anonymous access-to-subprogram parameters don't have such checks because you're 
not supposed to be able to copy them.

-- 
Jeff Carter
"My mind is aglow with whirling, transient nodes of
thought, careening through a cosmic vapor of invention."
Blazing Saddles
85

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Reference counter in smart pointers are not updated properly when used by multiple tasks
  2018-02-01 14:48 ` Jeffrey R. Carter
  2018-02-01 19:37   ` onox
@ 2018-02-02  0:09   ` Randy Brukardt
  1 sibling, 0 replies; 12+ messages in thread
From: Randy Brukardt @ 2018-02-02  0:09 UTC (permalink / raw)


"Jeffrey R. Carter" <spam.jrcarter.not@spam.not.acm.org> wrote in message 
news:p4v9b8$8o1$1@dont-email.me...
> The referenced code has a library-level generic package 
> Orka.Smart_Pointers with the public declaration
>
> type Free_Ptr is not null access procedure (Value : in out Object_Type);
>
> The body of the generic pkg has
>
>    procedure Set
>      (Object : in out Abstract_Pointer;
>       Value  : Object_Type;
>       Free   : not null access procedure (Value : in out Object_Type))
>    is
>       Procedure_Free : constant Free_Ptr := Free;

This is illegal. An anonymous access-to-subprogram has "infinite" 
accessibility, so it cannot be assigned to any other kind of 
access-to-subprogram type. The entire idea is that such parameters don't 
live longer than the call that created them.

GNAT allows doing this with the distinctly non-Standard 
'Unrestricted_Access, so it probably can actually execute it, but that would 
not be the case for other Ada compilers.

If the parameter Free had type Free_Ptr, then this would work.

...
> Set's parameter Free is an anonymous access-to-subprogram parameter; one 
> is not supposed to be able to assign those and call them later, but that 
> is exactly what this code does. Is this code legal? GNAT compiles it fine. 
> If it is legal, should it be?

It's definitely not legal.

                    Randy.



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-02-02  0:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-01  5:52 Reference counter in smart pointers are not updated properly when used by multiple tasks onox
2018-02-01  8:39 ` Dmitry A. Kazakov
2018-02-01 10:01   ` onox
2018-02-01 10:28     ` onox
2018-02-01 10:51       ` Dmitry A. Kazakov
2018-02-01 10:57     ` Dmitry A. Kazakov
2018-02-01  8:41 ` Simon Wright
2018-02-01 14:48 ` Jeffrey R. Carter
2018-02-01 19:37   ` onox
2018-02-01 20:12     ` Jeffrey R. Carter
2018-02-02  0:09   ` Randy Brukardt
2018-02-01 19:04 ` Robert A Duff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox