From: onox <denkpadje@gmail.com>
Subject: Reference counter in smart pointers are not updated properly when used by multiple tasks
Date: Wed, 31 Jan 2018 21:52:50 -0800 (PST)
Message-ID: <a89d76e4-a2b2-4ad9-8816-f3bfe9686930@googlegroups.com>
I am currently designing a job processing system for an OpenGL 4.5 render engine. All the
work, such as scene culling and matrix transform updates, is performed by small jobs which
can be executed by a small number of workers. A job can have 0 or 1 dependent job (the
successor). The first jobs that need to be executed (the leaves in the job graph) are
enqueued to a single queue. Multiple workers can then try to dequeue jobs. Multiple jobs
can have the same successor job, so I am using atomics (GCC's __sync_sub_and_fetch_4) to
decrement a counter and if the counter becomes 0 then the successor job needs to be
enqueued (so at the start only the leaves of the job graph are enqueued).
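The counter logic described above can be sketched as follows. This is a minimal illustration in C (chosen because __sync_sub_and_fetch is a GCC builtin), not the actual Orka code; the record `job`, its fields, and `finish_job` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job record: only the fields needed for this sketch. */
struct job {
    uint32_t pending;      /* predecessors that have not finished yet */
    struct job *successor; /* NULL when the job has no dependent job */
};

/*
 * Called by a worker after it has finished executing `done`.
 * Returns the successor if this call dropped its counter to 0 (the caller
 * must then enqueue it), or NULL otherwise.
 */
static struct job *finish_job(struct job *done)
{
    assert(done != NULL);
    struct job *next = done->successor;
    if (next == NULL)
        return NULL;
    /* Atomic read-modify-write: exactly one worker observes the value 0. */
    if (__sync_sub_and_fetch(&next->pending, 1) == 0)
        return next;
    return NULL;
}
```

The decision to enqueue is based on the value returned by the atomic operation itself, not on a separate read of the counter afterwards, so two workers finishing at the same time cannot both enqueue the successor.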
A job can create a subgraph and this subgraph is then inserted between the current job
and its optional dependent job (the successor). In particular there is currently a
Parallelize function which can spawn multiple Parallel_Job'Class jobs, each with a different
range. This means that if you call Parallelize (My_Parallel_Job, 24, 6) (it returns a
Job_Ptr), the job will (when it gets executed) spawn 4 copies of My_Parallel_Job with the
ranges 1..6, 7..12, 13..18, 19..24.
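The range arithmetic for the example above can be sketched like this. It is a C illustration only; `slice_count` and `slice_range` are hypothetical helpers, and clamping the last slice when the length is not a multiple of the slice size is an assumption, since the post only shows the evenly divisible case.

```c
#include <assert.h>

/* Hypothetical helper type: inclusive, 1-based range of one slice. */
struct range { int from; int to; };

/* Number of slices needed to cover `length` items with `slice` items each. */
static int slice_count(int length, int slice)
{
    assert(length > 0 && slice > 0);
    return (length + slice - 1) / slice;   /* ceiling division */
}

/* Range of the slice with 0-based `index`; the last slice may be shorter
   (assumed behaviour, not confirmed by the post). */
static struct range slice_range(int length, int slice, int index)
{
    assert(index >= 0 && index < slice_count(length, slice));
    struct range r;
    r.from = index * slice + 1;
    r.to = (index + 1) * slice;
    if (r.to > length)
        r.to = length;
    return r;
}
```

With a length of 24 and a slice size of 6, `slice_count` yields 4 and the four calls to `slice_range` yield 1..6, 7..12, 13..18 and 19..24, matching the ranges listed above.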
So far so good.
Now, when I enqueue a job, I want to know when its whole graph has been executed. To do
this I have created a synchronized interface called Future. It has a function
Current_Status which can return the values Waiting, Running, Done, and Failed. It also
has an entry that blocks until the status becomes Done or Failed. The status is updated
by a worker.
Furthermore the package (Orka.Jobs.Boss) that manages the workers and the queue has a
variable that is an array of instances of Future. And it has a variable that is a
protected type called Manager with an entry Acquire and procedure Release. This protected
object is used to manage the array of Future objects. The idea is that these 2 operations
have O(1) time complexity and that there is no unchecked deallocation. Only the jobs
themselves are used via raw pointers and are freed by the workers after execution.
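One way to get O(1) acquire and release over a fixed array without any deallocation is a stack of free indices. The sketch below is in C and is only an assumption about how such a manager could look; `slot_pool`, `POOL_SIZE` and the three functions are hypothetical names. The locking is left out: in the design described above, the protected object would serialize these calls.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 4   /* hypothetical capacity of the Future array */

struct slot_pool {
    int free_stack[POOL_SIZE]; /* indices of slots that are currently free */
    int free_count;
};

static void pool_init(struct slot_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_stack[i] = POOL_SIZE - 1 - i;  /* slot 0 ends up on top */
    p->free_count = POOL_SIZE;
}

/* O(1): pop a free index. Returns false when all slots are in use; an
   entry with a barrier would presumably block here instead. */
static bool pool_acquire(struct slot_pool *p, int *index)
{
    if (p->free_count == 0)
        return false;
    *index = p->free_stack[--p->free_count];
    return true;
}

/* O(1): push the index back; nothing is deallocated. */
static void pool_release(struct slot_pool *p, int index)
{
    assert(index >= 0 && index < POOL_SIZE);
    assert(p->free_count < POOL_SIZE);   /* more releases than acquires */
    p->free_stack[p->free_count++] = index;
}
```

The second assertion in `pool_release` fires when more slots are released than were acquired, which is what a reference count that reaches zero twice would cause.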
The pointer to a Future object is put inside a smart pointer. When the smart pointer
detects that there are 0 references, then the Release procedure described above is called.
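The counting scheme can be sketched in C with the same GCC builtins. This is an illustration under stated assumptions, not the smart pointer from the repository: `control`, `handle`, `handle_copy` and `handle_drop` are hypothetical names, and the `released` field stands in for the call to Release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared control block. */
struct control {
    uint32_t refs;   /* number of live handles */
    int released;    /* counts how often the "Release" step ran */
};

struct handle { struct control *ctl; };  /* empty when ctl == NULL */

/* Copy: the source handle keeps the block alive while the count is raised. */
static struct handle handle_copy(const struct handle *src)
{
    struct handle h = { src->ctl };
    if (h.ctl != NULL)
        __sync_add_and_fetch(&h.ctl->refs, 1);
    return h;
}

/* Drop: clearing the handle first makes a second drop of the same handle
   a no-op, so one handle can never decrement the count twice. */
static void handle_drop(struct handle *h)
{
    struct control *ctl = h->ctl;
    if (ctl == NULL)
        return;
    h->ctl = NULL;
    uint32_t left = __sync_sub_and_fetch(&ctl->refs, 1);
    assert(left != UINT32_MAX);   /* would mean the count went below 0 */
    if (left == 0)
        ctl->released++;          /* exactly one caller gets here */
}
```

The atomic builtins only protect the counter itself. Each individual handle must still be copied and dropped by one task at a time, because the read and the clearing of `h->ctl` are ordinary, non-atomic accesses.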
The whole job system itself works; the jobs get executed and in the right sequence. It
is the managing of the acquired Future object that is troubling: When a job is enqueued
using the queue from Orka.Jobs.Boss, the Enqueue entry is given a smart pointer. If the
smart pointer is empty, then we need to acquire a Future object, otherwise the smart
pointer is put in a record together with the Job_Ptr and enqueued.
So here is the problem: after I have enqueued the first job and blocked on the Future's
Wait_Until_Done entry, then (if there is only 1 worker) there remains 1 reference to the
Future object (the smart pointer that I gave to the Enqueue entry). At the end of the
main procedure of the executable I see the Future object getting released. This is good.
But if I use 2 or more workers, then often there remain > 1 references to the Future
object and the Future object doesn't get released. Sometimes there remain < 1 references
and the Future object gets released too early.
So with 1 worker there is no problem, but with 2 or more workers the smart pointers act
weird. I am just using GCC's __sync_sub_and_fetch_4 and __sync_add_and_fetch_4. I have
also tried to use Atomic aspect for the references, but that didn't help.
So the smart pointers are not working properly when there are jobs (paired with the same
smart pointer) being executed by multiple workers. I'm not sure if this is a bug in my
own code or whether GNAT is doing something funky in the Finalize procedure.
Can someone provide some insight?
See https://github.com/onox/jobs-test/blob/master/examples/orka_test-test_9_jobs.adb
- Change Number_Of_Workers in src/orka-jobs-boss.ads:47 to 1 (OK) or 2 (buggy)
- Compile with `make`
TL;DR: The reference counters of smart pointers are not properly updated when used by
multiple tasks.