From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	WEIRD_PORT autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.107.190.199 with SMTP id o190mr23759590iof.32.1517464370834;
        Wed, 31 Jan 2018 21:52:50 -0800 (PST)
X-Received: by 10.157.3.6 with SMTP id 6mr443477otv.3.1517464370494; Wed, 31
 Jan 2018 21:52:50 -0800 (PST)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.unit0.net!peer02.am4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!w142no116125ita.0!news-out.google.com!s63ni210itb.0!nntp.google.com!w142no116124ita.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Wed, 31 Jan 2018 21:52:50 -0800 (PST)
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=80.114.173.191;
 posting-account=BtkjvAoAAADwEquGb07eykXfyiDMOxfl
NNTP-Posting-Host: 80.114.173.191
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <a89d76e4-a2b2-4ad9-8816-f3bfe9686930@googlegroups.com>
Subject: Reference counter in smart pointers are not updated properly when
 used by multiple tasks
From: onox <denkpadje@gmail.com>
Injection-Date: Thu, 01 Feb 2018 05:52:50 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Body-CRC: 1142211931
X-Received-Bytes: 5285
Xref: reader02.eternal-september.org comp.lang.ada:50236
Date: 2018-01-31T21:52:50-08:00
List-Id: <comp.lang.ada>

I am currently designing a job processing system for an OpenGL 4.5 render engine. All the 
work, such as scene culling and matrix transform updates, is performed by small jobs which 
can be executed by a small number of workers. A job can have 0 or 1 dependent job (the 
successor). The first jobs that need to be executed (the leaves in the job graph) are 
enqueued to a single queue. Multiple workers can then try to dequeue jobs. Multiple jobs 
can have the same successor job, so I am using atomics (GCC's __sync_sub_and_fetch_4) to 
decrement a counter and if the counter becomes 0 then the successor job needs to be 
enqueued (so at the start only the leaves of the job graph are enqueued).
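
The counter scheme described above can be sketched as follows. This is a minimal illustration in C (the language of the GCC builtins), not the actual Orka code: the `job` record, its field names, and `finish_job` are hypothetical. The generic `__sync_sub_and_fetch` resolves to the `_4` variant for a 4-byte operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job record; the field names are illustrative only. */
typedef struct job {
    struct job *successor;  /* NULL when the job has no dependent job */
    uint32_t    pending;    /* predecessors of this job still to finish */
} job;

/*
 * Called by a worker after it has executed `done`. Returns the successor
 * if this call finished its last predecessor, so that exactly one worker
 * enqueues it; returns NULL otherwise.
 */
static job *finish_job(job *done)
{
    assert(done != NULL);

    job *next = done->successor;
    if (next == NULL)
        return NULL;

    /* Atomic decrement: only the caller that observes 0 gets the successor. */
    if (__sync_sub_and_fetch(&next->pending, 1) == 0)
        return next;

    return NULL;
}
```

The important property is that the decision to enqueue is made on the value returned by the atomic operation itself, not on a separate read of the counter afterwards.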

A job can create a subgraph and this subgraph is then inserted between the current job 
and its optional dependent job (the successor). In particular there is currently a 
Parallelize function which can spawn multiple Parallel_Job'Class jobs each with a different 
range. This means that if you call Parallelize (My_Parallel_Job, 24, 6) (it returns a 
Job_Ptr), the job will (when it gets executed) spawn 4 copies of My_Parallel_Job with the 
ranges 1..6, 7..12, 13..18, 19..24.
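
The range splitting in that example amounts to the arithmetic below. This is a sketch in C with hypothetical names (`range`, `split_ranges`). How the last slice is handled when the length is not a multiple of the slice size is an assumption here (it is simply truncated), since the example above only covers the evenly divisible case.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive, 1-based range, matching the 1..6, 7..12, ... notation. */
typedef struct {
    int first;
    int last;
} range;

/*
 * Splits 1..length into consecutive slices of at most `slice` elements.
 * Writes up to `max` ranges into `out` and returns the number of slices
 * needed. Assumption: a final partial slice is truncated to `length`.
 */
static size_t split_ranges(int length, int slice, range *out, size_t max)
{
    assert(length >= 0 && slice > 0);

    size_t count = 0;
    for (int first = 1; first <= length; first += slice) {
        int last = first + slice - 1;
        if (last > length)
            last = length;
        if (count < max) {
            out[count].first = first;
            out[count].last  = last;
        }
        count++;
    }
    return count;
}
```

With a length of 24 and a slice size of 6 this produces the four ranges listed above.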

So far so good.

Now, when I enqueue a job, I want to know when its whole graph has been executed. To do 
this I have created a synchronized interface called Future. It has a function 
Current_Status which can return the values Waiting, Running, Done, and Failed. It also 
has an entry that blocks until the status becomes Done or Failed. The status is updated 
by a worker.

Furthermore the package (Orka.Jobs.Boss) that manages the workers and the queue has a 
variable that is an array of instances of Future. And it has a variable that is a 
protected type called Manager with an entry Acquire and procedure Release. This protected 
object is used to manage the array of Future objects. The idea is that these 2 operations 
have O(1) time complexity and that there is no unchecked deallocation. Only the jobs 
themselves are used via raw pointers and are freed by the workers after execution.
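
For readers unfamiliar with the pattern, a fixed pool with O(1) acquire and release and no deallocation can be sketched as a free list of array indices. This is a single-threaded C illustration with hypothetical names (`pool`, `POOL_SIZE`), not the actual Orka.Jobs.Boss code; in the Ada version the protected object provides the mutual exclusion, and the Acquire entry would presumably block rather than return an error value.

```c
#include <assert.h>

#define POOL_SIZE 4  /* hypothetical capacity of the Future array */

typedef struct {
    int next_free[POOL_SIZE];  /* next free slot after slot i, or -1 */
    int head;                  /* first free slot, or -1 when exhausted */
} pool;

/* Links all slots into one free list: 0 -> 1 -> ... -> POOL_SIZE - 1. */
static void pool_init(pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->next_free[i] = (i + 1 < POOL_SIZE) ? i + 1 : -1;
    p->head = 0;
}

/* O(1): pops a free slot, or returns -1 if none is available. */
static int pool_acquire(pool *p)
{
    int slot = p->head;
    if (slot >= 0)
        p->head = p->next_free[slot];
    return slot;
}

/* O(1): pushes the slot back onto the free list. */
static void pool_release(pool *p, int slot)
{
    assert(slot >= 0 && slot < POOL_SIZE);
    p->next_free[slot] = p->head;
    p->head = slot;
}
```

A slot released twice would corrupt the free list, which is why a reference count that drops below its true value (releasing too early) is harmful in this design.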

The pointer to a Future object is put inside a smart pointer. When the smart pointer 
detects that there are 0 references, then the Release procedure described above is called.

The whole job system itself works; the jobs get executed, and in the right sequence. It 
is the managing of the acquired Future object that is troubling: When a job is enqueued 
using the queue from Orka.Jobs.Boss, the Enqueue entry is given a smart pointer. If the 
smart pointer is empty, then we need to acquire a Future object, otherwise the smart 
pointer is put in a record together with the Job_Ptr and enqueued.

So here is the problem: after I have enqueued the first job and blocked on the Future's 
Wait_Until_Done entry, then (if there is only 1 worker) there remains 1 reference to the 
Future object (the smart pointer that I gave to the Enqueue entry). At the end of the 
main procedure of the executable I see the Future object getting released. This is good. 
But if I use 2 or more workers, then often there remain > 1 references to the Future 
object and the Future object doesn't get released. Sometimes there remain < 1 references 
and the Future object gets released too early.

So with 1 worker there is no problem, but with 2 or more workers the smart pointers act 
weird. I am just using GCC's __sync_sub_and_fetch_4 and __sync_add_and_fetch_4. I have 
also tried to use the Atomic aspect for the reference counter, but that didn't help.

So the smart pointers are not working properly when there are jobs (paired with the same 
smart pointer) being executed by multiple workers. I'm not sure if this is a bug in my 
own code or whether GNAT is doing something funky in the Finalize procedure.

Can someone provide some insight?

See https://github.com/onox/jobs-test/blob/master/examples/orka_test-test_9_jobs.adb

- Change Number_Of_Workers in src/orka-jobs-boss.ads:47 to 1 (OK) or 2 (buggy)

- Compile with `make`

TL;DR: The reference counters of smart pointers are not updated properly when used by 
multiple tasks.