From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,8e11100f675ea2df
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
X-Received: by 10.180.86.201 with SMTP id r9mr12391119wiz.4.1357166640911;
        Wed, 02 Jan 2013 14:44:00 -0800 (PST)
Path: 
 l12ni279469wiv.1!nntp.google.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Newsgroups: comp.lang.ada
Subject: Re: asynchronous task communication
Date: Thu, 03 Jan 2013 00:43:58 +0200
Organization: Tidorum Ltd
Message-ID: <akjrheF1fqhU1@mid.individual.net>
References: <1c2dnd5E6PMDR33NnZ2dnUVZ_sednZ2d@earthlink.com>
 <50e18094$0$6583$9b4e6d93@newsspool3.arcor-online.net>
 <7NednS4s2oukfXzNnZ2dnUVZ_oadnZ2d@earthlink.com>
 <aked87Fpfc1U1@mid.individual.net>
 <7cudnYloBfQDw3_NnZ2dnUVZ_rKdnZ2d@earthlink.com>
Mime-Version: 1.0
X-Trace: individual.net
 Q7WcPXTZFS+0KqGyQ+PgDg0o4MU2k5BXAVFQjcpG74LHZwlpaQLTtPSbGq/cMMPkqc
Cancel-Lock: sha1:n56HLIW7E9gKMRM5c7LQxz+FMxs=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
 rv:17.0) Gecko/17.0 Thunderbird/17.0
In-Reply-To: <7cudnYloBfQDw3_NnZ2dnUVZ_rKdnZ2d@earthlink.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Date: 2013-01-03T00:43:58+02:00
List-Id: <comp.lang.ada>

On 13-01-01 05:51 , Charles Hixson wrote:
> On 12/31/2012 01:09 PM, Niklas Holsti wrote:
>> On 12-12-31 20:52 , Charles Hixson wrote:
>>> Thank you for your answer.
>>>
>>> The problem with that solution is that I've been advised that I
>>> shouldn't have many more tasks than available cores,
>>
>> IMO that advice is useful only if your application logically needs fewer
>> tasks than you have cores, but you want to increase the number of tasks
>> in order to have more cores running in parallel
   ....
>> But if your application logically needs more tasks than there are cores,
>> because of the way tasks communicate and interact, there is absolutely
>> no reason to limit the number of tasks to the number of cores, or
>> anything related to the number of cores.
>>
> My application logically "needs" thousands or millions of tasks.

Hmm. I'm not sure that I agree with you, more below.

> But this is clearly impractical, so I'm trying to work around it.

A few thousands could be practical, but I agree that a million tasks is
not. Even if the scheduling could work, the stacks will need a huge
amount of RAM.

> The application is clearly going to be "communication bound".

I'm not sure about that, either...

> OTOH, even
> with millions of separate cores, I'd still need asynchronous communication.
> 
> FWIW, you can think of what I'm doing is a neural network simulation.
> There are reasons why that isn't quite accurate, but it's close.  And
> each message is likely (but not certain) to get a reply and may also
> cause further messages to be sent on.

That sounds more like discrete event simulation, simulating a network of
interacting automata with discrete states and state transitions. It
doesn't sound so much like an analog neural network.

> Occasionally new nodes will be
> added to the net.  So logically, each "cell" should be a separate task,

There is always a choice of two representations for an automaton: as an
algorithm (i.e. a task body), or as a data object. The former represents
the automaton's state as a point in the algorithm (i.e. program counter
+ call stack + local variables), the latter as a state identifier (an
enumeration) and perhaps some additional variables (if the automaton
model has this concept).

If you have several automata, and represent them as algorithms, and the
automata change states at different, scattered times, then it is almost
necessary to implement them as tasks -- independently progressing
threads of control.

In your case, I suspect (but of course I'm not sure) that the logical
states of an individual automaton (a network node) are not complex
enough to merit representing as an algorithm, so the representation as a
data object seems equally valid. Which is why I'm not sure that your
application "needs" millions of tasks, even logically (but this logic is
subjective).

> but even so asynchronous communication is needed to avoid deadlock.

Very likely, if the network nodes would be implemented as tasks. But
this depends on how your simulation handles time. Do you have a global
concept of time-step or cycle, in which all network nodes progress by
one elementary step or action, or perhaps stay in the same state if they
receive no messages? Or do all nodes run "in parallel" with arbitrary
relative speeds?

Another question: do you expect the whole network to converge to some
stable state (fixed point), as for example in the iterative solution of
a partial differential equation? Or do you expect the network to
continue working and changing, and the interesting result is this
dynamic activity rather than some stable final state?

In the former case (iteration to stability), I would expect (by analogy
with PDEs) that the precise relative execution speeds of the network
nodes or cells is less important, and could even be pseudo-random, as
long as all nodes make positive progress. In the latter case (iteration
to simulate dynamic behaviour) I assume you need some global notion of
time and cycles.

> Since I can't model cells as tasks, I'm planning on modeling them as
> protected variables.  Even this may be too limiting, because of the
> requirement for asynchronous communication.  Particularly since
> protected variables are passive, rather than active, so each one is
> going to need to wait for a task to activate it.

It sounds like each cell implements more or less the same automaton
(state transitions), assuming that all the protected objects (variables)
are instances of the same protected type.

If so, it seems to me that this system is a good candidate for the
"worker task pool" approach. A "job" would be the execution of one state
transition (message reception?) for one cell.

Would this kind of design make some sense:

- Define a record type to represent the state of one cell (including its
connections to other cells). This does not have to be a protected type.

- Create as many worker tasks as there are processor cores.

- Allocate a subset of the cells to each worker task.

- Each worker task executes the state transitions and actions of all the
cells allocated to this task. The task maintains a queue of messages to
these cells (a "work list").

- When a cell sends a message to another cell allocated to the same
task, the message is simply appended to the message queue of that task.
The queue does not need to be protected against concurrent access,
because this queue is accessed only by this task.

- When a cell sends a message to another cell allocated to another task,
the message is appended to a queue of "incoming" messages, which is a
protected object associated with that other task.

- Each worker tasks takes and processes messages from its "incoming"
queue at suitable times, depending on the presence or absence of a
global time cycle.

Since all communication is internal, the only reason for a worker task
to be idle is if it happens that none of its cells has received any
message, which seems unlikely to happen if there are millions of cells.
So this design would be computation-bound, not communication-bound.

If the incoming-message queues become a contention bottleneck, there
could be a separate incoming-message queue for each pair of (sender
task, receiver task), or perhaps a lock-free queue implementation would
help.

> But this appears to be
> quite likely to interfere with normal execution.  The classic answer is
> to have two copies of the network and to cycle through cone copying it
> and state changes into the second.  Then, at the end of a cycle, reverse
> it.  This requires the whole thing to be copied in each cycle, even the
> parts that haven't changed in this cycle.

You can put the two copies of the state in each record object that
represents a cell, together with a per-cell indication of which copy is
"current". If the cell does not change its state, the current-state
indicator is unchanged; nothing need be copied. If the cell changes its
state, some data are changed, but at least the old and new states are
close to each other, and perhaps in the same cache lines, or at least in
the same VM page, which should reduce the cost of the copy.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .