From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00,FORGED_GMAIL_RCVD,
	FREEMAIL_FROM autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,59c52143b2a1463b
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII
Path: 
 g2news1.google.com!postnews.google.com!q12g2000yqj.googlegroups.com!not-for-mail
From: Gene <gene.ressler@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: How many hardware threads?
Date: Mon, 12 Jul 2010 17:01:00 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: 
 <4627c02d-c44c-4841-bd77-e25aee4be173@q12g2000yqj.googlegroups.com>
References: <4c3a65d7$0$2405$4d3efbfe@news.sover.net>
 <xah7sg1n8ib8$.26txy5fbnmui.dlg@40tude.net>
NNTP-Posting-Host: 184.12.82.120
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1278979260 27741 127.0.0.1 (13 Jul 2010 00:01:00
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Tue, 13 Jul 2010 00:01:00 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: q12g2000yqj.googlegroups.com; posting-host=184.12.82.120;
	posting-account=-BkjswoAAACC3NU8b6V8c50JQ2JBOs04
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us)
	AppleWebKit/533.16 (KHTML, like Gecko) Version/5.0 Safari/533.16,gzip(gfe)
Xref: g2news1.google.com comp.lang.ada:12360
Date: 2010-07-12T17:01:00-07:00
List-Id: <comp.lang.ada>

On Jul 12, 3:40 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:
> On Sun, 11 Jul 2010 20:47:41 -0400, Peter C. Chapin wrote:
> > I see a couple of difficulties with writing effective parallel programs
> > for "ordinary" applications (that is, applications that are not
> > embarrassingly parallel). One difficulty is load balancing: how can one
> > decompose a problem to keep all processors reasonably busy? The other
> > difficulty is scalability: how can one design a single program that can
> > use 2, 4, 16, 128, or more processors effectively without knowing ahead
> > of time exactly how many processors there will be? I'm not an expert in
> > Ada tasking but it seems like these questions are as big a problem for
> > Ada as they are for any other language environment.
>
> Back in 90's, during the era of transputers, concurrent algorithms were
> decomposed knowing in advance the number of processors and the topology of
> the network of. (Unlike to multi-cores the transputers didn't share memory,
> they communicate over serial links connected physically) That time the
> consensus was that the problem is not solvable in general. So you designed
> up front both the algorithm and the topology.
>
> > I'm not looking for a solution to all tasking problems here. But there
> > is one feature that seems like a necessary prerequisite to such a
> > solution. The language (or its standard library) needs to provide a
> > portable way for the program to determine how many hardware threads are
> > available.
>
> Well, maybe, but I don't think it would bring much. Especially because
> normally cores support multi-tasking. It would be more important for the
> architectures with the cores that do not (GPU etc).
>
> BTW, "hardware thread" = core? processor? ALU + an independent memory
> channel? etc. It is quite difficult to define and the algorithm's
> performance may heavily depend on the subtleness. ARG would say, look, it
> does not make sense for all platforms, forget it.
>
> > I'm about to write a simple program that decomposes into parallel,
> > compute-bound tasks quite nicely. How many such tasks should I create?
>
> Back in time it was popular to make it adaptive. I.e. you monitor the
> performance and adjust the size of the working threads pool as you go. I
> remember some articles on this topic, but it was long long ago...
>
> --
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

Dmitry has it exactly right.  When a "hardware thread" can be anything
from a fraction of a core to a node on an Ethernet-connected cluster,
how many you have is not such an important question.

Guessing high and allowing the OS to sort things out has severe
limitations.  What to guess?  4? 16? 256?

When there is only a small number of independent performance control
variables, e.g. one in the case of a common thread pool, the
self-monitoring and adjustment scheme is the only one with legs.
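
To make the single-variable case concrete, here is a rough sketch in
Python of one way such a self-adjusting scheme could look. It is only an
illustration: `tune_pool_size` and the `measure(n)` callback are
hypothetical names, not anything from this thread, and plain hill
climbing is just one simple choice of adjustment rule. `measure(n)` is
assumed to run the workload briefly with n workers and return the
observed throughput.

```python
def tune_pool_size(measure, start=1, lo=1, hi=256, max_rounds=100):
    """Hill-climb on one control variable: the worker-pool size.

    measure(n) is a hypothetical callback that runs the workload for a
    short interval with n workers and returns throughput (higher is better).
    """
    size = max(lo, min(hi, start))
    best = measure(size)
    for _ in range(max_rounds):
        improved = False
        # Probe one step up, then one step down, and keep the first gain.
        for candidate in (size + 1, size - 1):
            if lo <= candidate <= hi:
                rate = measure(candidate)
                if rate > best:
                    size, best, improved = candidate, rate, True
                    break
        if not improved:
            break  # neither neighbour is better: local optimum
    return size
```

Real measurements are noisy, so a practical version would probably
average several samples per probe and re-run the search periodically as
the load changes.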

Other multi-threading models like work stealing present a more complex
tuning problem.  Evolutionary (genetic) algorithms ought to be a
powerful way for software to adapt to its own environment. Is there
anyone doing work on this?
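
For what it's worth, the simplest evolutionary scheme I can think of for
the multi-variable case is a (1+1) evolution strategy: mutate the
current parameter set, keep the mutant if it scores no worse. The Python
sketch below is only an illustration under stated assumptions. `evolve`
and the `fitness(params)` callback are hypothetical names, and a real
genetic algorithm would normally use a population and crossover rather
than a single parent.

```python
import random


def evolve(fitness, start, bounds, generations=200, seed=0):
    """(1+1) evolution strategy over integer tuning parameters.

    fitness(params) is a hypothetical callback returning a score to
    maximise, e.g. measured throughput for that parameter set.
    bounds is a list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)
    parent = tuple(start)
    parent_score = fitness(parent)
    for _ in range(generations):
        # Mutate: nudge each parameter by -1, 0 or +1, clamped to its bounds.
        child = tuple(
            max(low, min(high, value + rng.choice((-1, 0, 1))))
            for value, (low, high) in zip(parent, bounds)
        )
        score = fitness(child)
        if score >= parent_score:  # keep the child only if it is no worse
            parent, parent_score = child, score
    return parent, parent_score
```

The parameters might be, say, a worker count and a steal batch size, but
that mapping is my assumption and not something taken from an existing
system.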