Newsgroups: comp.lang.ada
Subject: Re: RFC: Prototype for a user threading library in Ada
From: Hadrien Grasland
Date: Fri, 24 Jun 2016 14:06:58 -0700 (PDT)
References: <58b78af5-28d8-4029-8804-598b2b63013c@googlegroups.com>

On Thursday, June 23, 2016 at 03:42:50 UTC+2, Randy Brukardt wrote:
> "Hadrien Grasland" wrote:
> ...
> >I agree that implementation support for coroutines would be extremely
> >valuable, if it were available at the language level (as in Python, C#,
> >Go...) or even in specific implementations (as in Visual C++).
>
> Coincidentally, we just spent quite a substantial portion of the most
> recent ARG meeting discussing this. (See AI12-0197-1; there are proposed
> alternatives as well, but those won't get posted for a few weeks -
> probably along with the minutes.)

Count me pleasantly surprised!

> The problem with such proposals is that they are quite expensive to
> implement, and they don't seem to buy that much above the existing Ada
> tasking model. [Especially as the proposal explicitly does not support
> any concurrency; one has to use POs/atomics in the normal way if
> concurrency is needed.] (After all, if you really want coroutines in
> Ada, just use Janus/Ada and regular tasks, as it implements all tasks
> that way. :-)
>
> The problem with the Janus/Ada implementation is the inability to use
> threads to implement that; that's fixable, but I'd need a customer to
> help support the work. (I'd use a scheme internal to the task supervisor
> similar to your "events" rather than trying to assign tasks to threads.)

I would be happy to beta-test that feature if you also integrated Ada 2012
support along the way! :)

That aside, let me explain what I think coroutines are good for. When
people turn to threads, they usually look for some of the following
things:

1. Exploiting the concurrent processing abilities of modern hardware
   (multicore, hyper-threading).
2. Providing the illusion of simultaneously running tasks to their users,
   in a fashion that extends beyond actual hardware concurrency.
3. Hiding various kinds of latencies (IO, decision-making) by doing other
   processing in the meantime.
4. Handling IO-heavy workloads, the typical example being a web server
   going through millions of requests per second.

Unfortunately, no single threading implementation can be good at all of
these. And outside of the embedded world, the average modern OS is
optimized to provide the best possible illusion of infinite multitasking:
threads are scheduled round-robin and managed at the kernel level, so
that the kernel can quickly switch between them on clock interrupts
instead of delegating that task to user processes.

Sadly, this setup is terrible for concurrent application performance, as
can easily be verified by running a multithreaded computation with
overcommitted CPU resources. If you allocate even just twice as many OS
threads as you have hardware threads, you observe a huge performance
drop. Why? Because instead of leaving computations alone, your OS keeps
switching between threads during execution, each time doing a round trip
through the kernel and thrashing the CPU cache. There is no such thing as
a free concurrent lunch.

For IO-heavy applications, the situation is even worse: you pay the
aforementioned overhead not only at the scheduling rate of your
round-robin algorithm (typically ~1 kHz), but every single time your
application blocks for IO. This is why no web server that allocates one
OS thread per connection can scale to more than a couple of thousand
connections per second.

So if you do not need the illusion of perfect multitasking, it is better
to give up on the user convenience of round robin and use some
batch-derived task scheduling algorithm instead. Since asking your
customers to modify their OS kernel configuration is not usually
acceptable, that means allocating only as many OS threads as there are
hardware threads, and managing the remainder of your concurrency in user
mode (a rough sketch of such a worker pool is at the end of this
message). Ergo, we need user threads, which are easiest to implement on
top of language-level coroutine support.

> ...
> >However, I think that as it stands, we are just about as likely to see
> >it happening in Ada as we are to get lambdas ...
>
> We also talked about limited lambdas in Pisa: see AI12-0190-1. So you're
> obviously right. ;-)

I stand once again pleasantly corrected, then :) Though I have to admit
that in an Ada context, I miss first-class functions more than I miss
lambdas: you can relatively easily replace a lambda with an expression
function declared at the appropriate scope, but you need an awful lot of
function-specific boilerplate to produce a standalone function object
that can be handed to an outer scope after capturing some local state
(see the second sketch at the end of this message).

> The problem I have with the library approach (and the coroutines and the
> like intended to support it) is that it doesn't seem to solve any
> problems. I understand why such approaches get used in languages that
> don't have a real tasking model, but Ada hasn't had that problem since
> day 1. And the reasons that writing tasking code in Ada is too hard
> aren't getting addressed by these schemes (that is, race conditions,
> deadlocks [especially data deadlocks], and the like).
>
> I'd prefer to concentrate on language features that make it as easy to
> write (restricted and correct) parallel code as it is to write
> sequential code. I don't see how libraries or coroutines or lambdas are
> getting us any closer to that.
> I'd like to understand better the motivations for these features, so if
> you (or anyone else) wants to try to explain them to me, feel free. (But
> keep in mind that I tend to be hard to convince of anything these days,
> so don't bother if you're going to give up easily. ;-)

See above. Being able to easily write highly concurrent code is of
limited use if said code ends up running with terrible performance
because modern OSs are not at all optimized for this kind of workload. We
shouldn't need to worry about how our users' OS kernels are set up, and
user threading and coroutines are a solution to this problem.

Not that I am against also providing abstractions that make concurrent
code easier to write, mind you. I actually have plenty of ideas in that
direction. It is just that I think this is something that can largely be
done at the library level, without requiring too much help from the
underlying programming language. Below are the two sketches I referred to
above; both are rough and untested, just meant to illustrate the ideas.
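First, the worker pool. This is a minimal sketch of what I mean by "as
many OS threads as hardware threads, the rest in user space": a fixed set
of Ada tasks, sized with System.Multiprocessors.Number_Of_CPUs, pulling
parameterless jobs from a protected queue. All names (Work_Pool, Submit,
Queue...) are made up for this post, and there is no shutdown logic.

with System.Multiprocessors;

package Work_Pool is

   type Job is access procedure;
   --  A parameterless unit of work. A real library would also need a way
   --  to capture per-job state, which is where the closure issue comes in.

   procedure Submit (Work : Job);
   --  Queue a job for execution on one of the workers.

end Work_Pool;

package body Work_Pool is

   Capacity : constant := 1024;   --  bounded job queue, for simplicity

   subtype Slot is Natural range 0 .. Capacity - 1;
   type Ring is array (Slot) of Job;

   protected Queue is
      entry Put (Work : Job);       --  blocks while the queue is full
      entry Get (Work : out Job);   --  blocks while the queue is empty
   private
      Buffer : Ring;
      Head   : Slot    := 0;
      Tail   : Slot    := 0;
      Count  : Natural := 0;
   end Queue;

   protected body Queue is

      entry Put (Work : Job) when Count < Capacity is
      begin
         Buffer (Tail) := Work;
         Tail  := (Tail + 1) mod Capacity;
         Count := Count + 1;
      end Put;

      entry Get (Work : out Job) when Count > 0 is
      begin
         Work  := Buffer (Head);
         Head  := (Head + 1) mod Capacity;
         Count := Count - 1;
      end Get;

   end Queue;

   task type Worker;

   task body Worker is
      Work : Job;
   begin
      loop
         Queue.Get (Work);   --  sleep until a job is available
         Work.all;           --  then run it to completion
      end loop;
   end Worker;

   --  Exactly one OS thread per hardware thread; everything beyond that
   --  waits in user space instead of being handed to the kernel.
   Workers : array (1 .. Positive (System.Multiprocessors.Number_Of_CPUs))
     of Worker;

   procedure Submit (Work : Job) is
   begin
      Queue.Put (Work);
   end Submit;

end Work_Pool;

Client code would only ever call Work_Pool.Submit and never touch OS
threads directly. The obvious limitation is that a parameterless
access-to-procedure cannot carry any captured state, which is exactly
where the first-class function issue shows up; hence the second sketch.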
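Second, the function-object boilerplate. Again an untested sketch with
made-up names (Closures_Demo, Adder, Call, Make_Adder), contrasting an
expression function, which is fine when everything it needs is in scope,
with the hand-written record type you need as soon as some local state
has to be captured and handed around.

package Closures_Demo is

   --  When everything you need is visible at the point of declaration,
   --  an expression function is an adequate lambda substitute:
   function Twice (X : Integer) return Integer is (2 * X);

   --  But to build "add N to things" for a locally computed N, and hand
   --  it to an outer scope, you end up writing a small object by hand:

   type Adder is tagged record
      Increment : Integer;   --  the captured state, spelled out explicitly
   end record;

   function Call (Self : Adder; X : Integer) return Integer is
     (X + Self.Increment);

   function Make_Adder (N : Integer) return Adder is
     (Increment => N);
   --  In a language with first-class functions, all of the above would
   --  collapse into a single expression at the point of use.

end Closures_Demo;

With this, a caller writes something like "Add_Three : constant Adder :=
Make_Adder (3);" and later "Add_Three.Call (4)", and every captured
variable becomes a record component while every "lambda" becomes a named
type plus a Call function. That is the boilerplate I would like to see go
away.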