From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Languages don't matter. A mathematical refutation
Date: Wed, 8 Apr 2015 16:16:08 -0500
Date: 2015-04-08T16:16:08-05:00 [thread overview]
Message-ID: <mg45qp$tt6$1@loke.gir.dk> (raw)
In-Reply-To: 878ue3ff6y.fsf@jester.gateway.sonic.net
"Paul Rubin" <no.email@nospam.invalid> wrote in message
news:878ue3ff6y.fsf@jester.gateway.sonic.net...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>>> GC doesn't use that much of the runtime of typical programs
>> Given that there is no such thing as a "typical program" when it comes to
>> memory management, that's a meaningless statement.
>
> Take all the actively used programs out there and sort by decreasing
> percentage of their runtime used by GC. Throw away the top and bottom
> 10% as outliers and call the other 80% typical. That seems meaningful
> to me.
>
>>> in GC'd languages, and the cpu overhead can be made quite small
>> And this isn't remotely the major expense,
>
> So what is the major expense, and how often should anyone care about it?
Tracing, as I discussed below.
>>> There is no allocation overhead (just bump a pointer)
>> And this is bogus. Either items are collected in place, which means that
>> you
>> have to allocate from fragmented memory (a lot of more expensive than
>> "bumping a pointer" - that was the cause of the issues we had with our
>> compiler back in the day), or they're compacted somehow.
>
> Yes, the idea is to use a copying collector, which compacts, so
> allocation is just bumping a pointer. I don't claim that the
> copying/compaction phase has no overhead. But, it is not too bad in
> practice, for most applications in which GC is usable at all.
>
>> And compaction is very expensive,
>
> Maybe your info is out of date. Modern gc's use generational copying,
> so during most collections, only recently created data gets copied,
> which isn't very much data.
That's a fantasy for many programs. And for the ones that it isn't, subpools
work just as well.
> There are also larger collections when
> everything gets copied and that takes longer, but it's less frequent.
Of course. But those are the only ones that actually do anything. And it's
that overhead (plus the excessive tracing overhead) that matter.
> The aggregate overhead isn't too much. There's tons of heavily used
> code out there using GC these days, e.g. Gmail is written in Java unless
> I'm mistaken. They wouldn't have done that if GC was unaffordable.
They did it because it allowed them to use trained apes to program. :-)
Let me be clear here: Ada is used by programmers that want to get their
programs right *and* need native code performance (performance might mean
space or mean speed). That means that we're typically worrying about the
last 25%. GC sucks up that last 25%.
"Most programmers", OTOH, don't care about correctness (well, more
accurately, their management doesn't care, because bugs allow them to sell
subscriptions and other products [why is anti-virus needed, for instance? An
unwillingness to adopt simple rules to prevent the running of foreign code,
mostly caused by programming in languages like C that don't check bounds.]
And often they don't care about performance, either, concentrating on
time-to-market and other issues.
These people are not (IMHO, I know David B. would disagree) ever going to be
Ada users. While there would be some benefit in faster time-to-market if one
spends less time debugging, and Ada development is more reproducible for
that reason, the up-front costs of actually designing your code will be too
much a time sink for such users to buy into. Unfortunate, but reality.
That's what I call "trained ape" programming. It's so easy, anyone can do
it, so anyone does, and we get the mounds of crapware out there that barely
works and steals your identity at the same time. Bah-humbug. ;-)
>> simply because copying things around for no reason is expensive. (Ada
>> 2012 has to add access-in-place operations to the Ada Containers
>> specifically to eliminate that overhead.)
>
> That sounds like you're talking about copying in the application and not
> in the GC. The access frequency could be far higher in the app,
> increasing the copying costs.
Copying (of anything) is unnecessary overhead. If the programmer put it
there, that's their problem. OTOH, if the implementation puts it there, it's
a problem for all programmers, and that's a real problem. Remember again
that I'm trying to get the fastest/smallest possible code to implement the
programmer's algorithm. Any implementation overhead that can be eliminated
should be eliminated, because that's overhead that the programmer can do
nothing about.
>> Our compiler uses a lot of pointers; most of them act as handles and
>> aren't actually dereferenced in the majority of the code, but they do
>> get passed around a lot. That means a lot of copies of pointers, so
>> lots of data structures to trace.
>
> You trace from each pointer just once, either by setting a mark bit in
> the object in the case of a mark-sweep collector, or by overwriting the
> pointed-to object with a forwarding pointer in the case of a copying
> collector. So the number of objects matters much more than the number
> of pointers.
That makes no sense to me whatsoever. The whole point of tracing is to
determine what is reachable (since GC cleans up objects that are no longer
reachable). There's two ways to do that; one is to retrace everything that
exists (based on some algorithm to avoid retracing stuff that hasn't
changed), alternatively, one can somehow determine when pointers are
destroyed and update data structures based on that. In the first case (which
seems to be what you are talking about), you have to revisit every pointer
variable that might have been changed (which is pretty much all of them)
with some frequency in order to determine what they're pointing at. Pointer
constants are rare in Ada (primarily, just formal parameters); everything
else is a variable.
And of course, a "forwarding pointer" is just an extra level of indirection.
There is an old saying in the compiler world that any problem can be solved
with a level of indirection. The problem is, that such a level of
indirection costs quite a bit, as much as 10% for frequently accessed
objects. And of course in native code, once that indirection is in place, it
will stay there forever (one could use a thunk instead to make the
indirection conditional, but that would be even more expensive, because of
cache effects).
> Anyway, look at it like this. If your compiler was written 30 years ago
> and ran at tolerable speed then, it's 1000x faster now because of
> advances in hardware. Even if using GC would have slowed it down by 2x,
> it's still 500x faster than before.
Our compiler was barely usable 30 years ago; the only thing that made it
useful was that everybody was in the same boat. (On the order of 5 minutes
per compilation unit.) There were many things that we could not do because
of lack of CPU power. The current version is faster by a lot, but it still
can take minutes per compilation unit (although most units are much faster).
There's still a lot of stuff that could be done that most users would not
tolerate the wait for. I'm not remotely interested in wasting a decent
amount of time on a mechanism that is not needed for most real-world Ada
code (not to mention a compiler). People who don't know better (or don't
care) may come to a different conclusion. They're probably not trying to
integrate proof into the compiler. :-)
Randy.
next prev parent reply other threads:[~2015-04-08 21:16 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-25 11:46 Languages don't matter. A mathematical refutation Jean François Martinez
2015-03-25 15:19 ` Paul Rubin
2015-04-03 0:50 ` robin.vowels
2015-04-03 2:18 ` Jeffrey Carter
2015-04-03 13:37 ` Bob Duff
2015-04-03 14:13 ` Dmitry A. Kazakov
2015-04-03 17:34 ` Paul Rubin
2015-04-03 19:34 ` Dmitry A. Kazakov
2015-04-03 19:58 ` Paul Rubin
2015-04-04 6:59 ` Dmitry A. Kazakov
2015-04-06 21:12 ` Paul Rubin
2015-04-07 5:57 ` Dmitry A. Kazakov
2015-04-08 4:12 ` Paul Rubin
2015-04-08 6:45 ` Dmitry A. Kazakov
2015-04-04 0:41 ` Dennis Lee Bieber
2015-04-04 3:05 ` Paul Rubin
2015-04-04 14:46 ` Dennis Lee Bieber
2015-04-04 15:41 ` brbarkstrom
2015-04-04 19:20 ` Paul Rubin
2015-04-04 20:00 ` Dmitry A. Kazakov
2015-04-04 20:44 ` Paul Rubin
2015-04-05 8:00 ` Dmitry A. Kazakov
2015-04-05 9:55 ` Brian Drummond
2015-04-06 21:27 ` Randy Brukardt
2015-04-06 17:07 ` Paul Rubin
2015-04-06 17:41 ` Dmitry A. Kazakov
2015-04-06 18:35 ` Paul Rubin
2015-04-06 21:46 ` Randy Brukardt
2015-04-06 22:12 ` Paul Rubin
2015-04-06 23:40 ` Jeffrey Carter
2015-04-07 19:07 ` Randy Brukardt
2015-04-08 3:53 ` Paul Rubin
2015-04-08 21:16 ` Randy Brukardt [this message]
2015-04-09 1:36 ` Paul Rubin
2015-04-09 23:26 ` Randy Brukardt
2015-04-09 2:36 ` David Botton
2015-04-09 8:55 ` Georg Bauhaus
2015-04-09 9:38 ` Dmitry A. Kazakov
2015-04-09 13:14 ` G.B.
2015-04-09 14:35 ` Dmitry A. Kazakov
2015-04-09 15:43 ` G.B.
2015-04-09 17:26 ` Dmitry A. Kazakov
2015-04-09 18:40 ` Niklas Holsti
2015-04-09 19:02 ` Dmitry A. Kazakov
2015-04-09 20:38 ` Paul Rubin
2015-04-09 23:35 ` Randy Brukardt
2015-04-10 14:16 ` G.B.
2015-04-10 20:58 ` Randy Brukardt
2015-04-07 0:36 ` Dennis Lee Bieber
2015-04-05 13:57 ` Dennis Lee Bieber
2015-04-03 16:17 ` J-P. Rosen
2015-04-03 17:33 ` Bob Duff
2015-04-26 11:38 ` David Thompson
2015-04-03 19:00 ` Georg Bauhaus
2015-04-03 19:12 ` Jeffrey Carter
2015-04-03 22:37 ` Bob Duff
2015-04-03 23:38 ` Jeffrey Carter
2015-04-04 0:15 ` Bob Duff
2015-04-04 7:06 ` Dmitry A. Kazakov
2015-04-04 2:59 ` Paul Rubin
2015-04-04 0:56 ` Dennis Lee Bieber
2015-03-25 17:12 ` Jean François Martinez
2015-03-26 13:43 ` Maciej Sobczak
2015-03-26 15:01 ` Jean François Martinez
2015-03-26 17:45 ` Jeffrey Carter
2015-03-26 15:21 ` Dmitry A. Kazakov
2015-03-27 11:25 ` Jean François Martinez
2015-03-27 17:36 ` Dmitry A. Kazakov
2015-03-30 10:31 ` Jean François Martinez
2015-03-30 11:52 ` Dmitry A. Kazakov
2015-03-30 12:32 ` G.B.
2015-03-30 13:48 ` Dmitry A. Kazakov
2015-03-30 15:47 ` G.B.
2015-03-30 16:05 ` Dmitry A. Kazakov
2015-04-02 12:59 ` brbarkstrom
2015-04-02 13:35 ` Dmitry A. Kazakov
2015-04-02 14:48 ` jm.tarrasa
2015-04-02 15:55 ` brbarkstrom
2015-04-02 16:21 ` Jean François Martinez
2015-04-02 16:48 ` Dmitry A. Kazakov
2015-04-02 16:41 ` Dmitry A. Kazakov
2015-04-04 10:02 ` jm.tarrasa
2015-04-04 11:16 ` Dmitry A. Kazakov
2015-04-02 15:58 ` Jean François Martinez
2015-04-02 16:39 ` Dmitry A. Kazakov
2015-04-03 9:46 ` Jean François Martinez
2015-04-03 14:00 ` Dmitry A. Kazakov
2015-04-03 17:12 ` Jean François Martinez
2015-04-02 17:17 ` G.B.
2015-04-02 19:09 ` Dmitry A. Kazakov
2015-04-02 18:24 ` Niklas Holsti
2015-04-02 18:43 ` Jeffrey Carter
2015-03-30 11:36 ` Jean François Martinez
2015-03-30 10:48 ` jm.tarrasa
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox