comp.lang.ada
* Languages don't  matter.  A mathematical refutation
@ 2015-03-25 11:46 Jean François Martinez
  2015-03-25 15:19 ` Paul Rubin
                   ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-03-25 11:46 UTC (permalink / raw)


An often-heard assertion is "Languages don't matter".  Let's see if we can
refute it.

We cannot use an example of a single team developing some program
in language A (e.g. C/C++) vs another team developing the same program
in language B (say Ada), because differences could be due to team quality.

What we need is a program repeatedly developed both in C and Ada, so we
can apply statistical analysis to it.  We are going to use the well-known
example of Professor McCormick, who taught Real Time Programming
at the University of Northern Iowa.  In his class, groups of students had to
program a computer-controlled model railroad.

So for six years students had to do it in C, and none ever succeeded in
providing a working program, even though in later years their professor
provided them with 60% of the code.

Then the professor switched to Ada, and despite not getting the hand-holding
the C students had got, 50% of the groups provided a working solution.
In later years, after the professor provided 20% of the code in a component
relating to the windowing interface, the success rate was 75%.


Now let's analyse this mathematically. 

The null hypothesis is

   Sc = Sa

with Sc = success rate in C and Sa = success rate in Ada.


Professor McCormick's paper leaves some factors in the dark, so whenever
there is a doubt we will tilt them in favor of the null hypothesis.

1) We will assume six groups per year, even though the wording in Professor
McCormick's text suggests the number is significantly larger.  Since
the larger the sample the more discriminating the test, this
favors the null hypothesis, i.e. it makes it more likely that the difference
is found too small to reject the null hypothesis.

2) We will keep the number of years to six for both languages.  Actually
in Professor McCormick's paper the number of Ada years is 7, and he
continued using Ada for fourteen more years.  However we will discard these
additional Ada years.  Again this favors the null hypothesis.

3) We will ignore both the 75% success rate of Ada students in later
years and the fact that they got significantly less help from the teacher
than the C students.

I apply the testing methods for the case where the "true" standard deviations
are unknown.

So, with n = 36 groups per language (6 groups x 6 years):

                                  C sample    Ada sample

Xbar (empirical mean)                0           0.5
Variance                             0           (36 * 0.5**2) / (36 - 1) = 0.257
Standard deviation (sigma)           0           sqrt(0.257) = 0.507


The standard deviation for the difference between two empirical means is

      sqrt(Variance1/Size1 + Variance2/Size2).

So in our case it is sqrt(0/36 + 0.257/36) = 0.084.

Our samples are large enough that we don't have to use Student's t test but
can just use the normal distribution.  Abs((0.5 - 0) / 0.084) = 5.95.

The corresponding probability is < 10**-5.  That means the hypothesis that
the success rate in C equals the success rate in Ada is, as expected,
rejected.  Smashed would be a better description.
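For anyone who wants to check the arithmetic, here is a minimal sketch in
Python (scipy's normal CDF stands in for printed tables; the group counts of
36 per language are my assumption, as stated above):

    # Two-sample z test on the success proportions computed above.
    from math import sqrt
    from scipy.stats import norm

    n_c, n_a = 36, 36          # assumed: 6 groups x 6 years per language
    p_c, p_a = 0.0, 0.5        # observed success rates

    # Sample variances with the n-1 (Bessel) correction used above.
    var_c = n_c * p_c * (1 - p_c) / (n_c - 1)   # 0
    var_a = n_a * p_a * (1 - p_a) / (n_a - 1)   # ~0.257

    # Standard error of the difference between the two means.
    se = sqrt(var_c / n_c + var_a / n_a)        # ~0.084

    z = abs(p_a - p_c) / se                     # ~5.9
    p_value = 2 * (1 - norm.cdf(z))             # ~3e-9, far below 10**-5
    print(z, p_value)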

At this point the only thing we can say is: the difference between the
results obtained by the Ada teams and those of the C teams is far too large
to be "luck".  We have to look at the possible causes.  What changed?
It was the same problem, the same teacher using the same methods.  It was the
same university, and it doesn't seem to have suddenly gained far higher
status and started attracting better, brighter students.  If we had the SAT
scores of the C-era and of the Ada-era students we would be able to confirm
it, but there is no element pointing at the Ada-era students being brighter
and better.  The only remaining factor is the programming language used.  So
we have an example in which language mattered.

Since we have a counterexample, the assertion "Languages don't matter" is
proven false.

QED.  :-) 


Late thoughts.

I had a colleague who kept using telnet despite my warnings about the
security problems, and despite the fact that for the user (that was him) the
only difference is that with ssh you only need to type 3 letters instead
of 6.  He stubbornly kept saying "I have always used telnet and I am not
going to change".  And he didn't.

I don't expect any C user will be convinced by my demonstration.  What I
expect is him moving the goalposts and saying things like "Languages don't
matter on average.  Only programmers.  Hire better programmers.", and
ignoring you when you point out that saying "on average" is admitting there
are problems where language matters, thus conceding defeat.

A Greek philosopher said,
"Against human stupidity the gods themselves can do nothing".  How true.

This demonstration was just for fun and for dusting off oooooooold skills.

Jean François Martinez


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-25 11:46 Languages don't matter. A mathematical refutation Jean François Martinez
@ 2015-03-25 15:19 ` Paul Rubin
  2015-04-03  0:50   ` robin.vowels
  2015-03-25 17:12 ` Jean François Martinez
  2015-03-26 13:43 ` Maciej Sobczak
  2 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-03-25 15:19 UTC (permalink / raw)


Jean François Martinez <darkquark99@gmail.com> writes:
> So for six years students had to do it in C...
> Then professor switched to Ada ... 
> At this point the only thing we can say is "the difference between the 
> results obtained by the Ada teams those of the teams is far too high for
> "luck".   We have to look at the possible causes.   What changed? ...

One possibility is C has a longer learning curve but catches up in
productivity once the programmer is used to it.  I don't particularly
believe that theory, but the experiment would be more convincing if it
involved teams of experienced programmers rather than students.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-25 11:46 Languages don't matter. A mathematical refutation Jean François Martinez
  2015-03-25 15:19 ` Paul Rubin
@ 2015-03-25 17:12 ` Jean François Martinez
  2015-03-26 13:43 ` Maciej Sobczak
  2 siblings, 0 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-03-25 17:12 UTC (permalink / raw)


Actually I thought about it.  That experiment only tells us what programmers
at student level can do with both languages.

But if a programmer needs years of experience and practice before being able
to do what a student, perhaps even a first-year student, can do, then
language made a difference.  Especially if you are the employer who is paying
that programmer until he reaches that level of expertise.

I agree it would be better if we could have a significant sample of
programmers programming the same thing in Ada and C at their first paying
job.  It would be a better measure of their early real-world, i.e. paying,
performance.  Unfortunately this is not going to happen.

Another similar objection would be: _good_ C programmers could have
succeeded.  But if you need a _good_ programmer, one who belongs to the top
1 percent, to do what 75% of Ada programmers can do, then language made a
difference.

A more serious objection would be that the Ada programmers in Professor
McCormick's experiment were working in the 90s with 90s-era development
tools, while the C students were working in the 80s.  Of course the drastic
improvement from the first Ada year belies this.  A solution would be to
compare the last C year against the first Ada year, but then samples of size
six (my hypothesis) would be too small, so it could be necessary to ask
Professor McCormick about the real number of groups.  From the wording he
used, it seemed to be significantly larger.
Another reason to believe the evolution of development environments didn't
play a role is that in both the Ada and C subgroups the success rate didn't
evolve with time.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-25 11:46 Languages don't matter. A mathematical refutation Jean François Martinez
  2015-03-25 15:19 ` Paul Rubin
  2015-03-25 17:12 ` Jean François Martinez
@ 2015-03-26 13:43 ` Maciej Sobczak
  2015-03-26 15:01   ` Jean François Martinez
  2015-03-26 15:21   ` Dmitry A. Kazakov
  2 siblings, 2 replies; 94+ messages in thread
From: Maciej Sobczak @ 2015-03-26 13:43 UTC (permalink / raw)



> We have to look at the possible causes.   What changed?

At some point in time programming was introduced as a subject in the schools
that precede university; you know, just as today some people are trying to
push programming into nursery schools.  The level (and therefore age) at
which programming is first introduced gets lower and lower with time, which
means that at the university level, as years go by, you can expect students
to arrive with more and more initial programming experience.  Years ago a
typical student entered university with zero programming experience; now
this initial experience is significantly higher, allowing students to be
more successful with exercises if the exercises do not change.  And since
such changes in the school system are introduced in a step-wise manner, you
can also expect that at some particular time in the past the batch of new
students had step-wise more experience at entry.

Unless you prove that this had no influence on the results in question, I refute the whole of your mathematical proof.

> This demonstration was just for fun

Sure. Preaching to the choir is always fun. ;-)

But more seriously, it would be interesting to flip languages every year and see whether the advantages of Ada hold in terms of better results. Or run two classes in parallel (with random assignments of students to each class) and compare results over the years.

-- 
Maciej Sobczak * http://www.msobczak.com * http://www.inspirel.com


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-26 13:43 ` Maciej Sobczak
@ 2015-03-26 15:01   ` Jean François Martinez
  2015-03-26 17:45     ` Jeffrey Carter
  2015-03-26 15:21   ` Dmitry A. Kazakov
  1 sibling, 1 reply; 94+ messages in thread
From: Jean François Martinez @ 2015-03-26 15:01 UTC (permalink / raw)


On Thursday, March 26, 2015 at 2:43:02 PM UTC+1, Maciej Sobczak wrote:

> At some point in time programming was introduced as a subject in the schools that precede university. You know, just as today some people are trying to push programming to nursery schools. The level (and therefore age) when programming is first introduced becomes lower and lower with time and this means that at the university level, as years go by, you can expect students to have higher and higher initial programming experience. Years ago a typical student entered university with zero programming experience and now this initial experience is significantly higher, thus allowing students to be more successful with exercises if the exercises do not change. And since such changes in the scholar system are introduced in a step-wise manner, you can also expect that at some particular time in the past the batch of new students had a step-wise more experience at the entry.
> 

Good objection.  It would be valid between, say, people having entered
university in 1985 and people entering in 2015.  But from one year to the
next, the success rate went from 0 (zero) to 50%.  Do you _really_ think one
year of programming in Visual Basic in high school, a year during which they
also had to attend a number of disciplines unrelated to programming, made
such a difference?

> Unless you prove that this had no influence on the results in question, I refute the whole of your mathematical proof.
> 

See above.

> > This demonstration was just for fun
> 
> Sure. Preaching to the choir is always fun. ;-)
> 

No, it isn't.  More than once I have battled with a particularly stubborn
macho programmer.  I have already tried to use (another kind of) math on
him, to no avail.  I wrote that paper as propaganda material; then I thought
about him and knew he would just shrug it off.  "There is no hollow-charge
shell able to prevail against brains made of cast iron".  Solzhenitsyn.  The
smiley and the fun part were there to hide despair.



> But more seriously, it would be interesting to flip languages every year and see whether the advantages of Ada hold in terms of better results. Or run two classes in parallel (with random assignments of students to each class) and compare results over the years.
> 

I know.  But it is not possible.  We have to take what we have.  Also, one
thing I haven't discussed is the "parapsychology effect".  One proponent of
parapsychology carefully collected examples that were too good to be just
random noise.  He was honest and very scientific, so people took him
seriously.  Problem is: the people who were feeding him data sent him only
the experiments in which the subject had got high scores.  In fact it was
the problem of getting heads twenty consecutive times.  If you play 2**20
games of twenty tosses each, you have over a 60% chance of getting it.  And
if you play 10 million games, the chance of _not_ having 20 consecutive
heads is less than 1/10000.

But we have not had thousands and thousands of people running groups of
students using C and Ada.  So when someone reports a difference whose
probability is below 1 in 10**5, it cannot be dismissed as "parapsychology
effect".
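As a sanity check on those run probabilities, a small sketch in Python
(treating each "game" as twenty independent fair tosses):

    # Chance of seeing at least one all-heads game of 20 tosses.
    p = 2.0 ** -20                       # P(a single game is all heads)
    for games in (2 ** 20, 10 ** 7):
        p_none = (1 - p) ** games        # never seeing an all-heads game
        print(games, 1 - p_none, p_none)
    # 2**20 games : ~0.632 chance of at least one (over 60%)
    # 10**7 games : ~7e-5 chance of never seeing one (< 1/10000)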


One point however is that I need to do some fact collecting.  For example,
whether the university curriculum meant that Professor McCormick's students
had previous experience in Ada but not in C.

BTW, regarding pre-university experience: in the late eighties there were
cheap C compilers, like Borland's, that a motivated high-school teenager
would have been able to afford.  Even if GNAT had been available, it is far
more dubious that he would have used it to program in Ada.

> -- 
> Maciej Sobczak * http://www.msobczak.com * http://www.inspirel.com



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-26 13:43 ` Maciej Sobczak
  2015-03-26 15:01   ` Jean François Martinez
@ 2015-03-26 15:21   ` Dmitry A. Kazakov
  2015-03-27 11:25     ` Jean François Martinez
  1 sibling, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-03-26 15:21 UTC (permalink / raw)


On Thu, 26 Mar 2015 06:43:01 -0700 (PDT), Maciej Sobczak wrote:

> Unless you prove that this had no influence on the results in question, I
> refute the whole of your mathematical proof.

Yep, when *mathematical* statistics is used, the burden of showing it
applicable lies on the author's shoulders.  In particular, the software
metric used for some SW design must be shown to be a random variable, the
elementary outcomes presented, their independence explained, etc.

> But more seriously, it would be interesting to flip languages every year
> and see whether the advantages of Ada hold in terms of better results. Or
> run two classes in parallel (with random assignments of students to each
> class) and compare results over the years.

I doubt that the process of randomly selecting a development team from a
pool of "indistinguishable" teams would be a good basis for a statistical
model of software development.

The bottom line: statistics were applied incorrectly and the results
obtained carry no predictive power.

(Nonetheless, we know that Ada is a better language, anyway...)

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-26 15:01   ` Jean François Martinez
@ 2015-03-26 17:45     ` Jeffrey Carter
  0 siblings, 0 replies; 94+ messages in thread
From: Jeffrey Carter @ 2015-03-26 17:45 UTC (permalink / raw)


On 03/26/2015 08:01 AM, Jean François Martinez wrote:
> On Thursday, March 26, 2015 at 2:43:02 PM UTC+1, Maciej Sobczak wrote:
> 
>> At some point in time programming was introduced as a subject in the
>> schools that precede university. You know, just as today some people are
>> trying to push programming to nursery schools. The level (and therefore
>> age) when programming is first introduced becomes lower and lower with time
>> and this means that at the university level, as years go by, you can expect
>> students to have higher and higher initial programming experience. Years
>> ago a typical student entered university with zero programming experience
>> and now this initial experience is significantly higher, thus allowing
>> students to be more successful with exercises if the exercises do not
>> change. And since such changes in the scholar system are introduced in a
>> step-wise manner, you can also expect that at some particular time in the
>> past the batch of new students had a step-wise more experience at the
>> entry.
>> 
> 
> Good objection.  It would be valid between, say, people having entered
> university in 1985 and people entering in 2015. But from one year to another
> success rate went from 0 (zero) to 50%.  Do you _really_ think in one year of
> programming in Visual Basic in high school, that's s one year where they also
> had to attend to a number of disciplines unrelated to programming, made such
> a difference?

As I understand it, the course did not replace C with Ada; it allowed Ada as an
alternative to C. Some teams continued to use C while others used Ada. The last
I heard, no team ever completed the project in C, even after teams succeeded
with Ada. This invalidates Sobczak's point.

-- 
Jeff Carter
"Since I strongly believe that overpopulation is by
far the greatest problem in the world, this [Soylent
Green] would be my only message movie."
Charlton Heston
123


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-26 15:21   ` Dmitry A. Kazakov
@ 2015-03-27 11:25     ` Jean François Martinez
  2015-03-27 17:36       ` Dmitry A. Kazakov
  2015-03-30 10:48       ` jm.tarrasa
  0 siblings, 2 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-03-27 11:25 UTC (permalink / raw)


On Thursday, March 26, 2015 at 4:21:45 PM UTC+1, Dmitry A. Kazakov wrote:
> On Thu, 26 Mar 2015 06:43:01 -0700 (PDT), Maciej Sobczak wrote:
> 
> > Unless you prove that this had no influence on the results in question, I
> > refute the whole of your mathematical proof.
> 
> Yep, when *mathematical* statistics is used, then the burden of showing it
> applicable lies on the author's shoulders. In particular, the software
> metric used for some SW design must be shown to be a random variable. The
> elementary outcomes presented. Their independence explained etc.
> 
> > But more seriously, it would be interesting to flip languages every year
> > and see whether the advantages of Ada hold in terms of better results. Or
> > run two classes in parallel (with random assignments of students to each
> > class) and compare results over the years.
> 
> I doubt the process of random selection of a developing team from a pool of
> "indistinguishable" teams were a good basis for a statistical model of
> software development.
> 



> The bottom line: statistic were applied incorrectly and the results
> obtained carry no prediction power.
> 

Really?  Textbooks on statistics are chock-full of similar examples and
exercises, like comparing the proportion of defective products at
manufacturing plants A and B, then checking the null hypothesis before
deciding that A's products are better.  In our case we have a random
variable, and that is group quality.  This depends on the quality of the
students but also on group dynamics.

And it is common in this field to assume that your population is a sample
of an infinitely-sized conceptual population.  Anyway, I submitted the
mathematical aspects to a professional statistician and he saw no flaw in my
methodology.  In fact he told me that similar methodologies are used for
ensuring fairness in the competitive exams for French engineering schools
(i.e. avoiding a candidate being rejected because he had the misfortune of
getting a tough examiner).

Then we leave the mathematical aspect of statistics and enter the world of
applied statistics, as in the studies published by your friendly bureau of
statistics, or whatever it is named in Germany or Russia.  There, after
having established that the difference is significant, i.e. after rejecting
the null hypothesis, the statistician has to apply knowledge of the subject
investigated (economics, the school system, biology) and consider whether
other factors could have had an influence.  That is why, for instance, I was
careful to note that the students were from the same university: if the C
students had been from the University of Northern Iowa and the Ada students
from MIT, Stanford or similar universities, whose students have far higher
SAT scores, then that factor could have been the cause.  At this point a
statistician would do further studies in order to isolate its influence.  I
admit there are points I did not consider (Ada students being of newer
generations that had pre-university experience) and others I did not discuss
because investigating them would mean bothering Professor McCormick.


> (Nonetheless, we know that Ada is a better language, anyway...)
> 


It is not that we know.  It is.  :-)  But there are people who would be
unmoved even if we could present them the kind of study I dream of, with
several universities, alternating C and Ada in order to eliminate the
generation effect, first- and last-year students in order to eliminate the
slow-learning-curve effect, and so on.

Regards

Jean-François Martinez


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-27 11:25     ` Jean François Martinez
@ 2015-03-27 17:36       ` Dmitry A. Kazakov
  2015-03-30 10:31         ` Jean François Martinez
  2015-03-30 11:36         ` Jean François Martinez
  2015-03-30 10:48       ` jm.tarrasa
  1 sibling, 2 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-03-27 17:36 UTC (permalink / raw)


On Fri, 27 Mar 2015 04:25:21 -0700 (PDT), Jean François Martinez wrote:

> On Thursday, March 26, 2015 at 4:21:45 PM UTC+1, Dmitry A. Kazakov wrote:
>> On Thu, 26 Mar 2015 06:43:01 -0700 (PDT), Maciej Sobczak wrote:
>> 
>>> Unless you prove that this had no influence on the results in question, I
>>> refute the whole of your mathematical proof.
>> 
>> Yep, when *mathematical* statistics is used, then the burden of showing it
>> applicable lies on the author's shoulders. In particular, the software
>> metric used for some SW design must be shown to be a random variable. The
>> elementary outcomes presented. Their independence explained etc.
>> 
>>> But more seriously, it would be interesting to flip languages every year
>>> and see whether the advantages of Ada hold in terms of better results. Or
>>> run two classes in parallel (with random assignments of students to each
>>> class) and compare results over the years.
>> 
>> I doubt the process of random selection of a developing team from a pool of
>> "indistinguishable" teams were a good basis for a statistical model of
>> software development.
>> 
>> The bottom line: statistic were applied incorrectly and the results
>> obtained carry no prediction power.
> 
> Really?  Text books on statistics are choke a full of similar examples and
> exercises like comparing the proportion of defective products at
> manufacting plant A and B then checking the null hypothesis before
> deciding that A's products are better.

It is not similar.  A production process is a physical one, whose behavior
we know to be stochastic because the laws of physics are (e.g. the laws of
thermodynamics are statistical).

A development team is not governed by these laws, because decision making is
not random and the system as a whole has memory (people learn in the
process).

> In our case we have a random
> variable and that is group 
> quality.

How is this random?  A given group has a given quality.  It is nowhere
random: if you measure the quality repeatedly you will get the same quality.

> And it is common in this field to assume taht your population is a sample
> of an infinitely-sized conceptual population.

Which is a very different model.  If you randomly select a team, all the
randomness is not in the team but in the selection process.  For this to be
correct you should show that the selected teams are equivalent in terms of
the given SW metrics.  It would be quite difficult to do, because students
have different exposures to programming, everything is changing, and you
cannot "recycle" teams.

Now, suppose you managed to do this.  What would be the conclusion?  You
would have shown that a randomly selected team of uneducated pupils produces
better SW metrics with Ada than with C when faced with a given educational
objective.  So what?  Why should this imply anything about SW processes as
performed by professionals?

Again, we know Ada is better for that.  But this study would not show any
causality or statistically proven correlation between the two.  It is
nothing more than a "by-the-way" argument from the statistical POV.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-27 17:36       ` Dmitry A. Kazakov
@ 2015-03-30 10:31         ` Jean François Martinez
  2015-03-30 11:52           ` Dmitry A. Kazakov
  2015-03-30 11:36         ` Jean François Martinez
  1 sibling, 1 reply; 94+ messages in thread
From: Jean François Martinez @ 2015-03-30 10:31 UTC (permalink / raw)


On Friday, March 27, 2015 at 6:36:51 PM UTC+1, Dmitry A. Kazakov wrote:
> On Fri, 27 Mar 2015 04:25:21 -0700 (PDT), Jean François Martinez wrote:
> 
> > On Thursday, March 26, 2015 at 4:21:45 PM UTC+1, Dmitry A. Kazakov wrote:
> >> On Thu, 26 Mar 2015 06:43:01 -0700 (PDT), Maciej Sobczak wrote:
>
> > Really?  Text books on statistics are choke a full of similar examples and
> > exercises like comparing the proportion of defective products at
> > manufacting plant A and B then checking the null hypothesis before
> > deciding that A's products are better.
> 
> It is not similar. A production process is a physical one of which we know
> that the behavior is stochastic because the laws of physics are (e.g. laws
> of thermodynamics are statistical).
> 

I think you have an incorrect vision of probability.  "Probability is a
measure of our ignorance: the flipping of a coin is not random, it is just
that we are unable to control our gesture"; from a book on probabilities I
read when I was young.

On the other hand, if that night someone had not phoned your parents, they
would have gone to bed five minutes earlier, it would have been a different
spermatozoon, and you would be Natalia Kazakova the pianist.

The output of a manufacturing plant is subject to fluctuations for reasons
such as an operator lacking sleep because his baby is teething.


> A developing team is not governed by these laws, because decision making is
> not random and the system as a whole has memory (people learn in process).
> 
> > In our case we have a random
> > variable and that is group 
> > quality.
> 
> How is this random? A given group has given quality. It is nowhere random:
> if you measure the quality repeatedly you will get same quality.
> 
> > And it is common in this field to assume taht your population is a sample
> > of an infinitely-sized conceptual population.
> 

Are students of identical quality?  Do all groups have the same synergy, or
lack of it?  If either answer is no, then we can apply probabilities to this
problem, just as we do for the actually non-random problem of coin flipping.


> Which is a way different model. If you randomly select a team, all
> randomness is not in the team but in the selection process. For this to be
> correct you should show that selected teams are equivalent in terms of
> given SW metrics. It would be quite difficult to do because students have
> different exposures to programming, everything is changing and you cannot
> "recycle" teams.
> 

See above for coin flipping, and apply it to groups.  We don't control the
factors involved in the success or failure of programming groups, just like
for coin flipping.  And as with coin flipping, we can apply probabilities
and confidence intervals to sets of groups.


> So what? Why should this imply anything for SW processes as
> performed by professionals?


This is a valid objection.  One I had thought about while considering the
implications of Professor McCormick's experience and forgot to include: it
only tells us about performance at a certain level of experience.  Of course
we could tell people who say "Languages don't matter" that they didn't
restrict the set to "programmers having graduated", but statisticians don't
say that; they say "Further study is necessary for progra


> 
> Again, we know Ada is better for that. But this study would not show any
> casualty or statistically proven correlation between both. It is nothing
> more than a "by-the-way" argument from the statistical POV.
> 
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-27 11:25     ` Jean François Martinez
  2015-03-27 17:36       ` Dmitry A. Kazakov
@ 2015-03-30 10:48       ` jm.tarrasa
  1 sibling, 0 replies; 94+ messages in thread
From: jm.tarrasa @ 2015-03-30 10:48 UTC (permalink / raw)


Ceteris paribus.

All the variables the same, except the one you want to check.  That's the
cornerstone of experiments.  In the real world that's not easy, particularly
when it involves human beings.

This Ada vs. C study seems quite fair.  C is very bug-prone, and I can
imagine students fighting with pointer bugs, out-of-bounds arrays, etc.
Nevertheless there may be many factors that are out of control.  One of them
may be the teacher's attitude, even if unconscious.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-27 17:36       ` Dmitry A. Kazakov
  2015-03-30 10:31         ` Jean François Martinez
@ 2015-03-30 11:36         ` Jean François Martinez
  1 sibling, 0 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-03-30 11:36 UTC (permalink / raw)


On Friday, March 27, 2015 at 6:36:51 PM UTC+1, Dmitry A. Kazakov wrote:
> On Fri, 27 Mar 2015 04:25:21 -0700 (PDT), Jean François Martinez wrote:

> Why should this imply anything for SW processes as
> performed by professionals?

Sorry, the text was submitted before it was complete.

While thinking about the implications of Professor McCormick's experience I
had thought about that objection but forgot to include it: the experiment
only tells us about performance at a certain level of experience.  Of course
we could tell people who say "Languages don't matter" that since they didn't
restrict the set to "programmers having graduated", we win and they lose,
but statisticians don't say that.  They say "we don't have data for
market-ready programmers; further study is necessary".  I agree that what
matters is what happens when people get paying jobs and employers (and
users) have to cope with the consequences of programmers' output.  However I
think it would be wrong to restrict ourselves to "experienced" programmers:
if during the first year after university programmers are more productive
with language A, and during the rest of their careers they have identical
performance with A and B, then A > B.

A final note about randomness: you can tell me the color of window shutters
is decided by the house owner and probabilities don't apply.  But
statisticians consider that there is some randomness in the population of a
city, so 500,001 green shutters and 499,999 red ones, or the opposite, is
not considered significant (within the confidence interval; or, if you
prefer, one guy who moved to that city), while 200,000 vs 800,000, or the
opposite, points to people having different cultural habits (well outside
the confidence interval).
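To put numbers on that, a minimal sketch in Python (one million owners, each
assumed to pick green with probability 0.5):

    # How many standard deviations from a 50/50 split is each outcome?
    from math import sqrt

    n = 1_000_000
    sigma = sqrt(n * 0.5 * 0.5)       # std dev of the green count, = 500
    for green in (500_001, 800_000):
        z = (green - n * 0.5) / sigma
        print(green, z)
    # 500,001 -> z = 0.002  (deep inside any confidence interval)
    # 800,000 -> z = 600    (astronomically outside it)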

Since the discussion is becoming philosophical and I fear we end boring the others to death I suggest we continue it in private.

Regards.

Jean-François Martinez

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 10:31         ` Jean François Martinez
@ 2015-03-30 11:52           ` Dmitry A. Kazakov
  2015-03-30 12:32             ` G.B.
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-03-30 11:52 UTC (permalink / raw)


On Mon, 30 Mar 2015 03:31:07 -0700 (PDT), Jean François Martinez wrote:

> I think you have an incorrect vision of probability;  "Probability
> is a measure of our ignorance: the flipping of a coin is not random.  It is
> just that we are unable to control our gesture";  from a book on
> probabilities I read when I was young.   

Your book relates the so-called hidden-variable theory.  It has been
considered dead wrong for about a century.

http://en.wikipedia.org/wiki/Hidden_variable_theory

> Are students of identical quality?  And all groups have the same synergy
> or lack of it?  If one of this answer is no.  Than we can apply
> probabilities on this problem just like we do for the actually non random
> problem of coin flipping.

A Bernoulli trial is applicable to coin tossing because there are elementary
outcomes known to be independent.

That was the first question I asked: what are the elementary events in the
experiment of SW development?  How are they independent?  Etc.  You cannot
apply mathematical statistics without the probability axioms being
satisfied.  Until there is a convincing explanation of how this could be
done in the case of SW development, all this is nothing but
pseudo-scientific numerology.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 11:52           ` Dmitry A. Kazakov
@ 2015-03-30 12:32             ` G.B.
  2015-03-30 13:48               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: G.B. @ 2015-03-30 12:32 UTC (permalink / raw)


On 30.03.15 13:52, Dmitry A. Kazakov wrote:
> Your book related so-called hidden variable theory. It is considered dead
> wrong for about a century.

God is dead?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 12:32             ` G.B.
@ 2015-03-30 13:48               ` Dmitry A. Kazakov
  2015-03-30 15:47                 ` G.B.
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-03-30 13:48 UTC (permalink / raw)


On Mon, 30 Mar 2015 14:32:08 +0200, G.B. wrote:

> On 30.03.15 13:52, Dmitry A. Kazakov wrote:
>> Your book related so-called hidden variable theory. It is considered dead
>> wrong for about a century.
> 
> God is dead?

Nope, he keeps on playing dice... (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 13:48               ` Dmitry A. Kazakov
@ 2015-03-30 15:47                 ` G.B.
  2015-03-30 16:05                   ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: G.B. @ 2015-03-30 15:47 UTC (permalink / raw)


On 30.03.15 15:48, Dmitry A. Kazakov wrote:
> On Mon, 30 Mar 2015 14:32:08 +0200, G.B. wrote:
>
>> On 30.03.15 13:52, Dmitry A. Kazakov wrote:
>>> Your book related so-called hidden variable theory. It is considered dead
>>> wrong for about a century.
>>
>> God is dead?
>
> Nope, he keep on playing dice... (:-))

I wonder if there are any higher order dice?

(Of course, the question needs to be asked in a language
as expressive as Ada 202X!)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 15:47                 ` G.B.
@ 2015-03-30 16:05                   ` Dmitry A. Kazakov
  2015-04-02 12:59                     ` brbarkstrom
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-03-30 16:05 UTC (permalink / raw)


On Mon, 30 Mar 2015 17:47:32 +0200, G.B. wrote:

> On 30.03.15 15:48, Dmitry A. Kazakov wrote:
>> On Mon, 30 Mar 2015 14:32:08 +0200, G.B. wrote:
>>
>>> On 30.03.15 13:52, Dmitry A. Kazakov wrote:
>>>> Your book related so-called hidden variable theory. It is considered dead
>>>> wrong for about a century.
>>>
>>> God is dead?
>>
>> Nope, he keep on playing dice... (:-))
> 
> I wonder if there are any higher order dice?

It depends on how constructive, in the mathematical sense, you want to be.
There is no reason to suggest that the diagonal proof would not work for
sets of all realizations, but you might be unable to construct such
objects.

> (Of course, the question needs to be asked in a language
> as expressive as Ada 202X!)

It would be incomputable anyway. So, God does not speak Ada.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-30 16:05                   ` Dmitry A. Kazakov
@ 2015-04-02 12:59                     ` brbarkstrom
  2015-04-02 13:35                       ` Dmitry A. Kazakov
                                         ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: brbarkstrom @ 2015-04-02 12:59 UTC (permalink / raw)


On Monday, March 30, 2015 at 12:05:37 PM UTC-4, Dmitry A. Kazakov wrote:
> On Mon, 30 Mar 2015 17:47:32 +0200, G.B. wrote:
> 
> > On 30.03.15 15:48, Dmitry A. Kazakov wrote:
> >> On Mon, 30 Mar 2015 14:32:08 +0200, G.B. wrote:
> >>
> >>> On 30.03.15 13:52, Dmitry A. Kazakov wrote:
> >>>> Your book related so-called hidden variable theory. It is considered dead
> >>>> wrong for about a century.
> >>>
> >>> God is dead?
> >>
> >> Nope, he keep on playing dice... (:-))
> > 
> > I wonder if there are any higher order dice?
> 
> It depends on how much constructive in mathematical sense you want to be.
> There is no reason to suggest that the diagonal proof would not work for
> sets of all realizations, but you might be unable to construct such
> objects.
> 
> > (Of course, the question needs to be asked in a language
> > as expressive as Ada 202X!)
> 
> It would be incomputable anyway. So, God does not speak Ada.
> 
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

To change the subject a bit, is there any possibility of using
the probabilistic approach to estimating the long-term cost of
maintenance given a history of errors with a particular language?
I'm thinking here of the statistical record of revisions to TeX
that Knuth maintained [see Knuth, D. E., 1999: Digital Typography,
Center for the Study of Language and Information, Stanford University,
pp. 655-662.]

It seems reasonable to suppose that there are two costs to long-term
maintenance once a particular project has completed development:
correcting errors and adding features.  Part of the cost is due to
having maintainers climb the learning curve.

So, do we have a way of developing a model for the cost of dealing
with errors that would show an advantage to Ada in long-term maintenance,
particularly for long-term information preservation?  I suppose it would
be helpful to have the AdaCore history of error discovery and correction
to compare with Knuth's documentation.

Bruce B.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 12:59                     ` brbarkstrom
@ 2015-04-02 13:35                       ` Dmitry A. Kazakov
  2015-04-02 14:48                         ` jm.tarrasa
                                           ` (2 more replies)
  2015-04-02 18:24                       ` Niklas Holsti
  2015-04-02 18:43                       ` Jeffrey Carter
  2 siblings, 3 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-02 13:35 UTC (permalink / raw)


On Thu, 2 Apr 2015 05:59:03 -0700 (PDT), brbarkstrom@gmail.com wrote:

> To change the subject a bit, is there any possibility of using
> the probabilistic approach to estimating the long-term cost of
> maintenance given a history of errors with a particular language?

I don't think so.  Errors do not occur; they are present or introduced.  The
only probabilistic model could be random selection of a given error from
the pool of errors.  This is not related to the language choice.  All the
difference here is in the pools, since errors made in C are different from
errors made in Ada.  You change the language, you get *other* errors, and no
statistics across them to compare.

Or, taking another path, consider "probabilities" of the same error being
made in C and Ada.  Again, there would be no statistics of same errors, as
each error is singular and either present or not.  Thus the model of the
"same" error must be a random choice of a program from a pool of programs,
verifying whether this error is in there.  This is methodologically
questionable, as there is no metric to measure "sameness" of errors in
different programs.  Furthermore, this requires the programs in the pools to
be equivalent in the sense that "same" errors would have the same chances of
being present.  But errors do not occur, as I said.  The reasons for errors
being made are non-probabilistic.  So you have the problem of having pools
of "equivalent" programs and the even bigger problem of having "equivalent"
pools of Ada and C programs.

> I'm thinking here of the statistical record of revisions to TeX
> that Knuth maintained [see Knuth, D. E., 1999: Digital Typography,
> Center for the Study of Language and Information, Stanford University,
> pp. 655-662.]
> 
> It seems reasonable to suppose that there are two costs to long-term
> maintenance once a particular project has completed development:
> correcting errors and adding features.  Part of the cost is due to
> having maintainers climb the learning curve.

It does not fit either of the models, notwithstanding how questionable
those are.

Apples and oranges all the way.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 13:35                       ` Dmitry A. Kazakov
@ 2015-04-02 14:48                         ` jm.tarrasa
  2015-04-02 15:55                           ` brbarkstrom
  2015-04-02 16:41                           ` Dmitry A. Kazakov
  2015-04-02 15:58                         ` Jean François Martinez
  2015-04-02 17:17                         ` G.B.
  2 siblings, 2 replies; 94+ messages in thread
From: jm.tarrasa @ 2015-04-02 14:48 UTC (permalink / raw)


On Thursday, April 2, 2015 at 3:36:02 PM UTC+2, Dmitry A. Kazakov wrote:
> On Thu, 2 Apr 2015 05:59:03 -0700 (PDT), brbarkstrom@gmail.com wrote:
> 
> > To change the subject a bit, is there any possibility of using
> > the probabilistic approach to estimating the long-term cost of
> > maintenance given a history of errors with a particular language?
> 
> since errors made in C are different from
> errors made in Ada. You change the language, you get *other* errors, no
> statistics across to compare.
> 
> Or taking another path, considering "probabilities" of same error made in C
> and Ada. Again,
> ...
> Apples and oranges all the way.

You can create criteria valid for any error in any language:

A) Severity
 1) Data lost
 2) Crash
 3) ...
 4) Inconvenience
B) Frequency
 1) Each day
 2) Once a week
 3) ...
C) Time needed to fix it (hours)

Score = ((Severity * K1) * (Frequency * K2)) + Time * K3

Or whatever formula you want to come up with.  Calculate the errors of
several similar projects and you have a statistic.
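As a sketch of how that scoring could be mechanized (the weights K1..K3 and
the category scales here are placeholders of my own, not anything standard),
in Python:

    # Hypothetical severity/frequency scales; tune to taste.
    SEVERITY = {"data_lost": 4, "crash": 3, "wrong_output": 2,
                "inconvenience": 1}
    FREQUENCY = {"daily": 4, "weekly": 3, "monthly": 2, "rare": 1}

    def score(severity, frequency, fix_hours, k1=10.0, k2=5.0, k3=1.0):
        # The formula proposed above: (Sev*K1) * (Freq*K2) + Time*K3.
        return ((SEVERITY[severity] * k1) * (FREQUENCY[frequency] * k2)
                + fix_hours * k3)

    # Sum over every logged error of a project, then compare totals
    # across similar projects written in different languages.
    bugs = [("crash", "weekly", 6.0), ("inconvenience", "daily", 0.5)]
    print(sum(score(s, f, h) for s, f, h in bugs))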

Obviously the trick is in "similar projects".  There are many factors:
experience of the programmers, experience of management, the domain of the
project, the difficulty of the project...

In fact there are a lot of off-the-shelf packages (accounting,
antivirus...), probably many of them programmed in the same language, and
the quality ranges from crap to very good.  That's a good clue: language is
not the main factor.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 14:48                         ` jm.tarrasa
@ 2015-04-02 15:55                           ` brbarkstrom
  2015-04-02 16:21                             ` Jean François Martinez
  2015-04-02 16:48                             ` Dmitry A. Kazakov
  2015-04-02 16:41                           ` Dmitry A. Kazakov
  1 sibling, 2 replies; 94+ messages in thread
From: brbarkstrom @ 2015-04-02 15:55 UTC (permalink / raw)


> > > To change the subject a bit, is there any possibility of using
> > > the probabilistic approach to estimating the long-term cost of
> > > maintenance given a history of errors with a particular language?
> > 
> > since errors made in C are different from
> > errors made in Ada. You change the language, you get *other* errors, no
> > statistics across to compare.
> > 
> > Or taking another path, considering "probabilities" of same error made in C
> > and Ada. Again,
> > ...
> > Apples and oranges all the way.
> 
> You can create a criteria valid for any error or language
> 
> A) Severity
>  1) Data lost
>  2) Crash
>  3) ...
>  4) Inconvenience
> B) Frequency
>  1) each day
>  2) One a week
>  3)...
> C) Time needed to fix it (hours)
> 
> Score= ((Severity * K1) * (Frequency * K2)) + Time * K3
> 
> Or whatever formula you want to come with. Calculate the errors of several similar projects and you have an statistic.
> 
> Obviously the trick is in "similar projects". There are many factors, experience of programmers, experience of management, the domain of the project, the difficulty of the project.....
> 
> In fact there are a lot of  off-the-shelf packages (accounting, antivirus...), probably many of them programmed in the same language, and the quality ranges from crap to very good. That's a good clue: Language is not the main factor.

Both of these responses are interesting, although I think there's a
need for deeper exploration.

For example, consider a stream whose bed contains flecks of gold.  A given
area of undisturbed bed certainly has a fixed set of flecks.  Thus, if we
put a screen with fine enough mesh across the stream bed, each mesh element
either contains a gold fleck or not, as Dr. Kazakov postulates about errors.
Now, we consider a mining operation that scoops up stream bed and puts the
material into a sluice that collects the flecks.  The mining company wants
to know which of two sluice designs is more efficient.  There are many random
variables that could influence the fleck fraction: volume of scoop, area
of stream bed covered, as well as the sluice design.  The question is whether
the variations in design are detectable in the presence of the other 
variabilities.  I expect that our colleagues in electrical engineering,
and in the newer field of uncertainty quantification, have some experience
dealing with this kind of problem.
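One way to get a feel for that detectability question is a quick Monte Carlo
sketch; all the numbers below are made up for illustration:

    # Can a small difference in sluice efficiency be seen through the
    # other random variabilities (scoop size, fleck density)?
    import random

    def recovered_flecks(efficiency, scoops=200):
        total = 0
        for _ in range(scoops):
            flecks = random.randint(0, 50)           # flecks in this scoop
            for _ in range(flecks):
                if random.random() < efficiency:     # sluice catches it?
                    total += 1
        return total

    random.seed(1)
    design_a = [recovered_flecks(0.70) for _ in range(100)]
    design_b = [recovered_flecks(0.72) for _ in range(100)]
    print(sum(design_a) / 100, sum(design_b) / 100)
    # With enough scoops per run, the two-point efficiency difference
    # stands out above the noise from scoop volume and fleck density.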

As to whether language is the main factor in determining quality, I'm curious
as to whether there's empirical evidence about its relative importance in
the cost.  jm.ta's response suggests the issue is cut and dried.  Perhaps
he's correct - and language choice is a negligible factor.  However, a
quantitative answer needs more evidence.

Bruce B.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 13:35                       ` Dmitry A. Kazakov
  2015-04-02 14:48                         ` jm.tarrasa
@ 2015-04-02 15:58                         ` Jean François Martinez
  2015-04-02 16:39                           ` Dmitry A. Kazakov
  2015-04-02 17:17                         ` G.B.
  2 siblings, 1 reply; 94+ messages in thread
From: Jean François Martinez @ 2015-04-02 15:58 UTC (permalink / raw)


On Thursday, April 2, 2015 at 3:36:02 PM UTC+2, Dmitry A. Kazakov wrote:

> 
> > To change the subject a bit, is there any possibility of using
> > the probabilistic approach to estimating the long-term cost of
> > maintenance given a history of errors with a particular language?
> 
> I don't think so. Errors do not occur, they are present or introduced. The
> only probabilistic model could be random selection of a given error from
> the pool of errors. This is not related to the language choice. All
> difference here is in the pools, since errors made in C are different from
> errors made in Ada. You change the language, you get *other* errors, no
> statistics across to compare.
> 

Really?  Errors occur because of programmer quality, physical shape (had a
bad night), distracting events, use of alcohol and drugs, and plenty, plenty
of other factors, one of them being that a bird has just landed in front of
your window.  Random.

Then you take a lot of programmers, or more exactly two lots, calculate
averages and standard deviations, and apply the appropriate test to the
difference.  That test tells you whether the difference is too small to
discard the null hypothesis (same rate of success for both groups) or far
too large for it to hold water.  In the latter case you try to understand
why.

BTW, since you brought up this silliness: the defect rate in manufacturing
plants is not governed by thermodynamics but by factors such as equipment
and the state of the work force.  On Mondays (use of alcohol during the
weekend) and Fridays (tiredness) you will find a significantly higher rate
of defects and of work accidents.  Significantly, as in far beyond what
could be expected from mere random noise.

Jean François Martinez


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 15:55                           ` brbarkstrom
@ 2015-04-02 16:21                             ` Jean François Martinez
  2015-04-02 16:48                             ` Dmitry A. Kazakov
  1 sibling, 0 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-04-02 16:21 UTC (permalink / raw)


On Thursday, April 2, 2015 at 5:55:54 PM UTC+2, brbar...@gmail.com wrote:

> 
> As to whether language is the main factor in determining quality, I'm curious
> as to whether there's empirical evidence about its relative importance in
> the cost.  jm.ta's response suggests the issue is cut and dried.  Perhaps
> he's correct - and language choice is a negligible factor.  However, a
> quantitative answer needs more evidence.
> 


In theory it should be possible to take N languages and M problems, and for
each of these N*M combos have T teams of programmers, then apply statistical
methodology to the N*M*T outputs.  Except that the cost would be
astronomical.  Consider the example I analyzed: 72 teams of four students.
Even if we assume every team spent only one man-month (a week of work per
student), that would mean six man-years, i.e. well over 300,000 dollars.
Plus housing and infrastructure.

As for jm.ta, I find his reasoning questionable.

To the bravado of "Languages don't matter, only programmers", I prefer this
sentence from the fighter-pilot community: "A good pilot in a bad plane will
beat a bad pilot in a good plane, but if you and your opponent are both good
and he has a better plane, you are toast".

Conclusions:
-Use the better language.  That is Ada.
-Fighter pilots are smarter than programmers.  :-)
Disclaimer:  I am not a fighter pilot.  :-)

Jean-François Martinez


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 15:58                         ` Jean François Martinez
@ 2015-04-02 16:39                           ` Dmitry A. Kazakov
  2015-04-03  9:46                             ` Jean François Martinez
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-02 16:39 UTC (permalink / raw)


On Thu, 2 Apr 2015 08:58:41 -0700 (PDT), Jean François Martinez wrote:

> On Thursday, April 2, 2015 at 3:36:02 PM UTC+2, Dmitry A. Kazakov wrote:
> 
>>> To change the subject a bit, is there any possibility of using
>>> the probabilistic approach to estimating the long-term cost of
>>> maintenance given a history of errors with a particular language?
>> 
>> I don't think so. Errors do not occur, they are present or introduced. The
>> only probabilistic model could be random selection of a given error from
>> the pool of errors. This is not related to the language choice. All
>> difference here is in the pools, since errors made in C are different from
>> errors made in Ada. You change the language, you get *other* errors, no
>> statistics across to compare.
> 
> Really?  Errors occur because of progarmmer quality, physical shape (had a
> bad night), distracting events, use of alcohol and drugs and plenty,
> plenty of factors.   One of them being that a bird has just landed  in
> front of your window.  Random.  

You are describing the process of programming.  That is another model, which
is certainly not random.  If it were random, the same programmer asked to
design the same program several times would randomly introduce every
possible error.  That is not the case.  People have memory, they learn, and
the decision-making process is nowhere random.  In general, only typos could
be considered random.  Anything above simple motor functions is not random.
 
-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 14:48                         ` jm.tarrasa
  2015-04-02 15:55                           ` brbarkstrom
@ 2015-04-02 16:41                           ` Dmitry A. Kazakov
  2015-04-04 10:02                             ` jm.tarrasa
  1 sibling, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-02 16:41 UTC (permalink / raw)


On Thu, 2 Apr 2015 07:48:45 -0700 (PDT), jm.tarrasa@gmail.com wrote:

> El jueves, 2 de abril de 2015, 15:36:02 (UTC+2), Dmitry A. Kazakov  escribió:
>> On Thu, 2 Apr 2015 05:59:03 -0700 (PDT), brbarkstrom@gmail.com wrote:
>> 
>>> To change the subject a bit, is there any possibility of using
>>> the probabilistic approach to estimating the long-term cost of
>>> maintenance given a history of errors with a particular language?
>> 
>> since errors made in C are different from
>> errors made in Ada. You change the language, you get *other* errors, no
>> statistics across to compare.
>> 
>> Or taking another path, considering "probabilities" of same error made in C
>> and Ada. Again,
>> ...
>> Apples and oranges all the way.
> 
> You can create a criteria valid for any error or language
> 
> A) Severity
>  1) Data lost
>  2) Crash
>  3) ...
>  4) Inconvenience
> B) Frequency
>  1) each day
>  2) One a week
>  3)...
> C) Time needed to fix it (hours)
> 
> Score= ((Severity * K1) * (Frequency * K2)) + Time * K3
> 
> Or whatever formula you want to come with. Calculate the errors of several
> similar projects and you have an statistic.

Throwing coefficients together does not make it a statistic.  In fact,
whatever algebraic function of non-random variables you take, the result is
still non-random.  This is a trivial fact.  So coefficients and
multiplications change nothing.

You must start by showing that "severity", "frequency" etc. are random
variables.  There is a clear definition of what a random variable is: it is
a measurable function on the set of elementary outcomes.  The first step
is presenting that set.
 
-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 15:55                           ` brbarkstrom
  2015-04-02 16:21                             ` Jean François Martinez
@ 2015-04-02 16:48                             ` Dmitry A. Kazakov
  1 sibling, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-02 16:48 UTC (permalink / raw)


On Thu, 2 Apr 2015 08:55:52 -0700 (PDT), brbarkstrom@gmail.com wrote:

> As to whether language is the main factor in determining quality, I'm curious
> as to whether there's empirical evidence about its relative importance in
> the cost.

One could think up some empirical model of the SW process with all incoming
parameters, the language included, and the SW metrics taken as the output.
A next step would be to verify this model against reality. The problem is
that the inputs, like the outputs, are barely measurable. So I don't think
one could come up with anything more convincing than the infamous climate
models, but who knows. In any case, what was described is a mockery of
statistics. That is not uncommon; there are tons of publications of that
sort, even in peer-reviewed journals, but still.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 13:35                       ` Dmitry A. Kazakov
  2015-04-02 14:48                         ` jm.tarrasa
  2015-04-02 15:58                         ` Jean François Martinez
@ 2015-04-02 17:17                         ` G.B.
  2015-04-02 19:09                           ` Dmitry A. Kazakov
  2 siblings, 1 reply; 94+ messages in thread
From: G.B. @ 2015-04-02 17:17 UTC (permalink / raw)


On 02.04.15 15:35, Dmitry A. Kazakov wrote:
> You change the language, you get *other* errors, no
> statistics across to compare.

No doubt there is some overlap in expressive means
of C and Ada.

Then, after a close look at CVEs, one might consider
testing a prediction that says:

   There will be fewer CVEs once int overflows are
   detected!

Take a C library, write it in C-Ada, i.e. not in idiomatic Ada,
and make sure your compiler does not by default turn off Ada.(*)
Then see what happens when throwing data at the result.
Does the number of vulnerabilities due to int overflows
decrease?
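
Something like this, as a minimal sketch (the function name is made up;
the point is only that the addition is checked when the compiler is not
told to suppress checks, e.g. -gnato on older GNAT versions):

   function Total_Length (A, B : Integer) return Integer is
   begin
      --  In C the sum may wrap silently and feed the vulnerability;
      --  here it raises Constraint_Error on overflow instead.
      return A + B;
   end Total_Length;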

__
(*) Do not run GNAT with default switches.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 12:59                     ` brbarkstrom
  2015-04-02 13:35                       ` Dmitry A. Kazakov
@ 2015-04-02 18:24                       ` Niklas Holsti
  2015-04-02 18:43                       ` Jeffrey Carter
  2 siblings, 0 replies; 94+ messages in thread
From: Niklas Holsti @ 2015-04-02 18:24 UTC (permalink / raw)


On 15-04-02 15:59 , brbarkstrom@gmail.com wrote:
> On Monday, March 30, 2015 at 12:05:37 PM UTC-4, Dmitry A. Kazakov wrote:
>> On Mon, 30 Mar 2015 17:47:32 +0200, G.B. wrote:
>>
>>> On 30.03.15 15:48, Dmitry A. Kazakov wrote:
>>>> On Mon, 30 Mar 2015 14:32:08 +0200, G.B. wrote:
>>>>
>>>>> On 30.03.15 13:52, Dmitry A. Kazakov wrote:
>>>>>> Your book relates to the so-called hidden-variable theory. It has
>>>>>> been considered dead wrong for about a century.
>>>>>
>>>>> God is dead?
>>>>
>>>> Nope, he keeps on playing dice... (:-))
>>>
>>> I wonder if there are any higher order dice?
>>
>> It depends on how constructive, in the mathematical sense, you want to be.
>> There is no reason to suggest that the diagonal proof would not work for
>> sets of all realizations, but you might be unable to construct such
>> objects.
>>
>>> (Of course, the question needs to be asked in a language
>>> as expressive as Ada 202X!)
>>
>> It would be incomputable anyway. So, God does not speak Ada.
>>
>> --
>> Regards,
>> Dmitry A. Kazakov
>> http://www.dmitry-kazakov.de
>
> To change the subject a bit, is there any possibility of using
> the probabilistic approach to estimating the long-term cost of
> maintenance given a history of errors with a particular language?
> I'm thinking here of the statistical record of revisions to TeX
> that Knuth maintained [see Knuth, D. E., 1999: Digital Typography,
> Center for the Study of Language and Information, Stanford University,
> pp. 655-662.]

Something like the Zeigler study at Rational, or has it already been 
discussed in this thread?

http://archive.adaic.com/intro/ada-vs-c/cada_art.html

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 12:59                     ` brbarkstrom
  2015-04-02 13:35                       ` Dmitry A. Kazakov
  2015-04-02 18:24                       ` Niklas Holsti
@ 2015-04-02 18:43                       ` Jeffrey Carter
  2 siblings, 0 replies; 94+ messages in thread
From: Jeffrey Carter @ 2015-04-02 18:43 UTC (permalink / raw)


On 04/02/2015 05:59 AM, brbarkstrom@gmail.com wrote:
> 
> To change the subject a bit, is there any possibility of using
> the probabilistic approach to estimating the long-term cost of
> maintenance given a history of errors with a particular language?
> I'm thinking here of the statistical record of revisions to TeX
> that Knuth maintained [see Knuth, D. E., 1999: Digital Typography,
> Center for the Study of Language and Information, Stanford University,
> pp. 655-662.]
> 
> It seems reasonable to suppose that there are two costs to long-term
> maintenance once a particular project has completed development:
> correcting errors and adding features.  Part of the cost is due to
> having maintainers climb the learning curve.
> 
> So, do we have a way of developing a model for the cost of dealing
> with errors that would show an advantage to Ada in long-term maintenance,
> particularly for long-term information preservation?  I suppose it would
> be helpful to have the AdaCore history of error discovery and correction
> to compare with Knuth's documentation.

The Rational study (http://archive.adaic.com/intro/ada-vs-c/cada_art.html)
compared developing an Ada compiler in Ada to the same project in C. It found ≈4
times as many post-delivery errors in the C compiler, and the average effort to
correct a C error was ≈10 times that to correct an Ada error.

-- 
Jeff Carter
"He didn't get that nose from playing ping-pong."
Never Give a Sucker an Even Break
110


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 17:17                         ` G.B.
@ 2015-04-02 19:09                           ` Dmitry A. Kazakov
  0 siblings, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-02 19:09 UTC (permalink / raw)


On Thu, 02 Apr 2015 19:17:02 +0200, G.B. wrote:

> On 02.04.15 15:35, Dmitry A. Kazakov wrote:
>> You change the language, you get *other* errors, no
>> statistics across to compare.
> 
> No doubt there is some overlap in expressive means
> of C and Ada.

Yes, which is not a compliment for Ada, if you understand what I mean...

> Then, after a close look at CVEs, one might consider
> testing a prediction that says:
> 
>    There will be fewer CVEs once int overflows are
>    detected!
> 
> Take a C library, write it in C-Ada, i.e. not in idiomatic Ada,
> and make sure your compiler does not by default turn off Ada.(*)
> Then see when happens when throwing data at the result.
> Does the number of vulnerabilities due to int overflows
> decrease?

It does. The problem is not whether Ada is a better language; it is. The
problem is how to quantify this fact. Statistics is no help, because of the
applicability issues. There certainly exist other approaches, but none of
them would stand up to much scientific rigor, alas.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-03-25 15:19 ` Paul Rubin
@ 2015-04-03  0:50   ` robin.vowels
  2015-04-03  2:18     ` Jeffrey Carter
  0 siblings, 1 reply; 94+ messages in thread
From: robin.vowels @ 2015-04-03  0:50 UTC (permalink / raw)


On Thursday, March 26, 2015 at 2:19:06 AM UTC+11, Paul Rubin wrote:
> Jean François Martinez <d.nospam@gmail.com> writes:
> > So for six years students had to do it in C...
> > Then professor switched to Ada ... 
> > At this point the only thing we can say is "the difference between the 
> > results obtained by the Ada teams those of the teams is far too high for
> > "luck".   We have to look at the possible causes.   What changed? ...
> 
> One possibility is C has a longer learning curve but catches up in
> productivity once the programmer is used to it.  I don't particularly
> believe that theory, but the experiment would be more convincing if it
> involved teams of experienced programmers rather than students.

C has lower productivity than most other HLLs.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03  0:50   ` robin.vowels
@ 2015-04-03  2:18     ` Jeffrey Carter
  2015-04-03 13:37       ` Bob Duff
  0 siblings, 1 reply; 94+ messages in thread
From: Jeffrey Carter @ 2015-04-03  2:18 UTC (permalink / raw)


On 04/02/2015 05:50 PM, robin.vowels@gmail.com wrote:
> 
> C has lower productivity than most other HLLs.

Not surprising, since C isn't a high-level language.

-- 
Jeff Carter
"He didn't get that nose from playing ping-pong."
Never Give a Sucker an Even Break
110


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 16:39                           ` Dmitry A. Kazakov
@ 2015-04-03  9:46                             ` Jean François Martinez
  2015-04-03 14:00                               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: Jean François Martinez @ 2015-04-03  9:46 UTC (permalink / raw)


On Thursday, April 2, 2015 at 6:39:11 PM UTC+2, Dmitry A. Kazakov wrote:
> On Thu, 2 Apr 2015 08:58:41 -0700 (PDT), Jean François Martinez wrote:
> 
> > On Thursday, April 2, 2015 at 3:36:02 PM UTC+2, Dmitry A. Kazakov wrote:
> > 
> >>> To change the subject a bit, is there any possibility of using
> >>> the probabilistic approach to estimating the long-term cost of
> >>> maintenance given a history of errors with a particular language?
> >> 
> >> I don't think so. Errors do not occur, they are present or introduced. The
> >> only probabilistic model could be random selection of a given error from
> >> the pool of errors. This is not related to the language choice. All
> >> difference here is in the pools, since errors made in C are different from
> >> errors made in Ada. You change the language, you get *other* errors, no
> >> statistics across to compare.
> > 
> > Really?  Errors occur because of progarmmer quality, physical shape (had a
> > bad night), distracting events, use of alcohol and drugs and plenty,
> > plenty of factors.   One of them being that a bird has just landed  in
> > front of your window.  Random.  
> 
> You are describing the process of programming. That is another model, one
> which is certainly not random. If it were random, the same programmer asked
> to design the same program several times would randomly introduce any
> possible errors. That is not the case. People have memory, they learn; the
> decision-making process is nowhere near random. In general, only typos
> could be considered random. Anything above simple motor functions is not
> random.
>  
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

The problem is: none of what you have said from the beginning is relevant
to how statistical tests work.

Let's say we have white balls and black balls.  The reason a ball is white
or black is definitely not random, since it depends on the operator filling
the machine with white or black paint.  :-)  Then, from a conceptually
infinite-sized urn, you take two samples at random.  The averages of the
samples will differ from one another and from the expected value
http://en.wikipedia.org/wiki/Expected_value (that is, the proportion of
white balls in our urn), but usually not by much; and in any case you can
calculate the probability of the difference, and the probability of the
difference being above a certain threshold.  More exactly, we approximate
with a Student's t law or with a Gaussian law, depending on the size of the
sample.

Now we have two samples of programmers, and in each sample there are white
balls (programmers who succeeded) and black balls (programmers who failed).
Just as balls are black or white for deterministic reasons, programmers may
succeed or fail for deterministic reasons.  We *don't care* why.  We make
the *hypothesis* that these are samples from the same urn (i.e. from a
homogeneous population: the same proportion of white and black balls, of
succeeding and failing programmers), calculate the averages, and then the
probability of getting such a difference or a higher one.  We compare that
probability with a threshold (usually 5% or 1%): if it is above the
threshold we accept the hypothesis (two samples from the same urn); if it
is below, we reject the hypothesis and assume these are not two samples
from the same urn but from different urns, one of them having a higher mix
of white balls (succeeding programmers).  In our case the probability was
below one in a million, so the null hypothesis (same urn) is rejected.
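
In Ada, the whole computation fits in a few lines (a sketch with made-up
names, using the usual normal approximation; it assumes at least one
proportion lies strictly between 0 and 1, so the denominator is nonzero):

   with Ada.Numerics.Elementary_Functions;
   use  Ada.Numerics.Elementary_Functions;

   --  z statistic for the difference of two observed success
   --  proportions P_A and P_C from samples of sizes N_A and N_C;
   --  reject the null hypothesis (same urn) when the result exceeds
   --  the threshold for the chosen risk, e.g. 1.96 for 5%.
   function Z_Score (P_A, P_C : Float; N_A, N_C : Positive) return Float is
      Var_A : constant Float := P_A * (1.0 - P_A) / Float (N_A);
      Var_C : constant Float := P_C * (1.0 - P_C) / Float (N_C);
   begin
      return abs (P_A - P_C) / Sqrt (Var_A + Var_C);
   end Z_Score;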

That is math, *it is correctly applied*, and that is all we can deduce with
math.  To find the reasons why the urn that happens to be labelled Ada has
a higher proportion of white balls than the urn labelled C requires
applying field knowledge.


Regards

Jean-François Martinez


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03  2:18     ` Jeffrey Carter
@ 2015-04-03 13:37       ` Bob Duff
  2015-04-03 14:13         ` Dmitry A. Kazakov
                           ` (3 more replies)
  0 siblings, 4 replies; 94+ messages in thread
From: Bob Duff @ 2015-04-03 13:37 UTC (permalink / raw)


Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:

> On 04/02/2015 05:50 PM, robin.vowels@gmail.com wrote:
>> 
>> C has lower productivity than most other HLLs.
>
> Not surprising, since C isn't a high-level language.

Assembly language: Semantics are defined in terms of the generated
machine code.

High-level language: Semantics are defined in terms of what the program
does.

Those are how I understand the terms, and I think how they are commonly
understood.  I can't think of any other useful definitions.
C is clearly a high-level language, by those definitions.

Portable assembly language: A contradiction in terms that is used by
some people to denigrate C, and by other people to praise it.  ;-)
No such thing.

- Bob


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03  9:46                             ` Jean François Martinez
@ 2015-04-03 14:00                               ` Dmitry A. Kazakov
  2015-04-03 17:12                                 ` Jean François Martinez
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-03 14:00 UTC (permalink / raw)


On Fri, 3 Apr 2015 02:46:51 -0700 (PDT), Jean François Martinez wrote:

> Now we have two samples of programmers  and in each sample there are white
> balls (programmers who succeeded) and black balls (programmers who failed). 

We are moving in circles. I already explained what is wrong with the model
of random selection of a team of programmers.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 13:37       ` Bob Duff
@ 2015-04-03 14:13         ` Dmitry A. Kazakov
  2015-04-03 17:34           ` Paul Rubin
  2015-04-03 16:17         ` J-P. Rosen
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-03 14:13 UTC (permalink / raw)


On Fri, 03 Apr 2015 09:37:04 -0400, Bob Duff wrote:

> Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
> 
>> On 04/02/2015 05:50 PM, robin.vowels@gmail.com wrote:
>>> 
>>> C has lower productivity than most other HLLs.
>>
>> Not surprising, since C isn't a high-level language.
> 
> Assembly language: Semantics are defined in terms of the generated
> machine code.
> 
> High-level language: Semantics are defined in terms of what the program
> does.

I would say that is a domain-specific language. Remoteness from the machine
code does not necessarily imply a higher level. To me SQL or OpenGL are
lower level than C, though both are farther away from the machine and
closer to some domains. I think being higher level is related to the
abstraction capabilities.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 13:37       ` Bob Duff
  2015-04-03 14:13         ` Dmitry A. Kazakov
@ 2015-04-03 16:17         ` J-P. Rosen
  2015-04-03 17:33           ` Bob Duff
  2015-04-03 19:00         ` Georg Bauhaus
  2015-04-03 19:12         ` Jeffrey Carter
  3 siblings, 1 reply; 94+ messages in thread
From: J-P. Rosen @ 2015-04-03 16:17 UTC (permalink / raw)


On 03/04/2015 15:37, Bob Duff wrote:
> Assembly language: Semantics are defined in terms of the generated
> machine code.
OK

> High-level language: Semantics are defined in terms of what the program
> does.
I don't understand this. Do you mean "specification: observed behaviour
of a program" ? ;-)
> 
> Those are how I understand the terms, and I think how they are commonly
> understood.  I can't think of any other useful definitions.
> C is clearly a high-level language, by those definitions.
Not clearly. A number of things (handling of integer overflows,
semantics of %, dereferencing the null pointer...) are defined in the
language as "like the machine does" (this is spelled: implementation
defined). Unless it has changed in recent versions of C ?

-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 14:00                               ` Dmitry A. Kazakov
@ 2015-04-03 17:12                                 ` Jean François Martinez
  0 siblings, 0 replies; 94+ messages in thread
From: Jean François Martinez @ 2015-04-03 17:12 UTC (permalink / raw)


On Friday, April 3, 2015 at 4:02:25 PM UTC+2, Dmitry A. Kazakov wrote:
> On Fri, 3 Apr 2015 02:46:51 -0700 (PDT), Jean François Martinez wrote:
> 
> > Now we have two samples of programmers  and in each sample there are white
> > balls (programmers who succeeded) and black balls (programmers who failed). 
> 
> We are moving in circles. I already explained what is wrong with the model
> of random selection of a team of programmers.
>

You have explained nothing about how statistical tests work, and you still
don't understand them.

We assume a hypothesis, pit the real observation against the theoretical
model, and accept or reject the hypothesis.  Period.  If we reject it, we
analyze the factors and try to find out why population A has a higher
proportion of people producing a working program or owning a microwave
oven, or why people living in Miami are more likely to have visited
Disneyland than people living in Melbourne.

In our case we could think that, since the course was an elective one,
"write first and think later" or "clever trick" programmers flocked to the
course during the C years and deserted it during the Ada years.  One such
programmer in a group working on a complex project like this could wreak
havoc.  This is definitely not random, and it would have to be investigated
along with the hypothesis that the Ada programmers were initially similar,
but that contact with Ada, not only the language but also the literature
(compare the Barnes books against the compendium of clever tricks that is
the K&R C book), turned them into serious, comment-loving, cheap-trick-
averse, software-engineer types.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 16:17         ` J-P. Rosen
@ 2015-04-03 17:33           ` Bob Duff
  2015-04-26 11:38             ` David Thompson
  0 siblings, 1 reply; 94+ messages in thread
From: Bob Duff @ 2015-04-03 17:33 UTC (permalink / raw)


"J-P. Rosen" <rosen@adalog.fr> writes:

> Le 03/04/2015 15:37, Bob Duff a écrit :
>> Assembly language: Semantics are defined in terms of the generated
>> machine code.
> OK
>
>> High-level language: Semantics are defined in terms of what the program
>> does.
> I don't understand this. Do you mean "specification: observed behaviour
> of a program" ? ;-)

Yes.  In an assembly language, if you write "ADD", the semantics are
that an ADD instruction is generated in the machine code.  But in a
high-level language, if you write "+", the language definition doesn't
say anything about what machine instruction(s) are emitted.  And in
practice, you might get a shift, or you might get no instructions
at all.  C is clearly in the second category.

>> Those are how I understand the terms, and I think how they are commonly
>> understood.  I can't think of any other useful definitions.
>> C is clearly a high-level language, by those definitions.
> Not clearly. A number of things (handling of integer overflows,
> semantics of %, dereferencing the null pointer...) are defined in the
> language as "like the machine does" (this is spelled: implementation
> defined). Unless it has changed in recent versions of C ?

No, no C standard has ever said "like the machine does".  Those things
are defined as "undefined behavior" in C, and that does not mean "like
the machine does", it means "anything can happen" (as far as the C
standard is concerned).  "Like the machine does" is not even meaningful
in a high-level language.  What the machine does depends on what code
the compiler generates.  For example, on x86, a C compiler might
generate an ADD (so you get wraparound semantics).  Or it might generate
ADD followed by TRAPV (so the program crashes on overflow).  Or it might
turn this:

    if (X + 1 <= 0)
        do_something();

into:

    do_something();

if it knows X >= 0 before the 'if'.  gcc actually does things
of that nature.  I've seen people get really annoyed at gcc for
that sort of thing, but I think their beef is with the language
definition -- they should use a language that defines the behavior
of overflow.

If it can prove X = the max integer, it can remove the entire if
statement.
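
For contrast, the Ada analogue with checks enabled (Do_Something is a
placeholder):

    --  X + 1 raises Constraint_Error when X = Integer'Last, so the
    --  compiler cannot silently delete the test.
    if X + 1 <= 0 then
       Do_Something;
    end if;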

- Bob


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 14:13         ` Dmitry A. Kazakov
@ 2015-04-03 17:34           ` Paul Rubin
  2015-04-03 19:34             ` Dmitry A. Kazakov
  2015-04-04  0:41             ` Dennis Lee Bieber
  0 siblings, 2 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-03 17:34 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> I would say that is a domain-specific language. Remoteness from the machine
> code does necessarily imply higher level. To me SQL or OpenGL are lower
> level than C, though both are farther away from the machine and closer to
> some domains. I think being higher level is related to the abstraction
> capabilities.

I thought of low level = programmer chooses the algorithms, high level =
computer chooses them.  So SQL is very high level (programmer writes a
select or join and the database figures out the query plan, and in some
cases the db might notice repeated queries and decide to add indexes all
by itself).  http://prog21.dadgum.com/166.html claims that C's highest
level feature is the switch statement, since it analyzes the set of
cases and might generate a linear sequence of comparisons, a jump table,
or a tree-like search.  The other C statements mostly generate
straightforward sequences of machine instructions modulo things like
register allocation.

I saw another paper proposing that the dividing line between low and
high level languages is that the high level ones have garbage
collection.  So that would mean Ada is low level while Java is high
level.  Java (not C) is probably Ada's main competition for big new
projects these days, and GC is probably Java's main attraction over Ada.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 13:37       ` Bob Duff
  2015-04-03 14:13         ` Dmitry A. Kazakov
  2015-04-03 16:17         ` J-P. Rosen
@ 2015-04-03 19:00         ` Georg Bauhaus
  2015-04-03 19:12         ` Jeffrey Carter
  3 siblings, 0 replies; 94+ messages in thread
From: Georg Bauhaus @ 2015-04-03 19:00 UTC (permalink / raw)


On 03.04.15 15:37, Bob Duff wrote:
> Portable assembly language: A contradiction in terms that is used by
> some people to denigrate C, and by other people to praise it. ;-)
> No such thing.

Depends! ;-)

Robert B. K. Dewar; Anthony P. McCann (1979).
  MINIMAL - A Machine Independent Assembly Language.
  Computer Science Department Technical Report. No. 12.
  Courant Institute of Mathematical Sciences.


(That's the language in which Macro SPITBOL (a SNOBOL-4 compiler)
is written.)

(But C clearly isn't an assembly language, since, for example,
what is the equivalent of a compile-time checked parameter of
type "struct S" in assembly language?)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 13:37       ` Bob Duff
                           ` (2 preceding siblings ...)
  2015-04-03 19:00         ` Georg Bauhaus
@ 2015-04-03 19:12         ` Jeffrey Carter
  2015-04-03 22:37           ` Bob Duff
  3 siblings, 1 reply; 94+ messages in thread
From: Jeffrey Carter @ 2015-04-03 19:12 UTC (permalink / raw)


On 04/03/2015 06:37 AM, Bob Duff wrote:
> Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
>>
>> Not surprising, since C isn't a high-level language.
> 
> Assembly language: Semantics are defined in terms of the generated
> machine code.
> 
> High-level language: Semantics are defined in terms of what the program
> does.
> 
> Those are how I understand the terms, and I think how they are commonly
> understood.  I can't think of any other useful definitions.
> C is clearly a high-level language, by those definitions.

Jones' analysis of languages, based on avg # of statements per function point,
divided languages into 3 levels with a pretty clear division between the levels:

Low-Level: assembly and C

Mid-Level: languages such as Pascal and FORTRAN

High-Level: languages such as Ada

-- 
Jeff Carter
"Gentlemen, you can't fight in here. This is the War Room!"
Dr. Strangelove
30


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 17:34           ` Paul Rubin
@ 2015-04-03 19:34             ` Dmitry A. Kazakov
  2015-04-03 19:58               ` Paul Rubin
  2015-04-04  0:41             ` Dennis Lee Bieber
  1 sibling, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-03 19:34 UTC (permalink / raw)


On Fri, 03 Apr 2015 10:34:44 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> I would say that is a domain-specific language. Remoteness from the machine
>> code does necessarily imply higher level. To me SQL or OpenGL are lower
>> level than C, though both are farther away from the machine and closer to
>> some domains. I think being higher level is related to the abstraction
>> capabilities.
> 
> I thought of low level = programmer chooses the algorithms, high level =
> computer chooses them.

That would be imperative vs. declarative.

> So SQL is very high level (programmer writes a
> select or join and the database figures out the query plan, and in some
> cases the db might notice repeated queries and decide to add indexes all
> by itself).

SQL is low level in my view, because it does not support basic programming
abstractions, e.g. abstract data types.

> http://prog21.dadgum.com/166.html claims that C's highest
> level feature is the switch statement, since it analyzes the set of
> cases and might generate a linear sequence of comparisons, a jump table,
> or a tree-like search.  The other C statements mostly generate
> straightforward sequences of machine instructions modulo things like
> register allocation.

I don't buy the idea of closeness to the hardware. It would mean that the
language level would depend on the target machine. E.g. Lisp would be high
on x86 and low on a Lisp-machine.

> I saw another paper proposing that the dividing line between low and
> high level languages is that the high level ones have garbage
> collection.

Native support for any feature is not automatically high level. You could
have assembler with linear algebra matrix operations. It would be still
assembler. Exactly like SQL is low level, because relational algebra is no
better than MOV, ADD, MUL, CMP algebra of a traditional assembler. In fact
it is worse, because it is not Turing-complete.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 19:34             ` Dmitry A. Kazakov
@ 2015-04-03 19:58               ` Paul Rubin
  2015-04-04  6:59                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-03 19:58 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> SQL is low level in my view, because it does not support basic programming
> abstractions, e.g. abstract data types.

Is Lisp also low level in your view then?

>> http://prog21.dadgum.com/166.html claims that C's highest
>> level feature is the switch statement,
> I don't buy the idea of closeness to the hardware. It would mean that the
> language level would depend on the target machine. E.g. Lisp would be high
> on x86 and low on a Lisp-machine.

It's not about closeness to the hardware, it's about algorithms.
Searching for a value by iterating through a list of them might use
different instructions on x86 vs Lisp machine, but it's the same
algorithm on both.  But doing a binary search on the list is a different
algorithm.  And would you consider C's switch statement to be
declarative rather than imperative?  It could generate a linear search
or a binary search depending on what it decides will be faster code.
> But most of us think of C as imperative.

>> [paper proposing that HLL's are those with garbage collection.]
>> Native support for any feature is not automatically high level. You could
> have assembler with linear algebra matrix operations. It would be still
> assembler.

GC is not like matrix operations.  It's not a library that you call.  It
pervades the operation of your entire program.

There's no universal definition of an HLL and I thought the GC
suggestion was interesting, though I'm not fully bought into it at the
moment.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 19:12         ` Jeffrey Carter
@ 2015-04-03 22:37           ` Bob Duff
  2015-04-03 23:38             ` Jeffrey Carter
  2015-04-04  0:56             ` Dennis Lee Bieber
  0 siblings, 2 replies; 94+ messages in thread
From: Bob Duff @ 2015-04-03 22:37 UTC (permalink / raw)


Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:

> Jones' analysis of languages, based on avg # of statements per function point,
> divided languages into 3 levels with a pretty clear division between the levels:

I haven't read that, but...

> Low-Level: assembly and C
>
> Mid-Level: languages such as Pascal and FORTRAN
>
> High-Level: languages such as Ada

You can argue all day long about which high-level languages are higher
or lower level than which other ones.  I see a bright line between
assembly and high-level languages such as C, Pascal, Fortran, and Ada.
But I think it's better to compare high-level languages based on other
factors, rather than some ill-defined "level".

I have trouble believing the "statements per function point".
For assembly vs. C, there are lots of C statements that
typically compile into several-or-more machine instructions.
And I guess Pascal is about the same as C in that regard,
despite being rather less error prone.

I think putting "assembly and C" at the same "level" is absurd.
A C programmer doesn't do register allocation, for just one
example.  C, Pascal, and Fortran seem more-or-less the same
"level" to me.  And above that level, I think "level" ceases
to be meaningful.

- Bob

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 22:37           ` Bob Duff
@ 2015-04-03 23:38             ` Jeffrey Carter
  2015-04-04  0:15               ` Bob Duff
  2015-04-04  2:59               ` Paul Rubin
  2015-04-04  0:56             ` Dennis Lee Bieber
  1 sibling, 2 replies; 94+ messages in thread
From: Jeffrey Carter @ 2015-04-03 23:38 UTC (permalink / raw)


On 04/03/2015 03:37 PM, Bob Duff wrote:
> 
> I think putting "assembly and C" at the same "level" is absurd.
> A C programmer doesn't do register allocation, for just one
> example.  C, Pascal, and Fortran seem more-or-less the same
> "level" to me.  And above that level, I think "level" ceases
> to be meaningful.

I can't accept as high level any language that requires, for a subprogram to be
able to modify a parameter, that the parameter must be declared as a pointer and
the caller must explicitly pass a pointer. FORTRAN, Pascal, and Ada, to name a
few, don't. C does.
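
In Ada, for comparison, the parameter mode says it all and the call site
stays clean (a trivial sketch):

   procedure Bump (N : in out Integer) is
   begin
      N := N + 1;
   end Bump;

   --  The caller writes just:  Bump (X);
   --  no pointer declared, no address explicitly passed.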

-- 
Jeff Carter
"My dear Mrs. Hemoglobin, when I first saw you, I
was so enamored with your beauty I ran to the basket,
jumped in, went down to the city, and bought myself a
wedding outfit."
Never Give a Sucker an Even Break
111

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 23:38             ` Jeffrey Carter
@ 2015-04-04  0:15               ` Bob Duff
  2015-04-04  7:06                 ` Dmitry A. Kazakov
  2015-04-04  2:59               ` Paul Rubin
  1 sibling, 1 reply; 94+ messages in thread
From: Bob Duff @ 2015-04-04  0:15 UTC (permalink / raw)


Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:

> I can't accept as high level any language that requires, for a subprogram to be
> able to modify a parameter, that the parameter must be declared as a pointer and
> the caller must explicitly pass a pointer. FORTRAN, Pascal, and Ada, to name a
> few, don't. C does.

I'd prefer you to say, "I don't like a language that requires [above stuff
about output parameters]."
What's it got to do with "levels"?

- Bob


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 17:34           ` Paul Rubin
  2015-04-03 19:34             ` Dmitry A. Kazakov
@ 2015-04-04  0:41             ` Dennis Lee Bieber
  2015-04-04  3:05               ` Paul Rubin
  1 sibling, 1 reply; 94+ messages in thread
From: Dennis Lee Bieber @ 2015-04-04  0:41 UTC (permalink / raw)


On Fri, 03 Apr 2015 10:34:44 -0700, Paul Rubin <no.email@nospam.invalid>
declaimed the following:

>level.  Java (not C) is probably Ada's main competition for big new
>projects these days, and GC is probably Java's main attraction over Ada.

	Yet that is also the feature that tends to keep it from the realms I'm
currently working in... Avionics tend to use a scheme in which all heap
allocation is done during an initialization phase (pre-allocating queue
entries, for example), and NO memory allocation/freeing is performed once
the system transitions to "operational".
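
The pattern, roughly, in Ada (a sketch with assumed names and bounds; all
storage is elaborated once, at startup, and no heap is touched afterwards):

    package Static_Queue is
       Capacity : constant := 64;   -- fixed at build time (assumed)
       type Message is record
          Id : Natural := 0;
       end record;
       procedure Put (M : Message; Ok : out Boolean);    -- Ok=False when full
       procedure Get (M : out Message; Ok : out Boolean); -- Ok=False when empty
    end Static_Queue;

    package body Static_Queue is
       --  The whole queue lives in statically elaborated storage.
       Buffer : array (1 .. Capacity) of Message;
       Head   : Positive := 1;   -- next slot to read
       Tail   : Positive := 1;   -- next slot to write
       Count  : Natural  := 0;

       procedure Put (M : Message; Ok : out Boolean) is
       begin
          Ok := Count < Capacity;
          if Ok then
             Buffer (Tail) := M;
             Tail  := Tail mod Capacity + 1;
             Count := Count + 1;
          end if;
       end Put;

       procedure Get (M : out Message; Ok : out Boolean) is
       begin
          Ok := Count > 0;
          if Ok then
             M := Buffer (Head);
             Head  := Head mod Capacity + 1;
             Count := Count - 1;
          end if;
       end Get;
    end Static_Queue;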
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 22:37           ` Bob Duff
  2015-04-03 23:38             ` Jeffrey Carter
@ 2015-04-04  0:56             ` Dennis Lee Bieber
  1 sibling, 0 replies; 94+ messages in thread
From: Dennis Lee Bieber @ 2015-04-04  0:56 UTC (permalink / raw)


On Fri, 03 Apr 2015 18:37:00 -0400, Bob Duff <bobduff@theworld.com>
declaimed the following:

>Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
>I think putting "assembly and C" at the same "level" is absurd.
>A C programmer doesn't do register allocation, for just one
>example.  C, Pascal, and Fortran seem more-or-less the same
>"level" to me.  And above that level, I think "level" ceases
>to be meaningful.
>
	Yet C does support a "storage class specifier" attribute specifying
that a variable should (if possible) be allocated to a register.

	I've always felt C is somewhere between a high macro-capable assembly
language (Xerox Sigma Meta-Symbol -- even the native instruction set is
defined via an included set of macro definitions) and languages in the
FORTRAN/Pascal/Algol families. Mainly as the latter don't have, as part of
the language itself, a means of accessing any arbitrary memory address (no
casting of any integer value to act as a pointer). Ada may be a higher
level than those -- perversely by providing highly controlled syntax for
defining low-level machine access (use at xyz, etc.)

	I suppose, to me, the "level" of a language is inversely related to the
ease of creating unsafe/erroneous programs.

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 23:38             ` Jeffrey Carter
  2015-04-04  0:15               ` Bob Duff
@ 2015-04-04  2:59               ` Paul Rubin
  1 sibling, 0 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-04  2:59 UTC (permalink / raw)


Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
> I can't accept as high level any language that requires, for a
> subprogram to be able to modify a parameter, that the parameter must
> be declared as a pointer and the caller must explicitly pass a
> pointer. FORTRAN, Pascal, and Ada, to name a few, don't. C does.

That's kind of a strange criterion.  C is call-by-value so it has no way
to modify parameters, though you can modify memory that's pointed to by
a parameter.  Haskell and Erlang have purely immutable data and no
pointers, so it's impossible to modify anything: does that make them
lower level than C?  


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04  0:41             ` Dennis Lee Bieber
@ 2015-04-04  3:05               ` Paul Rubin
  2015-04-04 14:46                 ` Dennis Lee Bieber
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-04  3:05 UTC (permalink / raw)


Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
> 	Yet that is also the feature that tends to keep it from the realms I'm
> currently working in... Avionics tend to use a scheme in which all heap
> allocation is done during an initialization phase (pre-allocating queue
> entries, for example), and NO memory allocation/freeing is performed once
> the system transitions to "operational".

How big and complicated do those programs tend to be, if you don't mind
my asking?  Do they have a lot of complex decision paths?

I know that Galois.com does some critical systems (probably not
avionics) in Haskell, which allocates like crazy and is garbage
collected, but deals with complicated algorithms exceptionally well.  I
don't know what safeguards (other than testing) they use to ensure the
absence of memory leaks.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 19:58               ` Paul Rubin
@ 2015-04-04  6:59                 ` Dmitry A. Kazakov
  2015-04-06 21:12                   ` Paul Rubin
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-04  6:59 UTC (permalink / raw)


On Fri, 03 Apr 2015 12:58:56 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> SQL is low level in my view, because it does not support basic programming
>> abstractions, e.g. abstract data types.
> 
> Is Lisp also low level in your view then?

Classic Lisp definitely is.

>>> http://prog21.dadgum.com/166.html claims that C's highest
>>> level feature is the switch statement,
>> I don't buy the idea of closeness to the hardware. It would mean that the
>> language level would depend on the target machine. E.g. Lisp would be high
>> on x86 and low on a Lisp-machine.
> 
> It's not about closeness to the hardware, it's about algorithms.
> Searching for a value by iterating through a list of them might use
> different instructions on x86 vs Lisp machine, but it's the same
> algorithm on both.

No. There is no algorithm when iterating is an atomic operation of the
language. And it is irrelevant what actions the hardware takes in order to
implement it; pushing electrons is not the language's business. Consider an
analogue computer. It has hardware operations like solving differential
equations. What algorithm? None.

> But doing a binary search on the list is a different
> algorithm.

Yes and does not affect the level.

> And would you consider C's switch statement to be
> declarative rather than imperative?

It depends on the beholder. Your perception tells you whether it is
declarative or imperative; there is no other difference. Reading the source
code, you translate and interpret it in your brain. Anything that reads as
immediate action is marked as "imperative".

> It could generate a linear search
> or a binary search depending on what it decides will be faster code.

That is irrelevant.

> But most of us think of C as imprative.
> 
>>> [paper proposing that HLL's are those with garbage collection.]
>>> Native support for any feature is not automatically high level. You could
>> have assembler with linear algebra matrix operations. It would be still
>> assembler.
> 
> GC is not like matrix operations.  It's not a library that you call.  It
> pervades the operation of your entire program.

So is the floating-point unit on x86. You must save its registers and
restore them when doing some unrelated operations. Does that make
floating-point multiplication high-level?

> There's no universal definition of an HLL and I thought the GC
> suggestion was interesting, though I'm not fully bought into it at the
> moment.

In my view GC is neither low nor high level. But the language abstractions
required to describe GC in the language's own terms would probably be high
level (if the language has an elaborated type system). So GC does not make
C# high level, though C# is. It only makes it a bad language, because GC is
a bad idea.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04  0:15               ` Bob Duff
@ 2015-04-04  7:06                 ` Dmitry A. Kazakov
  0 siblings, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-04  7:06 UTC (permalink / raw)


On Fri, 03 Apr 2015 20:15:07 -0400, Bob Duff wrote:

> Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:
> 
>> I can't accept as high level any language that requires, for a subprogram to be
>> able to modify a parameter, that the parameter must be declared as a pointer and
>> the caller must explicitly pass a pointer. FORTRAN, Pascal, and Ada, to name a
>> few, don't. C does.
> 
> I'd prefer you to say, "I don't like a language that requires [above stuff
> about output parameters]."
> What's it got to do with "levels"?

It does. It breaks the abstraction of parameter passing.

Thus C is lower level than Ada but still higher than Assembler.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-02 16:41                           ` Dmitry A. Kazakov
@ 2015-04-04 10:02                             ` jm.tarrasa
  2015-04-04 11:16                               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: jm.tarrasa @ 2015-04-04 10:02 UTC (permalink / raw)


On Thursday, April 2, 2015 at 18:41:18 (UTC+2), Dmitry A. Kazakov wrote:
> On Thu, 2 Apr 2015 07:48:45 -0700 (PDT), j_...@gmail.com wrote:
> 

> You must start by showing that "severity", "frequency", etc. are random
> variables. There is a clear definition of what a random variable is: it is
> a measurable function from the set of elementary outcomes. The first step
> is presenting that set.

What's the problem with severity and frequency?

They are measurable. 

Program "A" in a two-years period
had bug1 which crashed program one time per day, and took a week to fix it
had bug2 which destroyed data, one per month, and took two days to fix it
had bug3 which was annoying, three per day, and took a month to fix it

Program "B" in a two-years period
had bug1 which crashed program one time per hour, and took a month to fix it
had bug2 which destroyed data, one a month, and took a week to fix it

Apply some formula (discrete, continuous, or a complex algorithm) to each
bug, then sum the values over the bugs of each program. Call the result the
"bug score".

The formula just combines the data to come out with a single, comparable
value, overcoming the "apples and oranges" objection. It is a very common
practice when comparing cases with many variables. Chess programs use such
formulas to score a position, and those formulas can be very complex and
are constantly evolving. Google scores pages with a formula that involves a
lot of variables and comes out with a single value.

Of course the formula is highly debatable. You can play with it to move the
results up and down, favouring one factor or reducing another. But the data
is measured and is there, waiting for a better formula.
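
As a sketch (the coefficients and the scales are pure assumptions, which is
exactly the debatable part):

   K1 : constant Float := 10.0;   -- weight of severity  (assumed)
   K2 : constant Float := 1.0;    -- weight of frequency (assumed)
   K3 : constant Float := 0.5;    -- weight of fix time  (assumed)

   --  Severity mapped to a numeric scale, Frequency in occurrences
   --  per day, Fix_Hours in hours, following the formula quoted above.
   function Bug_Score (Severity, Frequency, Fix_Hours : Float) return Float is
   begin
      return (Severity * K1) * (Frequency * K2) + Fix_Hours * K3;
   end Bug_Score;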

Probably studying the bugzilla history of several programs would be a good
start, although I think bugzilla doesn't record "frequency": people classify
as more critical an annoying bug that happens a lot of times to many people
than a data-destroying one that happens once in a blue moon.

Nevertheless, I insist that there are many variables in a program more
important than the language. So you would need to find a sample of programs
equal in everything but the programming language, so that you can isolate
the language variable. And that wouldn't be easy.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 10:02                             ` jm.tarrasa
@ 2015-04-04 11:16                               ` Dmitry A. Kazakov
  0 siblings, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-04 11:16 UTC (permalink / raw)


On Sat, 4 Apr 2015 03:02:52 -0700 (PDT), jm.tarrasa@gmail.com wrote:

> On Thursday, April 2, 2015 at 18:41:18 (UTC+2), Dmitry A. Kazakov wrote:
>> On Thu, 2 Apr 2015 07:48:45 -0700 (PDT), j_...@gmail.com wrote:
> 
>> You must start by showing that "severity", "frequency", etc. are random
>> variables. There is a clear definition of what a random variable is: it is
>> a measurable function from the set of elementary outcomes. The first step
>> is presenting that set.
> 
> What's the problem with severity and frequency?
> 
> They are measurable. 
> 
> Program "A" in a two-years period
> had bug1 which crashed program one time per day, and took a week to fix it
> had bug2 which destroyed data, one per month, and took two days to fix it
> had bug3 which was annoying, three per day, and took a month to fix it
> 
> Program "B" in a two-years period
> had bug1 which crashed program one time per hour, and took a month to fix it
> had bug2 which destroyed data, one a month, and took a week to fix it

'Measurable' in the definition of a random variable has little to do with
measurements, e.g. physical measurements. It means that the function (a
random variable is a function) is measurable in the set-theoretic sense:
the preimage of every measurable set is itself measurable:

   http://en.wikipedia.org/wiki/Measurable_function

BTW, a program crash is certainly not a random variable in most cases. The
only case where a crash would be random is when the input is random, e.g.

   procedure Random_Crash is
      type Dice is range 1 .. 6;
      X : Dice;
   begin
      --  Read_From_Hardware_Random_Generator and Execute are
      --  placeholders; the input here is genuinely random.
      X := Read_From_Hardware_Random_Generator;
      if X = 6 then
         Execute ("format C: /q");
      end if;
   end Random_Crash;

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04  3:05               ` Paul Rubin
@ 2015-04-04 14:46                 ` Dennis Lee Bieber
  2015-04-04 15:41                   ` brbarkstrom
  2015-04-04 19:20                   ` Paul Rubin
  0 siblings, 2 replies; 94+ messages in thread
From: Dennis Lee Bieber @ 2015-04-04 14:46 UTC (permalink / raw)


On Fri, 03 Apr 2015 20:05:01 -0700, Paul Rubin <no.email@nospam.invalid>
declaimed the following:

>Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
>> 	Yet that is also the feature that tends to keep it from the realms I'm
>> currently working in... Avionics tend to use a scheme in which all heap
>> allocation is done during an initialization phase (pre-allocating queue
>> entries, for example), and NO memory allocation/freeing is performed once
>> the system transitions to "operational".
>
>How big and complicated do those programs tend to be, if you don't mind
>my asking?  Do they have a lot of complex decision paths?
>
	It's not decision paths so much as deterministic timing... If there is
a possibility of an operation triggering an arbitrary garbage collection
(even if it is nothing more than combining adjacent free block into a
single available block) then the operation is not permitted, on the simple
basis that it makes the timing uncertain.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 14:46                 ` Dennis Lee Bieber
@ 2015-04-04 15:41                   ` brbarkstrom
  2015-04-04 19:20                   ` Paul Rubin
  1 sibling, 0 replies; 94+ messages in thread
From: brbarkstrom @ 2015-04-04 15:41 UTC (permalink / raw)



I find myself disagreeing with Dr. Kazakov and agreeing more with
Dr. Martinez.  In the following scenario, I've tried to be careful
about separating the statistical issues from the information that
is needed to distinguish why two samples differ.  From that standpoint,
a statistician can distinguish whether two sampled distributions are
likely to come from the same development methodology without knowing
anything about the languages, team quality, organizational structure,
and the related items that have consumed the discussion in much of
this thread.

Consider a consortium of development organizations.  Some of them use
methodology "A".  Another group uses methodology "C".  All members of
the consortium keep records of their total maintenance costs for ten
years.  The consortium members agree it might be useful to share this
information.  Accordingly, they build a list containing just three
pieces of information for each member: the name of the organization,
the development methodology, and the ten-year cost.
For example, the first four items in this list might contain the elements
	Company 1,	"A",	$10,000
	Company 2,	"A",	$25,000
	Company 3,	"C",	$50,000
	Company 4,	"C",	$5,000
and so on.

Company Q's CEO hires a statistician to sample this list.  The statistician
doesn't want to read the whole list, so he arranges to sample it in a way
that creates two new lists.  The first of these contains the information
only from companies that use method "A".  The second contains information
only from companies that use method "C".

To assure that the samples selected from the original list are not
duplicated, the statistician selects the samples by starting with a random
initial index.  He selects the line with that starting index.  It contains
just the three properties in the sample above.  If the methodology is "A",
the information goes into the new list with just "A" information.  If the
methodology is "C", it goes into another new list with just the "C"
information.

The statistician's algorithm then lets an integer pseudorandom number
generator select a new number and adds that to the index for selecting the
next sample from the original list.  The algorithm treats the three fields
from the new sample in the same way as the first sample.

This process continues until the statistician is satisfied he has enough
samples.  [This will probably be somewhere in the vicinity of 20, although
it might be more.]
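
In outline, the selection loop might look like this (a sketch; the sizes
and the loading of the master list are assumptions):

   with Ada.Numerics.Discrete_Random;

   procedure Draw_Samples is
      subtype Stride is Positive range 1 .. 50;  -- assumed stride bound
      package Rand is new Ada.Numerics.Discrete_Random (Stride);
      Gen : Rand.Generator;

      type Method is (A, C);
      N : constant := 10_000;                    -- assumed list length
      Methodology : array (1 .. N) of Method;    -- filled elsewhere
      Cost        : array (1 .. N) of Float;     -- filled elsewhere

      A_Costs, C_Costs : array (1 .. 100) of Float;
      NA, NC : Natural := 0;
      I      : Natural := 0;
   begin
      Rand.Reset (Gen);                          -- random starting seed
      loop
         I := I + Rand.Random (Gen);             -- random forward stride
         exit when I > N or else NA + NC >= 40;  -- ~20 per list is enough
         case Methodology (I) is
            when A => NA := NA + 1; A_Costs (NA) := Cost (I);
            when C => NC := NC + 1; C_Costs (NC) := Cost (I);
         end case;
      end loop;
      --  A_Costs (1 .. NA) and C_Costs (1 .. NC) now feed the
      --  Chi-Square or Kolmogorov-Smirnov test.
   end Draw_Samples;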

The statistician now has two lists, selected at random from the original
list.  At this point, he can apply standard statistical tests to the
numerical values for cost to see if the distributions differ significantly.
If he wishes to treat the numerical values as integers, the standard test
is the Chi-Square test.  Because the number of samples with methodology "A"
is likely to differ from the number using "C", the standard test is a
two-sided Chi-Square.  If he wants to treat the numerical values as
floating point (or real), then the standard test is the Kolmogorov-Smirnov
test.  Again, there is a one-sided version (for testing whether the
distribution is different from a known distribution) and a two-sided
version.

Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T.,
1986: Numerical Recipes: The Art of Scientific Computing, Cambridge
University Press, Cambridge, UK, pp. 469-475

provides a very readable introduction to all four of these tests.
Note that their copyright is still enforced, so be careful about simply lifting the code from this source (or later editions).

Knuth, D. E., 1998: The Art of Computer Programming, Vol. 2/Seminumerical
Algorithms, Third Edition, Addison-Wesley, Boston, MA, pp. 41-58

provides the full derivation of the K-S distribution, as well as other
tests for whether a random number generator is producing reliably random
output.  The K-S test is covered on pp. 48-55 (including the algorithm in
full detail).

At the end of the testing, the statistician can report to the CEO on
whether the distribution of ten-year costs for methodology "A" has a
statistically significant difference from the distribution of costs with
methodology "C".

Note that the conclusion says nothing about why the two methodologies
differ.  If they are different, maybe it's because the companies using "A"
are using a higher-level language.  Maybe it's because the companies using
"A" are more experienced or have "higher quality" developers.  Those
attributes are irrelevant to the question the CEO asked of the
statistician, which was "are these two methodologies drawn from
significantly different distributions?".

In order to understand the causes of the differences, the statistician
would need to undertake a much broader investigation with many more samples
and request many more pieces of information from the members of the
consortium.  Probably the most widely-known example of this kind of
exploration is found in

Boehm, B. W., Abts, C., Brown, A. W., Chulani, S., Clark, B. K., Horowitz,
E., Madachy, R., Reifer, D. and Steece, B., 2000: Software Cost Estimation
with COCOMO II, Prentice Hall PTR, Upper Saddle River, NJ

Boehm has been conducting research for about 30 years into why software
development with different methodologies creates differences in the cost of
the software that's delivered.  The cover of this book shows twenty-one
factors that he's parameterized as having an effect on cost, and the
contents provide all the details needed to estimate the costs.
Unfortunately for the discussion we've been having, Boehm puts language
selection into a parameter he calls LTEX, which is "a measure of the level
of programming language and software experience of the project team
developing the software system or subsystem." [Boehm, et al., pp. 48-49]

It seems clear that it is quite possible to make a random selection of
development approaches even though the methodology of each team is fixed
and determinate at the time of the sampling.  The math for demonstrating
the statistical significance of a difference between two sample
distributions is also quite clear and well-established.
Bruce B


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 14:46                 ` Dennis Lee Bieber
  2015-04-04 15:41                   ` brbarkstrom
@ 2015-04-04 19:20                   ` Paul Rubin
  2015-04-04 20:00                     ` Dmitry A. Kazakov
  2015-04-05 13:57                     ` Dennis Lee Bieber
  1 sibling, 2 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-04 19:20 UTC (permalink / raw)


Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
>>How big and complicated do those programs tend to be, if you don't mind
>>my asking?  Do they have a lot of complex decision paths?
> 	It's not decision paths so much as deterministic timing... If there is
> a possibility of an operation triggering an arbitrary garbage collection

Yes, I can understand why a realtime system can't use gc.  Writing
simple programs without gc isn't too difficult.  Writing complicated
programs without gc often means spending a lot of effort on manual
memory management and debugging memory errors.  So I'm wondering if the
programs you wrote that way were complicated and if yes, how you dealt
with this issue.

> (even if it is nothing more than combining adjacent free block into a
> single available block) then the operation is not permitted, on the simple
> basis that it makes the timing uncertain.

In this case you may also have to consider the non-deterministic effects
of cache misses, if your cpu has cache memory; similarly for pipeline
stalls, serial multipliers, etc.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 19:20                   ` Paul Rubin
@ 2015-04-04 20:00                     ` Dmitry A. Kazakov
  2015-04-04 20:44                       ` Paul Rubin
  2015-04-05 13:57                     ` Dennis Lee Bieber
  1 sibling, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-04 20:00 UTC (permalink / raw)


On Sat, 04 Apr 2015 12:20:39 -0700, Paul Rubin wrote:

> Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
>>>How big and complicated do those programs tend to be, if you don't mind
>>>my asking?  Do they have a lot of complex decision paths?
>> 	It's not decision paths so much as deterministic timing... If there is
>> a possibility of an operation triggering an arbitrary garbage collection
> 
> Yes, I can understand why a realtime system can't use gc.  Writing
> simple programs without gc isn't too difficult.  Writing complicated
> programs without gc often means spending a lot of effort on manual
> memory management and debugging memory errors.

It is not manual vs. automatic. In Ada, memory management is automatic
except for some cases where the language design is flawed (e.g. task
components and some cases of construction/destruction of limited and
unconstrained types).

It is explicit vs. implicit collection of non-scoped objects.
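
A minimal sketch of what that scope-based management looks like (hypothetical names): the array below is sized at run time, yet there is no "new", no deallocation call, and no GC:

   procedure Scoped_Demo (N : Positive) is
      type Vector is array (Positive range <>) of Float;
      Sum : Float := 0.0;
   begin
      declare
         V : Vector (1 .. N) := (others => 1.0);   --  sized at run time
      begin
         for E of V loop
            Sum := Sum + E;
         end loop;
      end;            --  V's storage is reclaimed right here,
                      --  deterministically, with no GC involved
   end Scoped_Demo;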

> So I'm wondering if the
> programs you wrote that way were complicated and if yes, how you dealt
> with this issue.

GC cannot deal with these issues because it does not have the information
necessary to determine the implicit scope of an object. That is undecidable
in most non-trivial cases. Reachability as a criterion does not work for
complex structures, so management of scopes is required anyway, by marking
inter-object references as weak or strong, and maybe with shades in
between.

>> (even if it is nothing more than combining adjacent free block into a
>> single available block) then the operation is not permitted, on the simple
>> basis that it makes the timing uncertain.
> 
> In this case you may also have to consider the non-deterministic effects
> of cache misses, if your cpu has cache memory; similarly for pipeline
> stalls, serial multipliers, etc.

Non-deterministic time is of minor importance compared to non-deterministic
behavior in general, which makes the program non-testable and
unmaintainable. Since GC cannot be statically analyzed (if it could be, you
would not need GC), having it eliminates all means of quality assurance.
End of story.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 20:00                     ` Dmitry A. Kazakov
@ 2015-04-04 20:44                       ` Paul Rubin
  2015-04-05  8:00                         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-04 20:44 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> In Ada memory management is automatic

Do you mean something other than stack allocation and "new"?  For most
of us, automatic memory management means gc.

> Indeterministic time is of minor importance compared to
> indeterministic behavior in general, which means the program being
> non-testable and unmaintainable.  

Dennis said specifically that his program needs deterministic time.  Some
programs are like that.

> Since GC cannot be statically analyzed, because if it could be, you
> would not need GC,

That's a cool observation, thanks for it.

> having it non-testable eliminates all means of quality assurance. End
> of story.

I don't think that's right.  Being unable to statically analyze GC means
you can't prove your program will never crash from memory exhaustion.
In avionics, that may be intolerable and you better not use GC.  In
other areas (example: compilers), you may care more that your program
never computes a wrong result than that it never crash, so if after
reasonable testing it doesn't seem to have memory leaks, you are fine
and GC is great.  

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 20:44                       ` Paul Rubin
@ 2015-04-05  8:00                         ` Dmitry A. Kazakov
  2015-04-05  9:55                           ` Brian Drummond
  2015-04-06 17:07                           ` Paul Rubin
  0 siblings, 2 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-05  8:00 UTC (permalink / raw)


On Sat, 04 Apr 2015 13:44:49 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> In Ada memory management is automatic
> 
> Do you mean something other than stack allocation and "new"?

Stack

>> Indeterministic time is of minor importance compared to
>> indeterministic behavior in general, which means the program being
>> non-testable and unmaintainable.  
> 
> Dennis said specifically, his program needs determinstic time.  Some
> programs are like that.

He could run the GC task at the lowest priority level. There would be a
hell of priority-inversion problems upon finalization, but GC does not work
well with finalization anyway. The latter is again an issue of
non-deterministic behavior in general, which is far more serious than the
real-time arguments [*].

>> having it non-testable eliminates all means of quality assurance. End
>> of story.
> 
> I don't think that's right.  Being unable to statically analyze GC means
> you can't prove your program will never crash from memory exhaustion.
> In avionics, that may be intolerable and you better not use GC.  In
> other areas (example: compilers), you may care more that your program
> never computes a wrong result than that it never crash, so if after
> reasonable testing it doesn't seem to have memory leaks, you are fine
> and GC is great.

It is an especially interesting example because GNAT or GCC (I don't know
which of them) has a huge problem with memory consumption. I don't say it
is because of GC [**], but it is the problem you called irrelevant, and it
sometimes makes it impossible to compile an Ada program on a 32-bit system.

---------------
* Real-time arguments are a kind of Godwin's Law in programming.

**  Most compilers use arena pools for data structures. It would be insane
to deploy GC there, IMO.
 
-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-05  8:00                         ` Dmitry A. Kazakov
@ 2015-04-05  9:55                           ` Brian Drummond
  2015-04-06 21:27                             ` Randy Brukardt
  2015-04-06 17:07                           ` Paul Rubin
  1 sibling, 1 reply; 94+ messages in thread
From: Brian Drummond @ 2015-04-05  9:55 UTC (permalink / raw)


On Sun, 05 Apr 2015 10:00:46 +0200, Dmitry A. Kazakov wrote:

> On Sat, 04 Apr 2015 13:44:49 -0700, Paul Rubin wrote:
> 
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> In Ada memory management is automatic
>> 
>> Do you mean something other than stack allocation and "new"?
> 
> Stack

Also storage pools, though you might call that a semi-automatic memory 
management system.
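
A minimal, simplified sketch of such a pool (hypothetical; a real one must also guarantee the alignment of the backing array): Allocate just bumps an offset, Deallocate is deliberately a no-op, and the whole arena vanishes when the pool object goes out of scope:

   with System;
   with System.Storage_Pools;     use System.Storage_Pools;
   with System.Storage_Elements;  use System.Storage_Elements;

   package Arena_Pools is
      type Arena (Size : Storage_Count) is new Root_Storage_Pool with
         record
            Data : Storage_Array (1 .. Size);
            Top  : Storage_Count := 0;   --  high-water mark
         end record;

      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count);

      --  Individual deallocation is deliberately a no-op.
      overriding procedure Deallocate
        (Pool                     : in out Arena;
         Storage_Address          : System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count) is null;

      overriding function Storage_Size (Pool : Arena)
        return Storage_Count is (Pool.Size);
   end Arena_Pools;

   package body Arena_Pools is
      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count)
      is
         --  Round the offset up to the requested alignment (assumes
         --  Data itself starts with maximal alignment).
         Start : constant Storage_Count :=
           ((Pool.Top + Alignment - 1) / Alignment) * Alignment;
      begin
         if Start + Size_In_Storage_Elements > Pool.Size then
            raise Storage_Error;   --  arena exhausted
         end if;
         Storage_Address := Pool.Data (Start + 1)'Address;
         Pool.Top := Start + Size_In_Storage_Elements;
      end Allocate;
   end Arena_Pools;

Client code opts in with

   My_Arena : Arena_Pools.Arena (Size => 64 * 1024);
   type Node_Ptr is access Node;
   for Node_Ptr'Storage_Pool use My_Arena;

after which every "new Node" is served from the arena.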

>>> having it non-testable eliminates all means of quality assurance. End
>>> of story.
>> 
>> I don't think that's right.  Being unable to statically analyze GC
>> means you can't prove your program will never crash from memory
>> exhaustion. In avionics, that may be intolerable and you better not use
>> GC.  In other areas (example: compilers), you may care more that your
>> program never computes a wrong result than that it never crash, so if
>> after reasonable testing it doesn't seem to have memory leaks, you are
>> fine and GC is great.
> 
> It is an especially interesting example because exactly GNAT or GCC,
> don't know which of them, has a huge problem with memory consumption. I
> don't say it is because of GC [**], but it is the problem you said
> irrelevant, which makes it sometimes impossible to compile Ada program
> at a 32-bit system.

I suspect gcc, having seen the same phenomenon with versions of ghdl that 
use gcc as an optimising back-end. Some machine-generated VHDL programs 
(like gate-level netlists) would compile and simulate easily with ghdl's 
own JIT compiler, but blow up in gcc's optimisation passes, unless these 
were turned off completely. We're talking about source files of a few 
million lines, that (in some optimisation passes) consume many GB and 
either crash (if you're lucky) or eat all your swap space (if you're not).

> **  Most compilers use arena pools for data structures. It would be
> insane to deploy GC there, IMO.

Not only does gcc use GC, it buries most of the GC source code in the 
object code directories, making it harder to find. (This code is auto-
generated by shell scripts and awk scripts and the like, trawling the 
actual source code for object definitions. It's a fruitful source of head-
scratching if, like me, you're not a gcc expert, and "grep variable_name" 
in the source tree returns a blank. Especially when the scripts aren't re-
run, leaving the GC code out of sync with the source.)

-- Brian.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04 19:20                   ` Paul Rubin
  2015-04-04 20:00                     ` Dmitry A. Kazakov
@ 2015-04-05 13:57                     ` Dennis Lee Bieber
  1 sibling, 0 replies; 94+ messages in thread
From: Dennis Lee Bieber @ 2015-04-05 13:57 UTC (permalink / raw)


On Sat, 04 Apr 2015 12:20:39 -0700, Paul Rubin <no.email@nospam.invalid>
declaimed the following:

>Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
>>>How big and complicated do those programs tend to be, if you don't mind
>>>my asking?  Do they have a lot of complex decision paths?
>> 	It's not decision paths so much as deterministic timing... If there is
>> a possibility of an operation triggering an arbitrary garbage collection
>
>Yes, I can understand why a realtime system can't use gc.  Writing
>simple programs without gc isn't too difficult.  Writing complicated
>programs without gc often means spending a lot of effort on manual
>memory management and debugging memory errors.  So I'm wondering if the
>programs you wrote that way were complicated and if yes, how you dealt
>with this issue.
>
	At present, I'm only on the fringe of things -- my 30 years at Lockheed
were on non-realtime number crunchers that ran one or two days in advance
of operations, for purposes of scheduling resources. My 2 years at GE
Aviation has been more debug support -- though what I've encountered has
been mostly the executive/device-driver layers; the end application tends
to be provided by the client using the documented APIs of the executive.

>In this case you may also have to consider the non-deterministic effects
>of cache misses, if your cpu has cache memory; similarly for pipeline
>stalls, serial multipliers, etc.

	It likely gets verified as part of the flight certification process. I
do know that multicore processors aren't currently supported due to such
conflicts.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-05  8:00                         ` Dmitry A. Kazakov
  2015-04-05  9:55                           ` Brian Drummond
@ 2015-04-06 17:07                           ` Paul Rubin
  2015-04-06 17:41                             ` Dmitry A. Kazakov
  1 sibling, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-06 17:07 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> Dennis said specifically, his program needs determinstic time.  Some
>> programs are like that.
> He could run GC task at the lowest priority level.

I got the impression that there was no room for timing variation at all
in the program, and no idle time during which a GC could run.
E.g. something like a controller inside a jet engine, that has to do
something synchronized with the turbine spinning and can never be off by
even a microsecond.

> **  Most compilers use arena pools for data structures. It would be insane
> to deploy GC there, IMO.

Some compilers written in old, non-gc languages use arenas as a simple
way to get some of gc's benefits.  But compiler-writing is the
archetypal application of languages like Haskell, which rely on GC.
CompCert (compcert.inria.fr) could be seen from a certain perspective as
the world's only truly modern compiler.  It's written mostly in Coq,
which could be thought of as an Ocaml dialect with a fancier type
system.  Coq "compiles" by extracting Ocaml code from the Coq program
after type checking, so you can compile and run the extracted Ocaml
code.  The Coq code is far removed from anything like memory allocation
so it has to rely completely on gc.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 17:07                           ` Paul Rubin
@ 2015-04-06 17:41                             ` Dmitry A. Kazakov
  2015-04-06 18:35                               ` Paul Rubin
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-06 17:41 UTC (permalink / raw)


On Mon, 06 Apr 2015 10:07:47 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> Dennis said specifically, his program needs determinstic time.  Some
>>> programs are like that.
>> He could run GC task at the lowest priority level.
> 
> I got the impression that there was no room for timing variation at all
> in the program, and no idle time during which a GC could run.

There is always idle time; waiting is the essence of any synchronization.

> E.g. something like a controller inside a jet engine, that has to do
> something synchronized with the turbine spinning and can never be off by
> even a microsecond.

I don't know anything about jet engines, but typical controllers run 1-10ms 
cycles. Things requiring shorter times (e.g. frequency measurements) are 
performed by dedicated microcontrollers.

>> **  Most compilers use arena pools for data structures. It would be insane
>> to deploy GC there, IMO.
> 
> Some compilers written in old, non-gc languages use arenas as a simple
> way to get some of gc's benefits.

No, the benefit of an arena is unbeatable speed of allocation and zero 
deallocation time.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 17:41                             ` Dmitry A. Kazakov
@ 2015-04-06 18:35                               ` Paul Rubin
  2015-04-06 21:46                                 ` Randy Brukardt
  2015-04-07  0:36                                 ` Dennis Lee Bieber
  0 siblings, 2 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-06 18:35 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> I don't know anything about jet engine, but typical controllers run 1-10ms 
> cycles. Things requiring shorter times (e.g. frequency measurements) are 
> performed by dedicated microcontrollers.

I assumed that program was on a dedicated microcontroller.

> No, the benefit of arena is unbeatable speed of allocation and zero 
> deallocation time.

You get that from gc with region inference too, and you sort of get it
with semispace gc.  Unless the compiler is spending a lot of its time in
gc, arenas by definition can't be that big a win.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-04  6:59                 ` Dmitry A. Kazakov
@ 2015-04-06 21:12                   ` Paul Rubin
  2015-04-07  5:57                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-06 21:12 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> GC is not like matrix operations.  It's not a library that you call.  It
>> pervades the operation of your entire program.
> So is the floating-point unit on x86. You must save registers and restore
> them when doing some unrelated operations. Does it make floating-point
> multiplication high-level?

That's different, the FPU save/compute/restore is completely localized
as long as you have a small amount of private memory to save the
registers.  You can call a floating point routine that saves the
registers, computes stuff, restores, and returns, and it doesn't have to
care what the rest of the program does or vice versa.

GC is different, it affects every part of the program that uses memory,
which is to say every part of the program.  Having precise GC requires
enforcing invariants across all of memory, and the absence of language
features (except maybe some special purpose unsafe fragment) that can
break those invariants.  All the code generated by the compiler has to
take the GC's requirements into account.  That's what I mean by
pervasive.

> GC does not make C# high level, though C# is. It only makes it bad
> language, because GC is a bad idea.

Heh.  There was a big increase in software development productivity in
the early 2000's, credited variously to the spread of OOP, static type
systems, agile development methodology, the conciseness of scripting
languages, and all sorts of other things.  But there's another view that
claims the real hero wasn't any of those things, it was GC.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-05  9:55                           ` Brian Drummond
@ 2015-04-06 21:27                             ` Randy Brukardt
  0 siblings, 0 replies; 94+ messages in thread
From: Randy Brukardt @ 2015-04-06 21:27 UTC (permalink / raw)


"Brian Drummond" <brian@shapes.demon.co.uk> wrote in message 
news:mfr0rb$s0v$1@dont-email.me...
> On Sun, 05 Apr 2015 10:00:46 +0200, Dmitry A. Kazakov wrote:
>
>> On Sat, 04 Apr 2015 13:44:49 -0700, Paul Rubin wrote:
>>
>>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>>> In Ada memory management is automatic
>>>
>>> Do you mean something other than stack allocation and "new"?
>>
>> Stack
>
> Also storage pools, though you might call that a semi-automatic memory
> management system.

Especially true if you are using subpools.

You forgot containers. One of the most important purposes of the container 
packages is that they can take over the memory management (especially useful 
for indefinite types, like class-wide types). That means that one can build 
complex data structures and yet treat them, for memory management purposes, 
as if they were stack allocated.

For instance, an expression tree in a compiler can be made up of tagged 
nodes for the different kinds of expression parts (operators, memberships, 
conditionals, allocators, etc.), and the tree can be stored in a tree 
container. Thus, no explicit memory management is needed to deal with tree 
temporaries and the like (and you really don't need any management for 
global trees [ones like default expressions and the like that get stored in 
the symbol table], as they never get freed until the end of the compilation).
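
A stripped-down, hypothetical sketch of that arrangement - tagged node types held by a standard indefinite tree container, so the container owns every node and nothing is ever explicitly allocated or freed:

   with Ada.Containers.Indefinite_Multiway_Trees;

   procedure Expr_Demo is
      type Node is abstract tagged null record;
      type Operator is new Node with record
         Symbol : Character;
      end record;
      type Literal is new Node with record
         Value : Integer;
      end record;

      package Expr_Trees is new Ada.Containers.Indefinite_Multiway_Trees
        (Element_Type => Node'Class);
      use Expr_Trees;

      T    : Tree;
      Plus : Cursor;
   begin
      --  Build "1 + 2" without a single explicit allocation.
      T.Append_Child (Parent   => T.Root,
                      New_Item => Operator'(Node with Symbol => '+'));
      Plus := Last_Child (T.Root);
      T.Append_Child (Parent   => Plus,
                      New_Item => Literal'(Node with Value => 1));
      T.Append_Child (Parent   => Plus,
                      New_Item => Literal'(Node with Value => 2));
   end Expr_Demo;   --  T is finalized here; every node is reclaimed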

Yes, the implementer has to create the memory management used to implement 
the containers, but the same is true for garbage collection. And the former 
is a lot easier to get right than the latter.

IMHO, the order of memory management preferability goes approximately as 
follows:

(1) Stack-based (in Ada, this includes dynamically-sized objects, such as an 
array whose length depends on a parameter; this probably isn't implemented 
with a traditional stack, but the compiler will do the management).

(2) Container-based. (discussed above).

(3) Finalization-based. (Using a controlled wrapper of some sort; deallocate 
on finalize. Smart pointers fall into this category; a sketch follows this 
list. The only reason that these fall below (2) is that these are 
user-defined, meaning there is a higher possibility of errors. [Implementers 
make mistakes too, but there are more people using their stuff, so mistakes 
don't last as long.])

(4) Subpool-based. (Sometimes called "arenas"; this is a semi-automatic 
mechanism in that all items of a group are freed together.)

(5) Manual allocate/deallocate (with or without storage pools).
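
The sketch promised in (3): a minimal, hypothetical, non-task-safe reference-counted handle. Adjust bumps the count whenever a handle is copied; Finalize drops it and frees the payload when the last handle disappears.

   with Ada.Finalization;
   with Ada.Unchecked_Deallocation;

   package Ref_Counted is
      type Handle is new Ada.Finalization.Controlled with private;
      procedure Set (H : in out Handle; Value : Integer);
      function Get (H : Handle) return Integer;
   private
      type Payload is record
         Count : Natural := 1;    --  handles currently sharing this payload
         Value : Integer := 0;
      end record;
      type Payload_Access is access Payload;

      type Handle is new Ada.Finalization.Controlled with record
         Ref : Payload_Access;
      end record;
      overriding procedure Adjust   (H : in out Handle);
      overriding procedure Finalize (H : in out Handle);
   end Ref_Counted;

   package body Ref_Counted is
      procedure Free is
        new Ada.Unchecked_Deallocation (Payload, Payload_Access);

      procedure Set (H : in out Handle; Value : Integer) is
      begin
         if H.Ref = null then
            H.Ref := new Payload;    --  Count starts at 1
         end if;
         H.Ref.Value := Value;
      end Set;

      function Get (H : Handle) return Integer is (H.Ref.Value);

      overriding procedure Adjust (H : in out Handle) is
      begin
         if H.Ref /= null then
            H.Ref.Count := H.Ref.Count + 1;   --  one more handle shares it
         end if;
      end Adjust;

      overriding procedure Finalize (H : in out Handle) is
      begin
         if H.Ref /= null then
            H.Ref.Count := H.Ref.Count - 1;
            if H.Ref.Count = 0 then
               Free (H.Ref);                  --  last handle: deallocate
            end if;
            H.Ref := null;    --  Finalize must be idempotent in Ada
         end if;
      end Finalize;
   end Ref_Counted;

With X, Y : Handle, the assignment "Y := X" bumps the count via Adjust; when the last handle goes out of scope, Finalize frees the payload exactly once.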

Garbage collection probably helps with (5) [few programmers are good enough 
to do that correctly all of the time]; there might be an argument that it is 
better than (4) [I doubt it, because the overhead is so much worse, but at 
least that's open to debate.] Whether GC is better than (3) depends on the 
quality of one's programmers; if they know enough to use existing libraries 
rather than rolling their own (or at least copy from existing designs), then 
(3) is a clear win over GC. GC might be an advantage to a naive team, but 
I'd rather they stay far away from any software. ;-)

GC has nothing to offer for (1) and (2), and in some cases might even be 
harmful (by covering up bugs, as pointers to stuff that ought to be dead 
still exist and appear to work).

Anyone who thinks that garbage collection is required hasn't thought 
enough about how memory management can be done better. Indeed, GC seems to 
foster terrible programming practices because it's "easy"; for example, 
Firefox doesn't seem to release memory until a tab is closed, and even then 
not until a decent amount of time afterwards. It has no problem going 
catatonic from running out of memory (and the reason it hangs is that the 
GC thrashes).

                                   Randy.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 18:35                               ` Paul Rubin
@ 2015-04-06 21:46                                 ` Randy Brukardt
  2015-04-06 22:12                                   ` Paul Rubin
  2015-04-07  0:36                                 ` Dennis Lee Bieber
  1 sibling, 1 reply; 94+ messages in thread
From: Randy Brukardt @ 2015-04-06 21:46 UTC (permalink / raw)


"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:87lhi5ayuv.fsf@jester.gateway.sonic.net...
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> I don't know anything about jet engine, but typical controllers run 
>> 1-10ms
>> cycles. Things requiring shorter times (e.g. frequency measurements) are
>> performed by dedicated microcontrollers.
>
> I assumed that program was on a dedicated microcontroller.
>
>> No, the benefit of arena is unbeatable speed of allocation and zero
>> deallocation time.
>
> You get that from gc with region inference too, and you sort of get it
> with semispace gc.  Unless the compiler is spending a lot of its time in
> gc, arenas by definition can't be that big a win.

Unless the compiler was designed by trained apes, GC isn't going to be much 
of a win, either. Janus/Ada (the compiler I've been working on for the last 
34+ years [gosh!]) does almost no deallocation of anything. Back in the old 
days, we used to deallocate a few things (we needed to for compiling on 
MS-DOS), but even that proved to be too expensive, and thus the compiler 
never deallocates anything at this point unless there is an extreme memory 
shortage (and I've never seen a case where that even worked). GC overhead 
(and the corresponding allocation overhead) would be a lot more than the 
combination of free chaining and static allocation that we ended up with. 
(And yes, we verified that with profiling; I would never have believed it 
otherwise.) Moreover, the vast majority of memory used in the compiler is 
never freed anyway (it's in global data like the symbol table) until the 
end of the compiler pass. Any overhead to manage that is completely wasted.

                                     Randy.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 21:46                                 ` Randy Brukardt
@ 2015-04-06 22:12                                   ` Paul Rubin
  2015-04-06 23:40                                     ` Jeffrey Carter
  2015-04-07 19:07                                     ` Randy Brukardt
  0 siblings, 2 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-06 22:12 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:
> GC overhead (and the corresponding allocation overhead) would be a lot
> more than the combination of free chaining and static allocation ...

GC doesn't use that much of the runtime of typical programs in GC'd
languages, and the cpu overhead can be made quite small with
generational gc's.  There is no allocation overhead (just bump a
pointer) and no overhead of freeing garbage.  There is some overhead
from tracing and copying the recently created live data (minor
collections, the amount of data touched is usually small) and occasional
more expensive major collections (they also trace older data).  This
does have a cost in memory footprint but the cpu load isn't that big.

http://cowlark.com/2014-04-27-ada/index.html says:

  Personally, I think that the original Ada designers wanted a garbage
  collector --- a relaxed-rules Ada with closures, inline maps, the
  ability to have range<> arrays in structures, variable-length strings
  etc would be totally epic. Unfortunately not only does such a thing
  not exist, but I have yet to find an implementation that even has a
  garbage collector. Go figure.

which I thought was interesting.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 22:12                                   ` Paul Rubin
@ 2015-04-06 23:40                                     ` Jeffrey Carter
  2015-04-07 19:07                                     ` Randy Brukardt
  1 sibling, 0 replies; 94+ messages in thread
From: Jeffrey Carter @ 2015-04-06 23:40 UTC (permalink / raw)


On 04/06/2015 03:12 PM, Paul Rubin wrote:
> 
> http://cowlark.com/2014-04-27-ada/index.html says:
> 
>   Personally, I think that the original Ada designers wanted a garbage
>   collector --- a relaxed-rules Ada with closures, inline maps, the
>   ability to have range<> arrays in structures, variable-length strings
>   etc would be totally epic. Unfortunately not only does such a thing
>   not exist, but I have yet to find an implementation that even has a
>   garbage collector. Go figure.
> 
> which I thought was interesting.

The ARM has always been carefully written so as to allow GC without requiring
it, so Ichbiah probably expected compilers with GC to come about. On the other
hand, the Alsys compiler didn't have GC, which might be a better indicator of
what Ichbiah thought Ada should be.

Someone on here named Brukardt has said that he thinks Ichbiah meant for variant
records to be implemented the way Janus/Ada does, not the way GNAT does (I don't
remember how the Alsys compiler implemented this).

In the 1980s some people said they thought Ichbiah intended for compilers to
automatically detect passive tasks and optimize them accordingly, and there were
papers published in /Ada Letters/ on ways to do this, but this all fell by the
wayside when the Ada-9X effort created a completely new construct called
protected types that had some features passive tasks don't (functions) and some
unnecessary restrictions on what they can do (which GNAT ignores by default). No
compiler ever automatically detected passive tasks, including the Alsys
compiler, which again might be the best indicator of what was intended.

In other words, almost everyone who wants something that Ada could do, but that
usually isn't implemented, thinks Ichbiah intended it. Probably what Ichbiah
intended is in the Ada-83 ARM, including all the optional features and the
varying ways to implement others.

-- 
Jeff Carter
"Every sperm is sacred."
Monty Python's the Meaning of Life
55

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 18:35                               ` Paul Rubin
  2015-04-06 21:46                                 ` Randy Brukardt
@ 2015-04-07  0:36                                 ` Dennis Lee Bieber
  1 sibling, 0 replies; 94+ messages in thread
From: Dennis Lee Bieber @ 2015-04-07  0:36 UTC (permalink / raw)


On Mon, 06 Apr 2015 11:35:52 -0700, Paul Rubin <no.email@nospam.invalid>
declaimed the following:

>"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> I don't know anything about jet engine, but typical controllers run 1-10ms 
>> cycles. Things requiring shorter times (e.g. frequency measurements) are 
>> performed by dedicated microcontrollers.
>
>I assumed that program was on a dedicated microcontroller.
>

	Well... If it means anything, and if I'm not getting too specific into
proprietary stuff... I'm currently studying the 68040 manual, and the other
system appears to be some variant of PowerPC chip.

	At least we aren't flying mil-std 1750a...
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 21:12                   ` Paul Rubin
@ 2015-04-07  5:57                     ` Dmitry A. Kazakov
  2015-04-08  4:12                       ` Paul Rubin
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-07  5:57 UTC (permalink / raw)


On Mon, 06 Apr 2015 14:12:08 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> GC is not like matrix operations.  It's not a library that you call.  It
>>> pervades the operation of your entire program.
>> So is the floating-point unit on x86. You must save registers and restore
>> them when doing some unrelated operations. Does it make floating-point
>> multiplication high-level?
> 
> That's different, the FPU save/compute/restore is completely localized
> as long as you have a small amount of private memory to save the
> registers.  You can call a floating point routine that saves the
> registers, computes stuff, restores, and returns, and it doesn't have to
> care what the rest of the program does or vice versa.

Except that you must call this subprogram in each and every place.

> GC is different, it affects every part of the program that uses memory,
> which is to say every part of the program.  Having precise GC requires
> enforcing invariants across all of memory, and the absence of language
> features (except maybe some special purpose unsafe fragment) that can
> break those invariants.  All the code generated by the compiler has to
> take the GC's requirements into account.  That's what I mean by
> pervasive.

Saving FPU registers - any registers, as a matter of fact - is just as
pervasive. How is it high level?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-06 22:12                                   ` Paul Rubin
  2015-04-06 23:40                                     ` Jeffrey Carter
@ 2015-04-07 19:07                                     ` Randy Brukardt
  2015-04-08  3:53                                       ` Paul Rubin
  1 sibling, 1 reply; 94+ messages in thread
From: Randy Brukardt @ 2015-04-07 19:07 UTC (permalink / raw)


"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:87oan0aote.fsf@jester.gateway.sonic.net...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>> GC overhead (and the corresponding allocation overhead) would be a lot
>> more than the combination of free chaining and static allocation ...
>
> GC doesn't use that much of the runtime of typical programs

Given that there is no such thing as a "typical program" when it comes to 
memory management, that's a meaningless statement. The memory management 
patterns in a compiler, for instance, are wildly different than in a web 
server. (Having written both, I have some practical experience there. :-) 
The real-time systems that Ada customers build are very different from both.

> in GC'd languages, and the cpu overhead can be made quite small with
> generational gc's.

And this isn't remotely the major expense, so that's just a case of 
optimizing the wrong thing.

> There is no allocation overhead (just bump a pointer)

And this is bogus. Either items are collected in place, which means that you 
have to allocate from fragmented memory (a lot more expensive than 
"bumping a pointer" - that was the cause of the issues we had with our 
compiler back in the day), or they're compacted somehow. And compaction is 
very expensive, simply because copying things around for no reason is 
expensive. (Ada 2012 had to add access-in-place operations to the Ada 
Containers specifically to eliminate that overhead.)

> and no overhead of freeing garbage.  There is some overhead
> from tracing and copying the recently created live data (minor
> collections, the amount of data touched is usually small) and occasional
> more expensive major collections (they also trace older data).  This
> does have a cost in memory footprint but the cpu load isn't that big.

And this is complete fantasy. Being able to trace objects requires 
keeping a lot of extra information at runtime. Keeping enough information to 
allow runtime debugging slows down our code 5-10% (it makes calls to small 
subprograms much more expensive). You'd need that and more to be able to do 
any sort of trace. And I'm dubious about the supposed cost of those traces. 
Our compiler uses a lot of pointers; most of them act as handles and aren't 
actually dereferenced in the majority of the code, but they do get passed 
around a lot. That means a lot of copies of pointers, so lots of data 
structures to trace.

                                            Randy.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-07 19:07                                     ` Randy Brukardt
@ 2015-04-08  3:53                                       ` Paul Rubin
  2015-04-08 21:16                                         ` Randy Brukardt
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-08  3:53 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:
>> GC doesn't use that much of the runtime of typical programs
> Given that there is no such thing as a "typical program" when it comes to 
> memory management, that's a meaningless statement.

Take all the actively used programs out there and sort by decreasing
percentage of their runtime used by GC.  Throw away the top and bottom
10% as outliers and call the other 80% typical.  That seems meaningful
to me.

>> in GC'd languages, and the cpu overhead can be made quite small 
> And this isn't remotely the major expense,

So what is the major expense, and how often should anyone care about it?

>> There is no allocation overhead (just bump a pointer)
> And this is bogus. Either items are collected in place, which means that you 
> have to allocate from fragmented memory (a lot of more expensive than 
> "bumping a pointer" - that was the cause of the issues we had with our 
> compiler back in the day), or they're compacted somehow.

Yes, the idea is to use a copying collector, which compacts, so
allocation is just bumping a pointer.  I don't claim that the
copying/compaction phase has no overhead.  But, it is not too bad in
practice, for most applications in which GC is usable at all.

> And compaction is very expensive,

Maybe your info is out of date.  Modern gc's use generational copying,
so during most collections, only recently created data gets copied,
which isn't very much data.  There are also larger collections when
everything gets copied and that takes longer, but it's less frequent.
The aggregate overhead isn't too much.  There's tons of heavily used
code out there using GC these days, e.g. Gmail is written in Java unless
I'm mistaken.  They wouldn't have done that if GC was unaffordable.

> simply because copying things around for no reason is expensive. (Ada
> 2012 has to add access-in-place operations to the Ada Containers
> specifically to eliminate that overhead.)

That sounds like you're talking about copying in the application and not
in the GC.  The access frequency could be far higher in the app,
increasing the copying costs.

> Our compiler uses a lot of pointers; most of them act as handles and
> aren't actually dereferenced in the majority of the code, but they do
> get passed around a lot. That means a lot of copies of pointers, so
> lots of data structures to trace.

You trace from each pointer just once, either by setting a mark bit in
the object in the case of a mark-sweep collector, or by overwriting the
pointed-to object with a forwarding pointer in the case of a copying
collector.  So the number of objects matters much more than the number
of pointers.
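
A tiny sketch of that trace-once discipline (made-up node type): the mark bit is what bounds the work by the number of objects rather than the number of pointers, and it also makes cycles harmless:

   procedure Mark_Demo is
      type Node;
      type Node_Access is access Node;
      type Node is record
         Marked      : Boolean := False;
         Left, Right : Node_Access;     --  outgoing references
      end record;

      procedure Mark (N : Node_Access) is
      begin
         if N = null or else N.Marked then
            return;             --  already traced: shared targets are
         end if;                --  visited once, no matter how many
         N.Marked := True;      --  pointers lead to them
         Mark (N.Left);
         Mark (N.Right);
      end Mark;

      A : constant Node_Access := new Node;
      B : constant Node_Access := new Node;
   begin
      A.Left := B;
      B.Left := A;              --  a cycle: Mark still terminates
      Mark (A);
   end Mark_Demo;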

Anyway, look at it like this.  If your compiler was written 30 years ago
and ran at tolerable speed then, it's 1000x faster now because of
advances in hardware.  Even if using GC would have slowed it down by 2x,
it's still 500x faster than before.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-07  5:57                     ` Dmitry A. Kazakov
@ 2015-04-08  4:12                       ` Paul Rubin
  2015-04-08  6:45                         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-08  4:12 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> Saving FPU registers, any registers, as a matter of fact, is as much
> pervasive. How is it high level?

GC has to understand the layout of every object in memory, so it can
know what to trace.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-08  4:12                       ` Paul Rubin
@ 2015-04-08  6:45                         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-08  6:45 UTC (permalink / raw)


On Tue, 07 Apr 2015 21:12:25 -0700, Paul Rubin wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> Saving FPU registers, any registers, as a matter of fact, is as much
>> pervasive. How is it high level?
> 
> GC has to understand the layout of every object in memory, so it can
> know what to trace.

Similarly you must know the layout of the objects you load into
register(s). You might need to expand sign for example. Consider an
advanced optimizer that loads small records and arrays into register sets.
What about fat pointer objects, loaded into registers while doing checks,
maybe loading memory pages etc.

However complicated implementation of a language construct in the machine
code is, that does not automatically make the construct high level, IMO.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-08  3:53                                       ` Paul Rubin
@ 2015-04-08 21:16                                         ` Randy Brukardt
  2015-04-09  1:36                                           ` Paul Rubin
                                                             ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Randy Brukardt @ 2015-04-08 21:16 UTC (permalink / raw)


"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:878ue3ff6y.fsf@jester.gateway.sonic.net...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>>> GC doesn't use that much of the runtime of typical programs
>> Given that there is no such thing as a "typical program" when it comes to
>> memory management, that's a meaningless statement.
>
> Take all the actively used programs out there and sort by decreasing
> percentage of their runtime used by GC.  Throw away the top and bottom
> 10% as outliers and call the other 80% typical.  That seems meaningful
> to me.
>
>>> in GC'd languages, and the cpu overhead can be made quite small
>> And this isn't remotely the major expense,
>
> So what is the major expense, and how often should anyone care about it?

Tracing, as I discussed below.

>>> There is no allocation overhead (just bump a pointer)
>> And this is bogus. Either items are collected in place, which means that 
>> you
>> have to allocate from fragmented memory (a lot of more expensive than
>> "bumping a pointer" - that was the cause of the issues we had with our
>> compiler back in the day), or they're compacted somehow.
>
> Yes, the idea is to use a copying collector, which compacts, so
> allocation is just bumping a pointer.  I don't claim that the
> copying/compaction phase has no overhead.  But, it is not too bad in
> practice, for most applications in which GC is usable at all.
>
>> And compaction is very expensive,
>
> Maybe your info is out of date.  Modern gc's use generational copying,
> so during most collections, only recently created data gets copied,
> which isn't very much data.

That's a fantasy for many programs. And for the ones where it isn't, subpools 
work just as well.

> There are also larger collections when
> everything gets copied and that takes longer, but it's less frequent.

Of course. But those are the only ones that actually do anything. And it's 
that overhead (plus the excessive tracing overhead) that matter.

> The aggregate overhead isn't too much.  There's tons of heavily used
> code out there using GC these days, e.g. Gmail is written in Java unless
> I'm mistaken.  They wouldn't have done that if GC was unaffordable.

They did it because it allowed them to use trained apes to program. :-)

Let me be clear here: Ada is used by programmers that want to get their 
programs right *and* need native code performance (performance might mean 
space or mean speed). That means that we're typically worrying about the 
last 25%. GC sucks up that last 25%.

"Most programmers", OTOH, don't care about correctness (well, more 
accurately, their management doesn't care, because bugs allow them to sell 
subscriptions and other products [why is anti-virus needed, for instance? An 
unwillingness to adopt simple rules to prevent the running of foreign code, 
mostly caused by programming in languages like C that don't check bounds.] 
And often they don't care about performance, either, concentrating on 
time-to-market and other issues.

These people are never (IMHO; I know David B. would disagree) going to be 
Ada users. While there would be some benefit in faster time-to-market if one 
spends less time debugging, and Ada development is more reproducible for 
that reason, the up-front cost of actually designing your code will be too 
much of a time sink for such users to buy into. Unfortunate, but reality.

That's what I call "trained ape" programming. It's so easy, anyone can do 
it, so anyone does, and we get the mounds of crapware out there that barely 
works and steals your identity at the same time. Bah-humbug. ;-)

>> simply because copying things around for no reason is expensive. (Ada
>> 2012 has to add access-in-place operations to the Ada Containers
>> specifically to eliminate that overhead.)
>
> That sounds like you're talking about copying in the application and not
> in the GC.  The access frequency could be far higher in the app,
> increasing the copying costs.

Copying (of anything) is unnecessary overhead. If the programmer put it 
there, that's their problem. OTOH, if the implementation puts it there, it's 
a problem for all programmers, and that's a real problem. Remember again 
that I'm trying to get the fastest/smallest possible code to implement the 
programmer's algorithm. Any implementation overhead that can be eliminated 
should be eliminated, because that's overhead that the programmer can do 
nothing about.

>> Our compiler uses a lot of pointers; most of them act as handles and
>> aren't actually dereferenced in the majority of the code, but they do
>> get passed around a lot. That means a lot of copies of pointers, so
>> lots of data structures to trace.
>
> You trace from each pointer just once, either by setting a mark bit in
> the object in the case of a mark-sweep collector, or by overwriting the
> pointed-to object with a forwarding pointer in the case of a copying
> collector.  So the number of objects matters much more than the number
> of pointers.

That makes no sense to me whatsoever. The whole point of tracing is to 
determine what is reachable (since GC cleans up objects that are no longer 
reachable). There are two ways to do that: one is to retrace everything that 
exists (based on some algorithm to avoid retracing stuff that hasn't 
changed); alternatively, one can somehow determine when pointers are 
destroyed and update data structures based on that. In the first case (which 
seems to be what you are talking about), you have to revisit every pointer 
variable that might have been changed (which is pretty much all of them) 
with some frequency in order to determine what they're pointing at. Pointer 
constants are rare in Ada (primarily, just formal parameters); everything 
else is a variable.

And of course, a "forwarding pointer" is just an extra level of indirection. 
There is an old saying in the compiler world that any problem can be solved 
with a level of indirection. The problem is, that such a level of 
indirection costs quite a bit, as much as 10% for frequently accessed 
objects. And of course in native code, once that indirection is in place, it 
will stay there forever (one could use a thunk instead to make the 
indirection conditional, but that would be even more expensive, because of 
cache effects).

> Anyway, look at it like this.  If your compiler was written 30 years ago
> and ran at tolerable speed then, it's 1000x faster now because of
> advances in hardware.  Even if using GC would have slowed it down by 2x,
> it's still 500x faster than before.

Our compiler was barely usable 30 years ago; the only thing that made it 
useful was that everybody was in the same boat. (On the order of 5 minutes 
per compilation unit.) There were many things that we could not do because 
of lack of CPU power. The current version is faster by a lot, but it still 
can take minutes per compilation unit (although most units are much faster). 
There's still a lot of stuff that could be done that most users would not 
tolerate the wait for. I'm not remotely interested in wasting a decent 
amount of time on a mechanism that is not needed for most real-world Ada 
code (not to mention a compiler). People who don't know better (or don't 
care) may come to a different conclusion. They're probably not trying to 
integrate proof into the compiler. :-)

                                      Randy.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-08 21:16                                         ` Randy Brukardt
@ 2015-04-09  1:36                                           ` Paul Rubin
  2015-04-09 23:26                                             ` Randy Brukardt
  2015-04-09  2:36                                           ` David Botton
  2015-04-09  8:55                                           ` Georg Bauhaus
  2 siblings, 1 reply; 94+ messages in thread
From: Paul Rubin @ 2015-04-09  1:36 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:
> Let me be clear here: Ada is used by programmers that want to get their 
> programs right *and* need native code performance (performance might mean 
> space or mean speed). That means that we're typically worrying about the 
> last 25%. GC sucks up that last 25%.

Hmm, ok, but that's a very narrow niche of programs in my experience:
non-real-time programs where the last 25% matters.  If it's realtime, GC
is off the table and there's no decision to make.  For something like a
compiler, people use GCC all day long despite other compilers being 10x
faster, so I'd say the 25% isn't of consequence.  So why wouldn't I use
GC in a compiler if it was going to make my job easier?

> "Most programmers", OTOH, don't care about correctness ...  These
> people are not (IMHO, I know David B. would disagree) ever going to be
> Ada users.

I'd say in non-critical but real-time or memory-constrained applications
(think of consumer electronics typically programmed in C), correctness
at the level of high-end formal verification is not an
issue--conventional QA testing results in products that meet customer
expectations.  Yet Ada is enough of a productivity win to make it
attractive over C.  If correctness is critical then that makes Ada's
advantage even bigger.

In non-real-time applications critically needing what I call
denotational correctness (is that the right term?  It means the program
must never deliver a wrong answer, compilers being an archetypal
example), I probably want GC unless I can't stand that 25% performance
hit.  I compile code all the time and I'd like a 2x or 5x or 20x faster
compiler, but 25% faster is a yawner.
>
> That's what I call "trained ape" programming. It's so easy, anyone can
> do it, so anyone does,

If GC gives those trained apes such a boost that their output competes
with your experts, just imagine how much better your experts would be if
they used it too.

> Copying (of anything) is unnecessary overhead. 

There's a huge difference between an application copying something out
of an Ada container 1e5 or 1e6 times a second, and a GC copying it once
or twice a second.  

>> You trace from each pointer just once, either by setting a mark bit...
> That makes no sense to me whatsoever. The whole point of tracing is to 
> determine what is reachable 

I just mean you trace from an object to all the other objects reachable
from it, then you set a bit in the object marking it as already having
been traced.  Then the next time you find a pointer to the object, you
see that the bit is set, and you don't have to trace it again.

> And of course, a "forwarding pointer" is just an extra level of indirection. 

The forwarding pointer only exists during GC.  It means you've copied
the object from the old place to the new place, so as you copy other
objects that contained the old address, you update them to point to the
new address.
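
A sketch of the same idea (again with a made-up node type): the first visit copies a node and leaves the new address behind in the old one; every later visit just reads the forwarding pointer, so shared and cyclic structures are copied exactly once:

   procedure Forward_Demo is
      type Node;
      type Node_Access is access Node;
      type Node is record
         Forward : Node_Access;   --  null until this node has been copied
         Left    : Node_Access;
         Value   : Integer := 0;
      end record;

      function Copy (N : Node_Access) return Node_Access is
      begin
         if N = null then
            return null;
         elsif N.Forward /= null then
            return N.Forward;     --  already moved: reuse the new address
         end if;
         N.Forward := new Node'(Forward => null,
                                Left    => null,
                                Value   => N.Value);
         N.Forward.Left := Copy (N.Left);   --  fix up outgoing reference
         return N.Forward;
      end Copy;

      A : constant Node_Access :=
        new Node'(Forward => null, Left => null, Value => 1);
      B : constant Node_Access :=
        new Node'(Forward => null, Left => null, Value => 2);
      Roots : array (1 .. 2) of Node_Access;
   begin
      A.Left := B;
      B.Left := A;                --  a cycle, and B is shared
      Roots (1) := Copy (A);
      Roots (2) := Copy (B);      --  returns the forwarded copy of B
   end Forward_Demo;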

> Our compiler was barely usable 30 years ago.... (On the order of 5 minutes 
> per compilation unit.) There were many things that we could not do because 
> of lack of CPU power. The current version is faster by a lot, but it still 
> can take minutes per compilation unit (although most units are much
> faster). 

OK, so let's say a little under 2 minutes per CU on the average (single
threaded compilation).  Your customer's program has 2000 CU's and he
wants to compile it and run regression tests every night, so he wants
the compilation to complete in 2 hours.  His build farm uses commodity
4-core Intel CPU's so he needs 8 compilation servers.  Your competitor's
programmers are about equal to yours, and his compiler does about the
same thing, but he chose to use GC, so his compiler is 25% slower, which
means the customer needs 10 servers instead of 8.  So the customer will
pay more for your compiler--but not much, since x86 servers are cheap.
Your costs are mostly from programmer salaries developing the compiler.
Did your competitor's productivity gains from GC let him underbid you by
enough that the customer chooses him?  If yes, maybe you ought to
reconsider GC.
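
(Checking the arithmetic: 2000 CUs at a little under 2 minutes each is
roughly 4000 core-minutes; finishing in 120 minutes therefore needs about
33 cores, i.e. 8 four-core servers; 25% slower means about 42 cores, i.e.
10 servers.)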

That is actually a pretty extreme case.  A real such customer is more
likely to say there's no difference between a 2 hour nightly compilation
and a 2.5 hour one.  Programs where the last 25% matters are the ones
where the program runs on hundreds of servers 24/7.  Those programs
exist, but they're not that common.  The economics of computing have
changed so that programmer time is more valuable than machine time by a
far larger ratio than before.  There is no longer a company timesharing
system doing the compilations, whose depreciation and operation cost is
equivalent to several full time salaries.  With cloud computing, if you
occasionally need 100 servers for something, you can spin them up with
an API, pay a few cents per hour per server til you've completed your
task, then spin them back down.

> People who don't know better... are probably not trying to integrate
> proof into the compiler. :-)

What exactly is your compiler doing?  The compilers I know of with
serious proof automation use external SMT solvers.  Admittedly these
tend to be written in C++ ;-)


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-08 21:16                                         ` Randy Brukardt
  2015-04-09  1:36                                           ` Paul Rubin
@ 2015-04-09  2:36                                           ` David Botton
  2015-04-09  8:55                                           ` Georg Bauhaus
  2 siblings, 0 replies; 94+ messages in thread
From: David Botton @ 2015-04-09  2:36 UTC (permalink / raw)


<<Java (not C) is probably Ada's main competition for big new 
projects these days, >>

Not really

Then, now and always, the reality is that the decision of what language to use is based on:

1) What tools and libraries to get things done exist. What Ichbiah called interfaces and clearly complained about their lack (see his famous letter to the 9X group). Most often the libraries/tools are found first and the language is dictated exclusively by this. (Rails, Django, chip vendor's compiler, OS native language like Objective-C / Swift, etc.)

2) What the staff whines about to their management that they want to use or are willing to use, usually because of popularity or next job prospects, sometimes because of technical reasons.

The only things that can override this are:

1. Mandates (and not really...)
2. Cost factors

So the reality is that if Ada provides important #1s and cultivates #2 - which is more about whether popularity and staff curiosity can be generated - then you get more Ada use.

So if you feel Ada would benefit the world of programming (or want your future jobs to be in Ada), you need to work on #1 and #2 as your marketing strategy not trying to see who can throw the "C" ball the farthest.

It's a popularity contest (OK, and about greasing palms, like with free stuff for key people); it always has been.

<<These people are not (IMHO, I know David B. would disagree) ever going to be 
Ada users. >>

Even in the niche market the last Ada vendor is still managing to swim in, the 2 factors of decision I mentioned are key. I watched people do so many stupid things with Java when it came out, just because it was new and cool, to know that unless those people you don't think will ever use Ada start thinking Ada is cool enough to want to try, Ada will flatline soon enough. (Hence my campaign for a face lift and some cool tools that I'm working on, to popularize an alternative free compiler suite, etc. I hope others will also think up some cool new bindings, libraries and tools to work on, or extend what others are doing and popularize their efforts.)

David Botton

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-08 21:16                                         ` Randy Brukardt
  2015-04-09  1:36                                           ` Paul Rubin
  2015-04-09  2:36                                           ` David Botton
@ 2015-04-09  8:55                                           ` Georg Bauhaus
  2015-04-09  9:38                                             ` Dmitry A. Kazakov
  2015-04-09 23:35                                             ` Randy Brukardt
  2 siblings, 2 replies; 94+ messages in thread
From: Georg Bauhaus @ 2015-04-09  8:55 UTC (permalink / raw)


On 08.04.15 23:16, Randy Brukardt wrote:
> That's what I call "trained ape" programming. It's so easy, anyone can do
> it, so anyone does, and we get the mounds of crapware out there that barely
> works and steals your identity at the same time. Bah-humbug.;-)

Suppose

(1) there exists an implementation of Ada that supports automatic
memory management(*), and

(2) that doing all of the following becomes really easy:
- to specify a pool for objects that are handled automatically,
- to explain what that means, and
- to pick a pool that exists and is portable (sort of).

Then, I think, these conditions will describe something one might
have come to expect of Ada: it keeps promises (to be GC-able) and
makes everything explicit in source.
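
For illustration, condition (2) might look like this in source (a
sketch; Garbage_Collected is a hypothetical package, not an existing
library):

   with Garbage_Collected;  -- hypothetical: provides a collected pool

   GC_Pool : Garbage_Collected.Pool;

   type Node_Access is access Node;
   for Node_Access'Storage_Pool use GC_Pool;
   --  objects created by "new Node" are now managed automatically;
   --  the choice of pool is explicit in the source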

In addition, the properties of the collector and how it affects the
program might be much more visible and manageable than in many other
main stream languages.

(Or else, at least provide more efficient, more predictable containers
based on contracts. ;-)

__
(*) There happen to be two now: one (at least one) for the JVM,
and one for .NET. But do they count?


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09  8:55                                           ` Georg Bauhaus
@ 2015-04-09  9:38                                             ` Dmitry A. Kazakov
  2015-04-09 13:14                                               ` G.B.
  2015-04-09 23:35                                             ` Randy Brukardt
  1 sibling, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-09  9:38 UTC (permalink / raw)


On Thu, 09 Apr 2015 10:55:38 +0200, Georg Bauhaus wrote:

> On 08.04.15 23:16, Randy Brukardt wrote:
>> That's what I call "trained ape" programming. It's so easy, anyone can do
>> it, so anyone does, and we get the mounds of crapware out there that barely
>> works and steals your identity at the same time. Bah-humbug.;-)
> 
> Suppose
> 
> (1) there exists an implementation of Ada that supports automatic
> memory management(*), and
> 
> (2) that doing all of the following becomes really easy:
> - to specify a pool for objects that are handled automatically,
> - to explain what that means, and
> - to pick a pool that exists and is portable (sort of).
> 
> Then, I think, these conditions will describe something one might
> have come to expect of Ada: it keeps promises (to be GC-able) and
> makes everything explicit in source.
> 
> In addition, the properties of the collector and how it affects the
> program might be much more visible and manageable than in many other
> main stream languages.
> 
> (Or else, at least provide more efficient, more predictable containers
> based on contracts. ;-)

And the smiley means? You certainly know that there is no way to contract
[*] GC.

GC is non-functional and unrelated to the program behavior, which is
another way to explain why GC is neither low nor high level. It is
non-existent.

--------------------
* True or false contracts equally.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09  9:38                                             ` Dmitry A. Kazakov
@ 2015-04-09 13:14                                               ` G.B.
  2015-04-09 14:35                                                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 94+ messages in thread
From: G.B. @ 2015-04-09 13:14 UTC (permalink / raw)


On 09.04.15 11:38, Dmitry A. Kazakov wrote:

>> (Or else, at least provide more efficient, more predictable containers
>> based on contracts. ;-)

I mentioned containers because Randy reminded us of
their usefulness as automatic storage managers.
(Containers with contracts have already appeared in GNAT.)

> And the smiley means? You certainly know that there is no way to contract
> [*] GC.

It seems possible to guarantee certain properties of GC.
That's the supplier's side of the contract.

Properties of the collection process can be abstracted
into mathematical estimates, and reliably so, I think.
Consider

   pragma Suppress

The idea of suppressing checks, if "ported" to containers,
could be based on observing the container's expectations.
First,

    Container: "Don't do This!
      Because I will be assuming v(This) = False."

The algorithmic "conclusion" then becomes that all checks
that Ada normally associates with This can be Suppressed.
(Tampering, cursors carrying their containers, etc.)

That's the first win. It depends on contracts and clients
being written accordingly.

Now, WRT storage management, assume a bounded container.
The implementation may pre-allocate the maximum number of
objects.(*) Given that, why should it be impossible in every
case to guarantee a known complexity of storage
management? A bounded vector, say, could specify its
storage management needs, on condition that ... Much like
a hash table package would state that

   "As long as the number of slots is as least N/M,
    finding an object will take no more than f(M, N, g(Hash))."

Thus, if out of every 5 timeslices, 3 are needed for working with
the data stored in the container, and this container instance
guarantees that at most 1 timeslice is needed in all possible
storage management cases, there is 1 left for any lags.
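
To make the bounded case concrete, a sketch using the standard
bounded containers of Ada 2012:

   package Int_Vectors is new Ada.Containers.Bounded_Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Int_Vectors.Vector (Capacity => 100);
   --  storage for all 100 elements is part of V itself, so Append
   --  never touches the heap; the storage cost is known statically

The Capacity discriminant is the supplier's side of such a contract.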


> GC is non-functional and unrelated to the program behavior, which is
> another way to explain why GC is neither low nor high level. It is
> non-existent.

Like everything that actually handles program data, GC does
influence the program's behavior, e.g. by taking time.
Hence, one may influence GC in server JVMs, for example.
But that's the point. Suppose there is an abstract description
of GC behavior, in terms of parameters. Will the description
be sufficiently detailed for guiding the design of an algorithm?

__
(*) For the sake of simplicity, WLOG. (I think GNAT's variant
records are similar.)


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 13:14                                               ` G.B.
@ 2015-04-09 14:35                                                 ` Dmitry A. Kazakov
  2015-04-09 15:43                                                   ` G.B.
  2015-04-09 18:40                                                   ` Niklas Holsti
  0 siblings, 2 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-09 14:35 UTC (permalink / raw)


On Thu, 09 Apr 2015 15:14:59 +0200, G.B. wrote:

> Now, WRT storage management, assume a bounded container.
> The implementation may pre-allocate the maximum number of
> objects.(*) Given that, why should it be impossible in every
> case to guarantee a known complexity of storage
> management? A bounded vector, say, could specify its
> storage management needs, on condition that ... Much like
> a hash table package would state that
> 
>    "As long as the number of slots is as least N/M,
>     finding an object will take no more than f(M, N, g(Hash))."

Which is elementarily incomputable, as it contains the future tense and thus
cannot be part of any contract.

>> GC is non-functional and unrelated to the program behavior, which is
>> another way to explain why GC is neither low nor high level. It is
>> non-existent.
> 
> Like everything that actually handles program data, GC does
> influence the program's behavior, e.g. by taking time.

That is not the program behavior. The behavior is only things related to
the program logic = functional things. Any other things happening upon
execution of the program are abstracted away, which is the only way to
define 'behavior' without dragging the whole Universe into the picture.

> Hence, one may influence GC in server JVMs, for example.

Yes, and this is why GC is a bad idea, since it potentially breaks the
abstraction and causes unpredictable behavior = *use* of GC is a [potential]
bug.

> But that's the point. Suppose there is an abstract description
> of GC behavior, in terms of parameters. Will the description
> be sufficiently detailed for guiding the design of an algorithm?

Sure. But that is beside the point. You can design GC and you can do it in
a fully contractual manner under SPARK with all bells and whistles. But
that would not change anything for the application program which would try
to use GC. These are two unrelated things. A correctly behaving GC is still
a bug. Compare it with goto. There is no doubt that jumps and labels behave
correctly...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 14:35                                                 ` Dmitry A. Kazakov
@ 2015-04-09 15:43                                                   ` G.B.
  2015-04-09 17:26                                                     ` Dmitry A. Kazakov
  2015-04-09 18:40                                                   ` Niklas Holsti
  1 sibling, 1 reply; 94+ messages in thread
From: G.B. @ 2015-04-09 15:43 UTC (permalink / raw)


On 09.04.15 16:35, Dmitry A. Kazakov wrote:

>>     "As long as the number of slots is as least N/M,
>>      finding an object will take no more than f(M, N, g(Hash))."
>
> Which is elementarily incomputable, as it contains the future tense and thus
> cannot be part of any contract.

"as long as" makes it universal in time. Invariant:

  "... if #slots >= N/M, then
   for all x . TIME(x, Table, Find) <= f(M, N, g(Hash))"


>> Like everything that actually handles program data, GC does
>> influence the program's behavior, e.g. by taking time.
>
> That is not the program behavior. The behavior is only things related to
> the program logic = functional things.

Time is "functionally" essential (milliseconds response time,
say), is not "the universe", but surely is part of observable
behavior?

>> Hence, one may influence GC in server JVMs, for example.
>
> Yes, and this is why GC is a bad idea, since it potentially breaks the
> abstraction and causes unpredictable behavior = *use* of GC is a [potential]
> bug.

Precisely why a GC contract is required.

>> But that's the point. Suppose there is an abstract description
>> of GC behavior, in terms of parameters. Will the description
>> be sufficiently detailed for guiding the design of an algorithm?
>
> Sure. But that is beside the point. You can design GC and you can do it in
> a fully contractual manner under SPARK with all bells and whistles. But
> that would not change anything for the application program which would try
> to use GC.

Pardon? Why else would you write the contract?

> Compare it with goto. There is no doubt that jumps and labels behave
> correctly...

Which is why Ada adds clauses that make "goto" (relatively)
well behaved. Like a contract does.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 15:43                                                   ` G.B.
@ 2015-04-09 17:26                                                     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-09 17:26 UTC (permalink / raw)


On Thu, 09 Apr 2015 17:43:46 +0200, G.B. wrote:

> On 09.04.15 16:35, Dmitry A. Kazakov wrote:
> 
>>>     "As long as the number of slots is as least N/M,
>>>      finding an object will take no more than f(M, N, g(Hash))."
>>
>> Which is elementarily incomputable, as it contains the future tense and thus
>> cannot be part of any contract.
> 
> "as long as" makes it universal in time. Invariant:
> 
>   "... if #slots >= N/M, then
>    for all x . TIME(x, Table, Find) <= f(M, N, g(Hash))"

How are you going to compute this? Not to mention that it does not depend
on what the subprogram actually does, on its code. Say you take two CPUs
running it, one of them circling a neutron star. Which one fulfils the
contract, and according to which clock?

>>> Like everything that actually handles program data, GC does
>>> influence the program's behavior, e.g. by taking time.
>>
>> That is not the program behavior. The behavior is only things related to
>> the program logic = functional things.
> 
> Time is "functionally" essential (milliseconds response time,
> say), is not "the universe", but surely is part of observable
> behavior?

So might be power consumption, weight, emitted radiation etc. None of which
is functional.

>>> Hence, one may influence GC in server JVMs, for example.
>>
>> Yes, and this is why GC is a bad idea, since it potentially breaks the
>> abstraction and causes unpredictable behavior = *use* of GC is a [potential]
>> bug.
> 
> Precisely why a GC contract is required.

The contract you cannot state.

>>> But that's the point. Suppose there is an abstract description
>>> of GC behavior, in terms of parameters. Will the description
>>> be sufficiently detailed for guiding the design of an algorithm?
>>
>> Sure. But that is beside the point. You can design GC and you can do it in
>> a fully contractual manner under SPARK with all bells and whistles. But
>> that would not change anything for the application program which would try
>> to use GC.
> 
> Pardon? Why else would you write the contract?

"Why" and "else" applied to what?

>> Compare it with goto. There is no doubt that jumps and labels behave
>> correctly...
> 
> Which is why Ada adds clauses that make "goto" (relatively)
> well behaved.

gotos are absolutely well-behaved = as defined in the RM. Which changes
nothing. You are confusing the language with the program (not for the
first time).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 14:35                                                 ` Dmitry A. Kazakov
  2015-04-09 15:43                                                   ` G.B.
@ 2015-04-09 18:40                                                   ` Niklas Holsti
  2015-04-09 19:02                                                     ` Dmitry A. Kazakov
  1 sibling, 1 reply; 94+ messages in thread
From: Niklas Holsti @ 2015-04-09 18:40 UTC (permalink / raw)


On 15-04-09 17:35 , Dmitry A. Kazakov wrote:
> On Thu, 09 Apr 2015 15:14:59 +0200, G.B. wrote:
>
>> Now, WRT storage management, assume a bounded container.
>> The implementation may pre-allocate the maximum number of
>> objects.(*) Given that, why should it be impossible in every
>> case to guarantee a known complexity of storage
>> management? A bounded vector, say, could specify its
>> storage management needs, on condition that ... Much like
>> a hash table package would state that
>>
>>     "As long as the number of slots is as least N/M,
>>      finding an object will take no more than f(M, N, g(Hash))."
>
> Which is elementarily incomputable, as it contains the future tense and thus
> cannot be part of any contract.

It is quite possible to extend the semantics of a programming language 
with a predefined global (or task-specific) variable "elapsed execution 
time", and the above contract can be stated as a constraint on the 
difference between the pre- and post-values of this variable.

In fact, Ada.Execution_Time provides such variables. The Time_Remaining 
function of Ada.Execution_Time.Timers could be used to express the above 
constraint.
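
A sketch of the idea (Find, Table, Key, Position and the 1 ms bound
are made up; assumes Ada.Execution_Time and Ada.Real_Time are withed):

   declare
      use Ada.Real_Time;
      use type Ada.Execution_Time.CPU_Time;
      Before : constant Ada.Execution_Time.CPU_Time :=
        Ada.Execution_Time.Clock;  -- CPU time this task has consumed
   begin
      Find (Table, Key, Position);
      pragma Assert  -- post-value minus pre-value of the "variable"
        (Ada.Execution_Time.Clock - Before <= Milliseconds (1));
   end;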

>> Like everything that actually handles program data, GC does
>> influence the program's behavior, e.g. by taking time.
>
> That is not the program behavior. The behavior is only things related to
> the program logic = functional things.

This is niggling about words, but for real-time systems the execution 
time of computations is certainly an important property of the program, 
whether it is called "behaviour" or not.

That it is a platform-dependent property does not make it less important 
- only harder to manage. Formal analysis tools such as SPARK already 
take into account other platform-dependent properties such as the range 
of the predefined Integer type. The formal analysis tools called 
Worst-Case Execution-Time (WCET) Analyzers try to prove exactly this 
kind of contract, sometimes using very detailed models of the platforms.

Of course garbage collection (GC) complicates WCET analysis a lot, at 
least if it is done synchronously, interrupting the normal computation 
at unpredictable times. A "background" GC would be more manageable, 
perhaps running on a dedicated core or in a low-priority task. (It is 
then necessary to prove, by some analysis, that the garbage-collection 
rate is sufficiently higher than the garbage-creation rate that a memory 
allocation never fails. That would be similar to analyses already being 
done, such as schedulability analyses and analyses to show that buffers 
never overflow.)
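
(Roughly, and only as a sketch of the kind of inequality such an
analysis would establish: if the application creates garbage at a rate
of at most a bytes/second, the collector reclaims c bytes/second while
running and is given a fraction f of the processor, then f*c >= a is
the basic condition, plus enough heap headroom to cover the longest
interval during which the collector can be held off.)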

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 18:40                                                   ` Niklas Holsti
@ 2015-04-09 19:02                                                     ` Dmitry A. Kazakov
  2015-04-09 20:38                                                       ` Paul Rubin
  0 siblings, 1 reply; 94+ messages in thread
From: Dmitry A. Kazakov @ 2015-04-09 19:02 UTC (permalink / raw)


On Thu, 09 Apr 2015 21:40:06 +0300, Niklas Holsti wrote:

> On 15-04-09 17:35 , Dmitry A. Kazakov wrote:
>> On Thu, 09 Apr 2015 15:14:59 +0200, G.B. wrote:
>>
>>> Now, WRT storage management, assume a bounded container.
>>> The implementation may pre-allocate the maximum number of
>>> objects.(*) Given that, why should it be impossible in every
>>> case to guarantee a known complexity of storage
>>> management? A bounded vector, say, could specify its
>>> storage management needs, on condition that ... Much like
>>> a hash table package would state that
>>>
>>>     "As long as the number of slots is as least N/M,
>>>      finding an object will take no more than f(M, N, g(Hash))."
>>
>> Which is elementarily incomputable, as it contains the future tense and thus
>> cannot be part of any contract.
> 
> It is quite possible to extend the semantics of a programming language 
> with a predefined global (or task-specific) variable "elapsed execution 
> time", and the above contract can be stated as a constraint on the 
> difference between the pre- and post-values of this variable.

I had the impression that Georg wished to compare the complexity [of free block
search?] after cleaning the garbage with the complexity without cleaning.
No time measurements can help that, obviously. And "will take" is not
"took". Furthermore, it is not a proper contract anyway, as it cannot be
verified *before* a program run. The [proper] contract must apply to all
possible runs (all program states).

> In fact, Ada.Execution_Time provides such variables. The Time_Remaining 
> function could be used to express the above constraint.
> 
>>> Like everything that actually handles program data, GC does
>>> influence the program's behavior, e.g. by taking time.
>>
>> That is not the program behavior. The behavior is only things related to
>> the program logic = functional things.
> 
> This is niggling about words, but for real-time systems the execution 
> time of computations is certainly an important property of the program, 
> whether it is called "behaviour" or not.

It is a constraint.

My point was that it is a lesser problem, IMO, when GC violates constraints
such as time and space constraints in an unpredictable manner. The bigger
problem is that it may violate [proper] behavior because of issues with
finalization and references order (weak, strong, the things Randy said
about designing memory management).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 19:02                                                     ` Dmitry A. Kazakov
@ 2015-04-09 20:38                                                       ` Paul Rubin
  0 siblings, 0 replies; 94+ messages in thread
From: Paul Rubin @ 2015-04-09 20:38 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> And "will take" is not "took". Furthermore, it is not a proper
> contract anyway, as it cannot be verified *before* a program run. The
> [proper] contract must apply to all possible runs (all program
> states).

As G.B. explained, the contract is simply:

   "... if #slots >= N/M, then
    for all x . TIME(x, Table, Find) <= f(M, N, g(Hash))"

That contract has a quantifier ("for all x") but that's not inherently
an obstacle to static verification, given a precise enough semantics for
the implementation language that the TIME function can be computed.
It's just like any other mathematical theorem, e.g.

  if integer n > 2, then 
    for all integers a,b,c > 0, a**n + b**n /= c**n

Proof of this theorem is left as an exercise ;-)
The point is that verifying the contract means proving a certain
sentence in predicate logic.  It's likely to be messy in practice, but
conceptually it's straightforward.  We know how to build proof systems
that deal with quantifiers.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09  1:36                                           ` Paul Rubin
@ 2015-04-09 23:26                                             ` Randy Brukardt
  0 siblings, 0 replies; 94+ messages in thread
From: Randy Brukardt @ 2015-04-09 23:26 UTC (permalink / raw)


"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:87sicadqvs.fsf@jester.gateway.sonic.net...
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>> Let me be clear here: Ada is used by programmers that want to get their
>> programs right *and* need native code performance (performance might mean
>> space or mean speed). That means that we're typically worrying about the
>> last 25%. GC sucks up that last 25%.
>
> Hmm, ok, but that's a very narrow niche of programs in my experience.
> Non real-time programs where the last 25% matters.  If it's realtime, GC
> is off the table and there's no decision to make.

Right, and that's Ada's #1 market area. And the second group is the other 
bunch where Ada really has something to offer. Otherwise, you're probably 
better off using the dynamic language du jour, especially if you are trying 
to get young programmers excited.

>  For something like a
> compiler, people use GCC all day long despite other compilers being 10x
> faster, so I'd say the 25% isn't of consequence.  So why wouldn't I use
> GC in a compiler if it was going to make my job easier?

Because it won't, at least in a compiler. You don't need to free anything in 
a compiler on a modern large memory machine, 'cause it runs for a short time 
and then quits. It's possible but very unlikely to run out of memory (I've 
never seen a program that uses more than 16 megabytes in our compiler [not 
that I check that very often]; it would take a lot larger program to get to 
2GB on a 32-bit machine, not to mention the larger sizes available on a 
64-bit machine).

And of course, GC covers up dangling pointer bugs (of course, Ada does worse 
with such bugs, so it's a wash at best. But that's not an argument in favor 
of GC, but a problem with Ada). It's trivial to use a dead object that one 
mistakenly has a pointer to. It's probably an improvement to not corrupt 
memory in that case, but that means that such bugs are even less likely to 
be found (they're symptom-free).

...
>> That's what I call "trained ape" programming. It's so easy, anyone can
>> do it, so anyone does,
>
> If GC gives those trained apes such a boost that their output competes
> with your experts, just imagine how much better your experts would be if
> they used it too.

Naw, experts don't need those crutches. Their apparent productivity will be 
lower, because they're writing preconditions and constraints and assertions 
and package specs rather than slinging code. And they'll probably get laid 
off because the buggy mess will be "done" well before the expert code that 
actually works right reaches that point.

...
>>> You trace from each pointer just once, either by setting a mark bit...
>> That makes no sense to me whatsoever. The whole point of tracing is to
>> determine what is reachable
>
> I just mean you trace from an object to all the other objects reachable
> from it, then you set a bit in the object marking it as already having
> been traced.  Then the next time you find a pointer to the object, you
> see that the bit is set, and you don't have to trace it again.

Sure, but it's the large mess of top-level pointers (not the ones within 
objects) that are so expensive to trace. And there are at least as many of 
those in my code (once you count parameters, local variables, and the like). 
Plus the pointers within objects can change (and sometimes frequently 
change). So there has to be overhead to clear the tracing whenever something 
changes in the object. That's a global, distributed overhead (it's in every 
object).

...
>> Our compiler was barely usable 30 years ago.... (On the order of 5 
>> minutes
>> per compilation unit.) There were many things that we could not do 
>> because
>> of lack of CPU power. The current version is faster by a lot, but it 
>> still
>> can take minutes per compilation unit (although most units are much
>> faster).
>
> OK, so let's say a little under 2 minutes per CU on the average (single
> threaded compilation).  Your customer's program has 2000 CU's and he
> wants to compile it and run regression tests every night, so he wants
> the compilation to complete in 2 hours.  His build farm uses commodity
> 4-core Intel CPUs, so he needs 8 compilation servers.  Your competitor's
> programmers are about equal to yours, and his compiler does about the
> same thing, but he chose to use GC, so his compiler is 25% slower, which
> means the customer needs 10 servers instead of 8.  So the customer will
> pay more for your compiler--but not much, since x86 servers are cheap.
> Your costs are mostly from programmer salaries developing the compiler.
> Did your competitor's productivity gains from GC let him underbid you by
> enough that the customer chooses him?  If yes, maybe you ought to
> reconsider GC.

I've yet to find a customer that doesn't want their compiler faster. And the 
point of the 25% isn't any particular 25%, but the fact that you need to 
find a bunch of 5% improvements in order to make any sort of significant 
difference.

There is also the cognitive part of compilation speeds; the time taken to do 
something is not perceived linearly by humans. There is a point at which 
people tend to go off and do something else while waiting, and that is a 
productivity drag that far outweighs anything that a programming language 
could offer. A two hour build would be unacceptable to many projects (I 
personally hate anything that runs over 10 minutes).

Anyway, the "productivity gains from GC" are an illusion. If you use local 
variables in Ada (which can be dynamically sized, remember), the compiler 
manages the memory and surely GC is not easier than that. If you use 
containers (including the map and tree containers for complex data 
structures, and whose elements can be classwide so that they will allow any 
member of a class), then the memory management is done by the container. 
With the Ada 2012 syntax, they work like an array in many ways. Easy.
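
For instance (a sketch; Shape, Make_Circle and Draw are made-up names):

   --  Elements may be class-wide; the container owns all the storage:
   package Shape_Lists is new Ada.Containers.Indefinite_Vectors
     (Index_Type => Positive, Element_Type => Shape'Class);

   Shapes : Shape_Lists.Vector;

   Shapes.Append (Make_Circle (Radius => 2.0));
   Draw (Shapes (1));  -- Ada 2012 indexing; no access types, no frees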

GC only could possibly have an advantage when one uses allocated objects, 
but the use of "access" and allocated objects should be a last resort in 
modern Ada -- to be used only in the rare case when performance of the 
built-in data structures is inadequate.

Anyway, GC is realistically incompatible with modern Ada. Since Ada 95, 
finalization of objects has been defined to happen at a specified time 
(depending on how and when they are declared). For objects that are 
allocated, that's when the type goes away (unless the object is specifically 
freed with Unchecked_Deallocation or Unchecked_Deallocate_Subpool). Since 
most access types are declared at library level, any object with a 
controlled component (which should be true of most ADTs) cannot be collected 
until the program ends. That pretty much makes GC useless (at a minimum, it 
makes it of very restricted utility).

I've tried on several occasions to change those rules to allow 
"unreachable" objects to be finalized sooner, but those proposals have never 
gotten any traction. It's a chicken-and-egg problem: hardly anyone wants to 
fix the language unless there is serious interest in GC, but there cannot be 
serious interest in GC because the language doesn't really allow it.

> The economics of computing have
> changed so that programmer time is more valuable than machine time by a
> far larger ratio than before.

Quite possibly you're right, in which case there is no need for me or for 
Ada (at least not the Ada I know).

...
>> People who don't know better... are probably not trying to integrate
>> proof into the compiler. :-)
>
> What exactly is your compiler doing?  The compilers I know of with
> serious proof automation use external SMT solvers.  Admittedly these
> tend to be written in C++ ;-)

Not much yet. But I don't trust code that I can't fix; all of the foreign 
code we've integrated over the years has caused trouble and ended up needing 
to be replaced. I'd replace the whole OS with Ada if I could afford to do 
it. :-)

In any event, I think the proof stuff has to be an integral part of the 
compiler, because it seriously affects the code that gets generated. (If, 
after all, you can prove F(X) = 10 is True, you can replace F(X) with 10 
appropriately. That can be a huge win at runtime, especially in things like 
the preconditions of Ada.)

                                      Randy.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09  8:55                                           ` Georg Bauhaus
  2015-04-09  9:38                                             ` Dmitry A. Kazakov
@ 2015-04-09 23:35                                             ` Randy Brukardt
  2015-04-10 14:16                                               ` G.B.
  1 sibling, 1 reply; 94+ messages in thread
From: Randy Brukardt @ 2015-04-09 23:35 UTC (permalink / raw)


"Georg Bauhaus" <bauhaus@futureapps.invalid> wrote in message 
news:mg5eoh$k18$1@dont-email.me...
> On 08.04.15 23:16, Randy Brukardt wrote:
>> That's what I call "trained ape" programming. It's so easy, anyone can do
>> it, so anyone does, and we get the mounds of crapware out there that 
>> barely
>> works and steals your identity at the same time. Bah-humbug.;-)
>
> Suppose
>
> (1) there exists an implementation of Ada that supports automatic
> memory management(*), and

That's not really possible, unless you're willing to forget controlled 
objects. (See my previous message for details.) There are some technical 
reasons that make it rather hard to fix this (we don't want finalization 
occurring at random places in the program code, because of the possibility 
that it has tasking or other side-effects -- it would be very difficult to 
write a correct Finalize routine if that were possible). [My proposal was 
that "early" finalization would only be allowed at places in the code where 
finalization would normally happen, scope exit and the like].

> (2) that doing all of the following becomes really easy:
> - to specify a pool for objects that are handled automatically,

pragma Default_Storage_Pool. See 13.11.3 in Ada 2012. :-)
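
E.g. (a sketch; My_Pool is any pool object you've declared):

   pragma Default_Storage_Pool (My_Pool);
   type Node_Access is access Node;  -- allocators for this type now
                                     -- use My_Pool by default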

> - to explain what that means, and

See 13.11.3 in Ada 2012. :-)

> - to pick a pool that exists and is portable (sort of).

It would seem easy enough for those to exist, assuming that some mechanism 
for tracing is invented. (That's the thing that a user can't sensibly write 
themselves, meaning the implementer and the language would need to be 
somehow involved. Otherwise, you could just use Grein's Smart Pointers and 
forget the whole special pool thing.)

> Then, I think, these conditions will describe something one might
> have come to expect of Ada: it keeps promises (to be GC-able) and
> makes everything explicit in source.

Sounds good. But how to do tracing without a lot of overhead??

                                 Randy.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-09 23:35                                             ` Randy Brukardt
@ 2015-04-10 14:16                                               ` G.B.
  2015-04-10 20:58                                                 ` Randy Brukardt
  0 siblings, 1 reply; 94+ messages in thread
From: G.B. @ 2015-04-10 14:16 UTC (permalink / raw)


On 10.04.15 01:35, Randy Brukardt wrote:
> But how to do tracing without a lot of overhead??

One lightweight way of allocating and tracing might be so
dumb that it is not always useful. But I don't know if
it is never useful, or not useful enough to be considered.
The point is to really request memory once, and then to do
almost nothing. Nothing new here (arena pools).

Imagine a procedure that manipulates a graph.
Or tons of matrices. Or a sequence of short messages.
You need "new T" for the objects(*).

1 - require the pool to have enough room for Max objects:
     one stretch of storage.

2 - make Allocate return the address of the next free
     storage element, etc.; remember the then next free
     storage element in the pool object.

3 - Deallocate does nothing.

When the region ends, Finalize of the pool frees memory.
So, I guess tracing overhead is one assignment for
keeping track of the next free storage element's address.
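
A minimal sketch of such a pool, in case it helps (alignment handling
simplified -- it assumes Data itself is suitably aligned; not
production code):

   with System.Storage_Elements;  use System.Storage_Elements;
   with System.Storage_Pools;

   package Arena_Pools is

      type Arena (Size : Storage_Count) is new
        System.Storage_Pools.Root_Storage_Pool with
      record
         Data : Storage_Array (1 .. Size);  -- step 1: one stretch
         Next : Storage_Offset := 1;        -- next free element
      end record;

      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count);

      overriding procedure Deallocate  -- step 3: does nothing
        (Pool                     : in out Arena;
         Storage_Address          : System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count) is null;

      overriding function Storage_Size (Pool : Arena) return Storage_Count
        is (Pool.Size);

   end Arena_Pools;

   package body Arena_Pools is

      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out System.Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count)
      is
         --  step 2: round Next up to the requested alignment
         Start : constant Storage_Offset :=
           Pool.Next + (-Pool.Next) mod Alignment;
      begin
         if Start + Size_In_Storage_Elements - 1 > Pool.Size then
            raise Storage_Error;  -- arena exhausted
         end if;
         Storage_Address := Pool.Data (Start)'Address;
         Pool.Next := Start + Size_In_Storage_Elements;
      end Allocate;

   end Arena_Pools;

A client then writes, say:

   Pool : Arena_Pools.Arena (Size => 64 * 1024);
   type T_Access is access T;
   for T_Access'Storage_Pool use Pool;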

Depending on the algorithm, this might be wasteful or,
on the contrary, use no more memory than the algorithm
needs in any case. And the programmer does not need to
consider Unchecked_Deallocation and finalization of the
objects. (And one doesn't add a tag to all of the objects
of that one specific type that the algorithm is using!
1st level cache is a scarce resource… ;-)

That tie is inflexible anyway: rather than turning some
final event in an object's life into its only catch-all,
I'd find it clearer and more explicit to have objects
that can react to events, somewhat like POs can react to
interrupts. Freeing memory need not be an immediate
consequence of some object's finalization.

__
(*) With GCC, no large objects can be allocated on the
stack anymore.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-10 14:16                                               ` G.B.
@ 2015-04-10 20:58                                                 ` Randy Brukardt
  0 siblings, 0 replies; 94+ messages in thread
From: Randy Brukardt @ 2015-04-10 20:58 UTC (permalink / raw)


"G.B." <bauhaus@futureapps.invalid> wrote in message 
news:mg8ltb$nql$1@dont-email.me...
> On 10.04.15 01:35, Randy Brukardt wrote:
>> But how to do tracing without a lot of overhead??
>
> One lightweight way of allocating and tracing might be so
> dumb that it is not always useful. But I don't know if
> it is never useful, or not useful enough to be considered.
> The point is to really request memory once, and then to do
> almost nothing. Nothing new here (arena pools).

Right. Subpools provide a (possibly) better way to manage those.

> Imagine a procedure that manipulates a graph.
> Or tons of matrices. Or a sequence of short messages.
> You need "new T" for the objects(*).
>
> 1 - require the pool to have enough room for Max objects:
>     one stretch of storage.

Pretty much like any pool. :-)

> 2 - make Allocate return the address of the next free
>     storage element, etc.; remember the then next free
>     storage element in the pool object.

Pretty much like most user-defined pools. I have a pool that does exactly 
that, although the purpose is to detect valid vs. dangling pointers.

> 3 - Deallocate does nothing.

Typical in pools that implement subpools; only a subpool deallocate does 
anything. The mark-release pool example in the RM works this way.

> When the region ends, Finalize of the pool frees memory.
> So, I guess tracing overhead is one assignment for
> keeping track of the next free storage element's address.

Right. The subpool version is a bit more flexible, because it allows dumping 
a bunch of objects at a time (say all of the parts of a single matrix), 
without dumping everything. Of course, if you use subpools, you reintroduce 
the possibility of dangling pointers. One could combine subpools and smart 
pointers such that dangling pointers are detected, that might be a very 
appealing combination. That wouldn't have to be very expensive (the 
expensive part of smart pointers is the implicit finalization, and one 
wouldn't need it here, at least if you could tolerate someone deallocating a 
pointer that is actively being used -- not a very likely occurrence), and it 
could be built in Ada today.
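
For instance (a sketch; Pool is some pool-with-subpools implementation
and Matrix_Ptr an access type whose storage pool it is):

   SP : System.Storage_Pools.Subpools.Subpool_Handle :=
     Create_Subpool (Pool);
   M  : Matrix_Ptr := new (SP) Matrix'(1 .. 3 => (1 .. 3 => 0.0));

   --  later, drop every object allocated in SP at once:
   Ada.Unchecked_Deallocate_Subpool (SP);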

                               Randy. 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: Languages don't  matter.  A mathematical refutation
  2015-04-03 17:33           ` Bob Duff
@ 2015-04-26 11:38             ` David Thompson
  0 siblings, 0 replies; 94+ messages in thread
From: David Thompson @ 2015-04-26 11:38 UTC (permalink / raw)


On Fri, 03 Apr 2015 13:33:21 -0400, Bob Duff <bobduff@theworld.com>
wrote:

> "J-P. Rosen" <rosen@adalog.fr> writes:
> 
> > [In C] A number of things (handling of integer overflows,
> > semantics of %, dereferencing the null pointer...) are defined in the
> > language as "like the machine does" (this is spelled: implementation
> > defined). Unless it has changed in recent versions of C ?
> 
> No, no C standard has ever said "like the machine does".  Those things
> are defined as "undefined behavior" in C, and that does not mean "like
> the machine does", it means "anything can happen" (as far as the C
> standard is concerned).  "Like the machine does" is not even meaningful
> in a high-level language.  What the machine does depends on what code
> the compiler generates.  For example, <snip>

Actually standard C had and has three levels:

- undefined behavior: anything can happen. This can include the
program crashing (and even the whole system or device crashing if not
using some kind of protected mode), corrupting data, or the favorite
hyperbolic example on comp.lang.c: causing demons to fly out of your
nose. (Google if interested.) Like Ada's erroneous execution.

- unspecified (usually a result): one of a set of things defined by
the standard occurs, but you don't know which. A common example is the
evaluation of operands to a nonsequencing two-operand operator, or
arguments in a function call: they can be done in any order, even
overlapped, but each one must be evaluated exactly once. (Although
some confusion occurs because multiple _assignments_ within one
statement are usually undefined behavior; this often occurs as a
result of using nested assignment operators like a++ and a*=b that
also yield a useful value to an outer expression. In Ada, like most(?)
other HLLs, assignment is a statement and never directly nested.)

- implementation-defined: like unspecified, but the implementation
must _document_ the choice, or at least any constraints on the choice.
These freedoms _commonly_ are used by a compiler to do whatever the
machine does for the simplest, fastest, and/or otherwise preferred
code, but the compiler can do other and sometimes surprising things if
it "wants" i.e. the compiler writer chooses to for whatever reason.

One example has changed: % was and is defined in terms of /, i.e.
x%y = x-(x/y)*y for nonzero y. But / was originally
implementation-defined if either operand was negative; C99 changed /
to round toward zero, and thus % to have the same sign as the
numerator, or be zero. Division and remainder by 0 were/are undefined
behavior (in practice often a trap/abort).
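
(Worked example under the C99 rules: -7/2 rounds toward zero to -3, so
-7%2 = -7 - (-3)*2 = -1.  A pre-C99 implementation was free to define
-7/2 as -4 instead, in which case -7%2 = -7 - (-4)*2 = 1.)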

Also C _signed_ integer overflow was/is undefined (and in practice
_usually_ wraparound) but _unsigned_ overflow was/is modulo 2^nbits
with nbits implementation-defined. Floating overflow was/is undefined
by C, but may be (and often is) defined by IEEE aka IEC[60]559.

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2015-04-26 11:38 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-25 11:46 Languages don't matter. A mathematical refutation Jean François Martinez
2015-03-25 15:19 ` Paul Rubin
2015-04-03  0:50   ` robin.vowels
2015-04-03  2:18     ` Jeffrey Carter
2015-04-03 13:37       ` Bob Duff
2015-04-03 14:13         ` Dmitry A. Kazakov
2015-04-03 17:34           ` Paul Rubin
2015-04-03 19:34             ` Dmitry A. Kazakov
2015-04-03 19:58               ` Paul Rubin
2015-04-04  6:59                 ` Dmitry A. Kazakov
2015-04-06 21:12                   ` Paul Rubin
2015-04-07  5:57                     ` Dmitry A. Kazakov
2015-04-08  4:12                       ` Paul Rubin
2015-04-08  6:45                         ` Dmitry A. Kazakov
2015-04-04  0:41             ` Dennis Lee Bieber
2015-04-04  3:05               ` Paul Rubin
2015-04-04 14:46                 ` Dennis Lee Bieber
2015-04-04 15:41                   ` brbarkstrom
2015-04-04 19:20                   ` Paul Rubin
2015-04-04 20:00                     ` Dmitry A. Kazakov
2015-04-04 20:44                       ` Paul Rubin
2015-04-05  8:00                         ` Dmitry A. Kazakov
2015-04-05  9:55                           ` Brian Drummond
2015-04-06 21:27                             ` Randy Brukardt
2015-04-06 17:07                           ` Paul Rubin
2015-04-06 17:41                             ` Dmitry A. Kazakov
2015-04-06 18:35                               ` Paul Rubin
2015-04-06 21:46                                 ` Randy Brukardt
2015-04-06 22:12                                   ` Paul Rubin
2015-04-06 23:40                                     ` Jeffrey Carter
2015-04-07 19:07                                     ` Randy Brukardt
2015-04-08  3:53                                       ` Paul Rubin
2015-04-08 21:16                                         ` Randy Brukardt
2015-04-09  1:36                                           ` Paul Rubin
2015-04-09 23:26                                             ` Randy Brukardt
2015-04-09  2:36                                           ` David Botton
2015-04-09  8:55                                           ` Georg Bauhaus
2015-04-09  9:38                                             ` Dmitry A. Kazakov
2015-04-09 13:14                                               ` G.B.
2015-04-09 14:35                                                 ` Dmitry A. Kazakov
2015-04-09 15:43                                                   ` G.B.
2015-04-09 17:26                                                     ` Dmitry A. Kazakov
2015-04-09 18:40                                                   ` Niklas Holsti
2015-04-09 19:02                                                     ` Dmitry A. Kazakov
2015-04-09 20:38                                                       ` Paul Rubin
2015-04-09 23:35                                             ` Randy Brukardt
2015-04-10 14:16                                               ` G.B.
2015-04-10 20:58                                                 ` Randy Brukardt
2015-04-07  0:36                                 ` Dennis Lee Bieber
2015-04-05 13:57                     ` Dennis Lee Bieber
2015-04-03 16:17         ` J-P. Rosen
2015-04-03 17:33           ` Bob Duff
2015-04-26 11:38             ` David Thompson
2015-04-03 19:00         ` Georg Bauhaus
2015-04-03 19:12         ` Jeffrey Carter
2015-04-03 22:37           ` Bob Duff
2015-04-03 23:38             ` Jeffrey Carter
2015-04-04  0:15               ` Bob Duff
2015-04-04  7:06                 ` Dmitry A. Kazakov
2015-04-04  2:59               ` Paul Rubin
2015-04-04  0:56             ` Dennis Lee Bieber
2015-03-25 17:12 ` Jean François Martinez
2015-03-26 13:43 ` Maciej Sobczak
2015-03-26 15:01   ` Jean François Martinez
2015-03-26 17:45     ` Jeffrey Carter
2015-03-26 15:21   ` Dmitry A. Kazakov
2015-03-27 11:25     ` Jean François Martinez
2015-03-27 17:36       ` Dmitry A. Kazakov
2015-03-30 10:31         ` Jean François Martinez
2015-03-30 11:52           ` Dmitry A. Kazakov
2015-03-30 12:32             ` G.B.
2015-03-30 13:48               ` Dmitry A. Kazakov
2015-03-30 15:47                 ` G.B.
2015-03-30 16:05                   ` Dmitry A. Kazakov
2015-04-02 12:59                     ` brbarkstrom
2015-04-02 13:35                       ` Dmitry A. Kazakov
2015-04-02 14:48                         ` jm.tarrasa
2015-04-02 15:55                           ` brbarkstrom
2015-04-02 16:21                             ` Jean François Martinez
2015-04-02 16:48                             ` Dmitry A. Kazakov
2015-04-02 16:41                           ` Dmitry A. Kazakov
2015-04-04 10:02                             ` jm.tarrasa
2015-04-04 11:16                               ` Dmitry A. Kazakov
2015-04-02 15:58                         ` Jean François Martinez
2015-04-02 16:39                           ` Dmitry A. Kazakov
2015-04-03  9:46                             ` Jean François Martinez
2015-04-03 14:00                               ` Dmitry A. Kazakov
2015-04-03 17:12                                 ` Jean François Martinez
2015-04-02 17:17                         ` G.B.
2015-04-02 19:09                           ` Dmitry A. Kazakov
2015-04-02 18:24                       ` Niklas Holsti
2015-04-02 18:43                       ` Jeffrey Carter
2015-03-30 11:36         ` Jean François Martinez
2015-03-30 10:48       ` jm.tarrasa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox