From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Languages don't matter. A mathematical refutation
Date: Fri, 27 Mar 2015 18:36:40 +0100
Organization: cbb software GmbH
Message-ID: <1ht5q4lxmtf3p.mntbczbpti5n.dlg@40tude.net>
References: <59ac455c-72f6-43e2-8a79-efc0f3e16d9a@googlegroups.com> <19qfgu5pjszm5.s5y5u8r0zx8k.dlg@40tude.net> <161a69af-a392-4214-bd92-0e20e7522cca@googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de

On Fri, 27 Mar 2015 04:25:21 -0700 (PDT), Jean François Martinez wrote:

> On Thursday, March 26, 2015 at 4:21:45 PM UTC+1, Dmitry A. Kazakov wrote:
>> On Thu, 26 Mar 2015 06:43:01 -0700 (PDT), Maciej Sobczak wrote:
>>
>>> Unless you prove that this had no influence on the results in question, I
>>> refute the whole of your mathematical proof.
>>
>> Yep, when *mathematical* statistics is used, then the burden of showing it
>> applicable lies on the author's shoulders. In particular, the software
>> metric used for some SW design must be shown to be a random variable. The
>> elementary outcomes presented. Their independence explained etc.
>>
>>> But more seriously, it would be interesting to flip languages every year
>>> and see whether the advantages of Ada hold in terms of better results.
>>> Or run two classes in parallel (with random assignments of students to each
>>> class) and compare results over the years.
>>
>> I doubt the process of random selection of a developing team from a pool of
>> "indistinguishable" teams were a good basis for a statistical model of
>> software development.
>>
>> The bottom line: statistics were applied incorrectly and the results
>> obtained carry no predictive power.
>
> Really? Textbooks on statistics are chock-full of similar examples and
> exercises like comparing the proportion of defective products at
> manufacturing plants A and B then checking the null hypothesis before
> deciding that A's products are better.

It is not similar. A production process is a physical one, and we know that
its behavior is stochastic because the laws of physics are (e.g. the laws of
thermodynamics are statistical). A development team is not governed by these
laws, because decision making is not random and the system as a whole has
memory (people learn in the process).

> In our case we have a random variable and that is group quality.

How is this random? A given group has a given quality. It is nowhere random:
if you measure the quality repeatedly you will get the same quality.

> And it is common in this field to assume that your population is a sample
> of an infinitely-sized conceptual population.

Which is a very different model. If you randomly select a team, the
randomness is not in the team but entirely in the selection process. For this
to be correct you should show that the selected teams are equivalent in terms
of the given SW metrics. That would be quite difficult to do because students
have different exposures to programming, everything is changing and you
cannot "recycle" teams.

Now, suppose you managed to do this. What would be the conclusion? You would
have shown that a randomly selected team of uneducated pupils produces better
SW metrics with Ada than with C when faced with a given educational
objective. So what?
Why should this imply anything for SW processes as performed by
professionals? Again, we know Ada is better for that. But this study would
not show any causality or statistically proven correlation between the two.
It is nothing more than a "by-the-way" argument from the statistical POV.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de