From: bobduff@world.std.com (Robert A Duff)
Subject: Re: To Initialise or not
Date: 1996/05/09
Organization: The World Public Access UNIX, Brookline, MA
Newsgroups: comp.lang.ada
References: <318508FE.204B@sanders.lockheed.com> <318E3A26.3565@lmtas.lmco.com>
 <3190A8D3.3D53@lmtas.lmco.com>

In article <3190A8D3.3D53@lmtas.lmco.com>, Ken Garlington wrote:

>Robert A Duff wrote:
>>
>> Let me ask you this: Suppose Ada had a rule that all integers were
>> default-initialized to T'First, where T is the subtype of the variable.
>> Would you be happy with that language?  Would it change the way you
>> write code?
>
>I would be happy with that language (ignoring the resource utilization
>effects), and any time I had an integer that I needed to initialize to
>T'First, I would leave off the initialization (knowing that it was going
>to be set to T'First anyway). Why not?

OK, this explains our disagreement.  I would *not* be happy with such a
language.  You would, and would rely on default initialization to
T'First.  IMHO, that just masks real bugs in the program.  IYHO, it is
helpful, in that it avoids extra "useless" code.  This is a
philosophical difference that explains all the differences in detail
(which I argue about endlessly below ;-)).

Let me be quite clear about what I would *like* in a language: Any type
that's built in to the language, and is treated as a non-composite
entity, should have run-time detection of uninitialized variables.
This includes integers, floats, pointers, etc.  It might even include
strings, depending on the language.
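To make the disagreement concrete, here's roughly what worries me,
in a sketch (the names Speed, Cruise, and Target are invented for
illustration; this is not from any real program):

```ada
subtype Speed is Integer range 0 .. 300;

Cruise : Speed := Speed'First;  --  deliberately starts at zero
Target : Speed;                 --  initialization forgotten

--  Under the hypothetical default-to-T'First rule, both objects start
--  at 0, and the bug in Target is silently masked -- the compiler
--  cannot tell a deliberate zero from a forgotten one.  With run-time
--  uninitialized-variable detection, the first read of Target would
--  raise an error instead.
```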
(Note: one could view floats as a record containing exponent, mantissa,
etc.  Ada doesn't do that; it views the whole thing together as a
number, so that thing taken as a whole should be subject to
uninitialized-variable detection.)

A compiler that detects such errors at compile time, in some cases, is
nice, but we know (halting problem and all that) that we can't do it in
all cases, so run-time detection is the best we can do, in general.  If
this run-time detection causes efficiency problems, or problems in
interfacing to C, or to some hardware, or to anything else, the
*programmer* (not the language designer) can turn it off.

If I'm coding in a language that doesn't support the above ideal, then
I'll probably code *as* *if* the above were true, assuming that's safe
(i.e. assuming that my assumptions are a subset of what's actually
required by the language).

>> >However, I think you've missed my point. Let me retry: Do you have a
>> >block of comments in each compilation unit that says:
>> >
>> >-- NOTE: When a declaration using an access type has an explicit initial
>> >-- value of null, this means that the declared object can be used without
>> >-- any further initialization.
>>
>> No, of course not.  It only needs to be stated once.
>
>Where?

Wherever all the other project-wide information goes.  Come on, Ken,
don't you have *any* project-wide information that needs to be
understood by all programmers who wish to modify the code?  Do you
really expect a programmer to jump into the middle of the code, grab
some random module, and start hacking on it without knowing anything
about the complete product of which it is a part?

It seems like your arguments here could be applied to *any* coding
conventions -- do you really want to say that coding conventions are
evil, and that any code that obeys the language standard should be OK?

>> And there are lots
>> of other things that need to be stated once, and you need to give the
>> reader some way of easily finding that place.
>
>Could you give some examples?

E.g., "This project obeys the Ada Quality and Style Guide, which may be
found at ."?  Clearly, anybody meddling with the code *anywhere* in
this project needs to know this fact.  But documenting it in every
source file seems like overkill.  You've got to have a central place to
put these interesting facts.

>> I freely admit that if
>> somebody doesn't understand the coding convention, it won't help them
>> understand the code.  (It probably won't hurt, either, though.)
>
>I disagree. Any time I maintain code that does something non-obvious, I
>have to stop and figure out why the code was written in the strange
>manner. (Did he really just set that pointer to null, or did I misread
>"null"?)

OK, fair enough.  But still, before meddling with the code on *my*
project, you will be required to read the document that talks about
project-wide conventions, and you will be required to obey those
conventions (or else, if you think they're stupid, get them changed at
the project level, rather than going off on your own and hacking
however you like).

>> In this example, I meant that it is safe to pretend that Ada doesn't
>> initialize pointers.  Please re-read what I wrote in this light --
>> that is, when I said "THIS is safe", I meant "THIS PRETENSE is a safe
>> assumption".
>
>I disagree. Re-read my comments about valid access objects which can be
>named similarly to null. In a more general sense, the more (unnecessary)
>code written, the greater the likelihood of error.

Yawn.  Yes, there's a possibility that somebody will type "nell" when
they meant "null".  This seems like a minor concern, since misspelling
names is *always* a concern.  For integers, if I write
"X : Some_Int := O;", I might have meant 0 instead of O.  Big deal.
So don't declare integer variables called "O", and don't declare
pointers called "nell".
>> This hypothetical not-quite-Ada language was an attempt to answer
>> your comment saying that it is hard to understand *what* the coding
>> convention *is*.  It's *not* hard, if you're capable of imagining
>> such a language.
>
>And you have the context of this discussion. Without your explanation,
>within ready reach of the reader, it is quite difficult to guess the
>magic intent of the explicit initialization to null. At least, it is
>for me.

I agree that, for the coding convention to be useful, it has to be
documented, and anybody meddling with the code has to know where that
documentation is, and read it.

>> It doesn't make sense to answer this hypothetical explanation with,
>> "But, in Ada 95, pointers *are* initialized to null.  The RM says
>> so."
>
>But it does make sense to ask, "How does the reader get this
>explanation?" It's in some "central" place, apparently, but that's all
>I know so far...
>
>It also does make sense to ask, as I did originally, "Why is this
>convention useful?" I'm still not sure I see that part of the
>discussion...

Yes, it does make sense to ask *that*.  If the convention is truly
useless, then it's bad to require that extra code.  IMHO, the
convention is useful.  I'll try (again) to explain why, but I assume
we're past the question of "how do you even know what the convention
is" -- i.e., how does the programmer know when to put ":= null".

>> By the way, all your examples concerned local variables.  The more
>> interesting cases are package-body variables, and record components.
>
>Then present an example of either type, showing why this convention
>conveys useful information.

I did -- the doctor/patient thing.  You didn't buy it.

>> No, no, no.  That's not what I meant about "safe".  Of course
>> implicit nulls are just as safe as explicit nulls.
>
>Safer, IMHO.

This explains why you like my hypothetical language that initializes
all integers to T'First.
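For what it's worth, here's roughly what the convention looks like on
package-body variables, in a sketch (the package, Table_Entry, and the
variable names are invented, and Table_Entry's declaration is elided):

```ada
package body Symbol_Table is

   type Entry_Ptr is access Table_Entry;

   --  ":= null" by convention means: the initial null *is* meaningful,
   --  and code may read Free_List before ever assigning it.
   Free_List : Entry_Ptr := null;

   --  No initializer: by convention, this is conceptually undefined
   --  until some call assigns it, and reading it first would be a bug
   --  (which we wish the compiler would detect).
   Current_Scope : Entry_Ptr;

end Symbol_Table;
```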
>> It just seems to me that it's valuable to know whether a given value
>> is going to be used.  It makes the code more readable, IMHO.
>
>I guess I would be more convinced with an actual example.
>
>Does this approach extend to other cases? For example, suppose a record
>object is being assigned with an aggregate, and some of the aggregate
>values will be used in the code, and others will be overwritten first.
>Do you have some mechanism for distinguishing them?

Good question.  Ada doesn't do detection of uninitialized variables, so
it's not surprising that aggregates have to be complete, even though
some parts will never be used.  Yes, I *do* think it would be valuable
to have that information in the source code, but Ada doesn't provide
any way to say that, except as a comment.  This is no big deal -- every
language has some unexpressible thing, which is why every language has
comments.

>What about loops? When I write "for J in Buffer'Range loop", there's
>an implicit initialization of J to Buffer'First. Is there an issue
>here?

No, there's no issue here.  It is quite clear (and explicit) what J is
initialized to.  Actually, you state it wrongly: J is *not* initialized
to Buffer'First and then incremented through the loop.  That's true at
the implementation level, but at the semantic level, the right way to
think about it is that a new J is created each time through the loop,
and this J is constant, and destroyed at the end of that particular
iteration.

>It just seems like a strange way to represent data flow information.

You prefer comments, I guess.

>> In this case, the Ada *language* is what violates the KISS principle.
>> I understand the reasons behind it, but still, the rule is that some
>> variables get a "junk" initial value, and some variables get a
>> well-defined initial value (by default, of course), which is
>> obviously more complicated than treating all variables the same.
>
>However, if you want all (scalar) variables to get a well-defined
>default initial value, you can do that in Ada 95 with
>Normalize_Scalars (and 'Size), right? I don't know why this is critical
>for you, but it's there if you want it...

Pragma Normalize_Scalars is nice, but it doesn't go far enough, for my
taste, because (1) it doesn't work when there are no "extra bits", and
(2) it doesn't require every read of an uninitialized scalar to be
detected -- the program has to actually do something that trips over
that value and causes some other constraint check to be violated.  And,
of course, it can't detect uninitialized access values, because any
access value that is conceptually uninitialized is actually set to null
by the implementation, thus masking any such bugs.

>> I claim that the coding convention we're arguing about is actually
>> *more* KISS, since it treats integers and pointers alike.
>
>I disagree that consistency necessarily equates to simplicity.

This is a huge difference in our philosophies.  I won't quite say
"necessarily", but "usually", consistency = simplicity.  This is
certainly one such case.

>(See the example below).
>
>> >I understand that you want pointers to act like integers. I just
>> >don't understand _why_, since pointers are conceptually different
>> >from integers.
>>
>> I don't see any way in which integers and pointers are different with
>> respect to *this* issue (whether or not they should be initialized).
>
>Well, assuming you don't resort to Unchecked_Conversion or something
>nasty like that, any cases I can think of where you can use integers to
>reference a memory location will automatically have bounds (e.g., as an
>array reference). Pointers are unbounded in Ada 95, as far as I can
>tell -- there is no easy way to check for "in range" in a
>machine-independent fashion. Therefore, to avoid illegal references,
>you have to do something special, right?

Ada 95 had to *add* some rules to achieve this, for integers.
In Ada 83, "A(I) := 3;" will overwrite random memory if I is
uninitialized.  In Ada 95, it will either raise an exception, or
overwrite some random component of A.  Better than nothing, I suppose.
The rule for "Ptr.all := 3;" could be the same, except that Ada 83
already said you can rely on Ptr being initialized to null.

>> >Here's a different example: Suppose you saw code that looked like:
>> >
>> >    X : Integer := Init (Thousands => 1, Hundreds => 4, others => 0);
>> >    -- set X to 1400.
>> >
>> >Would this be a preferred approach, since I'm just creating numeric
>> >literals the way I create array aggregates?
>>
>> Sorry, but you lost me there.
>
>To slightly misquote a famous man: "I claim that the coding convention
>I'm proposing is actually *more* KISS, since it treats integers and
>arrays alike.... I don't see any way in which integers and arrays are
>different with respect to *this* issue (how they should be
>initialized)."

I don't know of any *famous* man who said anything like that.  ;-)

I guess I see what you're getting at -- if integers are like pointers
with regard to initialization, then all types should be like all other
types, with regard to aggregates.  Seems like a bogus argument, to me.
I said integers and pointers should behave the same with respect to
initialization.  Composites should not.  I certainly didn't say that
integers should be initialized using aggregates.

>However, applying my guideline that you don't write extra code unless
>there's a clear purpose, this coding convention for integer
>initialization is silly. It doesn't communicate, it obfuscates.

I still don't see how you translate a requirement to initialize into a
requirement to initialize with a weird aggregate with Hundreds
components and so forth.

>> >> But initializing all access values to null does
>> >> *not* constitute a "reasonable" initial value, in many cases.
>> >
>> >This is certainly true.
>> >However, can you identify a case where null _is_ a reasonable
>> >initial value, but not a reasonable initial _default_ value?
>>
>> Sure.  I have a hard time imagining a type that is otherwise.
>>
>> Suppose (stealing from another thread ;-)) we have Patients and
>> Doctors.  Each Patient is represented by a record, containing (among
>> other things) a pointer to its Doctor.  Because of the way this
>> application works, you can't create a Patient without giving it a
>> Doctor -- i.e. there is no reasonable default value for the Doctor
>> field, and it must always be non-null.
>
>Wait a minute! In this case, null is neither a reasonable initial
>value, nor a reasonable initial default value. Therefore, it is not a
>valid example.
>
>> However, there's a hash table, implemented as an array of pointers to
>> Doctors.  I want to initialize all the slots to "null", which means
>> "this slot is empty now".  Conceptually, the array elements should be
>> initialized to null, whereas the My_Doctor field of Patient should be
>> initialized to "Don't touch this!".
>
>Wait a minute! In this case, null is both a reasonable initial value,
>and a reasonable initial default value. Therefore, it is also not a
>valid example.

But default values are per type.  You can't say, "wait a minute", null
should be the default but null should not be the default.  We have a
single type here -- the language has to define either that it is
default-initialized to null, or that it is not.  My point is that for
one particular type, we want *some* things initialized to null, and we
want *some* things to be initially undefined (and we wish our compiler
would detect any errors).
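In Ada terms, the two situations might look something like this (a
sketch; the declarations are mine, the full declaration of Doctor is
elided, and the hash-table bounds are invented):

```ada
type Doctor;
type Doctor_Ptr is access Doctor;

type Patient is record
   --  No ":= null" here: by convention, every Patient must be given a
   --  Doctor at creation, so a null My_Doctor would be a bug we'd like
   --  detected, not a meaningful value.
   My_Doctor : Doctor_Ptr;
end record;

--  Here the explicit nulls *are* meaningful ("this slot is empty
--  now"), and code may legitimately read a slot before assigning it.
Hash_Table : array (0 .. 255) of Doctor_Ptr := (others => null);
```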
Of course, we might well want *some* things to be initialized to
":= The_Default_Doctor_Assigned_By_The_HMO;".

>> I would write ":= (others => null);" on the array, to signify that I
>> intend to *depend* upon the initial null value, but I would *not*
>> write ":= null" on the My_Doctor component of type Patient, to
>> signify that I intend to always explicitly initialize it.
>
>Now, consider both of the following:
>
>1. When I read the Patient data structure, how important is it to me to
>know that Doctor should never be null? Wouldn't this be better as a
>comment (or even an assertion) within the code which creates Patients,
>since that's probably where I need to know this information?

As a comment?  Surely you're not going to claim that comments are more
reliable than coding conventions?!

As an assertion when the Patient is created?  No, it's an *invariant*,
which needs to be understood by the person writing the code to create
one, and also by the person writing the code to look at one.  When
writing code, you need to know whether
"Sue_Doctor(Some_Patient.My_Doctor);" is valid -- you need to know
whether you have to write:

    if Some_Patient.My_Doctor = null then
        Dont_Bother;  --  Patient was self-medicating; no one to blame.
    else
        Sue_Doctor(Some_Patient.My_Doctor);
    end if;

instead.

>2. Suppose I change the application slightly, such that a Patient is
>created when they enter the waiting room. In this case, it may be quite
>valid for a Patient to exist, but not have an associated Doctor. In
>order to maintain your coding convention, the maintainer must go back
>to the Patient record and add an explicit initialization. Is this
>likely to happen, particularly if the code is separated from the data
>structure, and nothing will happen functionally if the maintainer fails
>to do this? Or will your coding convention "drift" over time, causing
>more confusion?

If the application is different, then yes, you have to change the code.
If you used comments instead, then you'd have to change the comments.
Either way, there's a danger that the programmer will forget to do it.
This is the nature of comments, which is the same nature as un-enforced
coding conventions.  I don't see any way around that, except to outlaw
both comments and coding conventions.

>So, here's my summary:
>
>1. The convention is not self-evident. It has to be explained
>somewhere. If the maintainer fails to get that understanding, it will
>cause confusion while (s)he searches for the meaning of this "useless"
>code.

How is this different from any other coding convention?  Do you claim
that coding conventions are always evil, because of this problem?

>2. Even if the convention is understood, its value is hampered by the
>fact that it doesn't convey information at the point where it is needed
>(the algorithm using the data structure).

No algorithm using a data structure can be understood without looking
at the type declarations involved.  This is just one more such case --
if you see "Some_Component : Some_Pointer := null;", that tells you
that you can (and should) write an algorithm that reads Some_Component
before setting it.

>3. As a corollary to #2, further maintenance of the code makes it easy
>for the coding convention to be inconsistently applied. This further
>obfuscates the code.

Agreed.  What's the alternative?  Comments?  Same maintenance problem.
Don't bother?  Well, there is some useful information here; do you
really want to hide it from the users of this type?

>4. Because it is "active" code which the compiler analyzes (unlike a
>comment), there is also the danger (admittedly slight) of a coding
>error being introduced.
>
>Overall, I stand by my original statement: I don't see the attraction
>of this style.

I guess we'll have to agree to disagree.  If you work on *my* project,
you'll have to obey *my* conventions.  If I work on *your* project I
will, of course, obey your conventions.
Either way, the maintainer can benefit from knowing what the coding
conventions are.

- Bob