From mboxrd@z Thu Jan  1 00:00:00 1970
From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
Subject: Re: ANSI C and POSIX (was Re: C/C++ knocks the crap out of Ada)
Date: 1996/04/09
Message-ID: <4kdspcINN6ct@keats.ugrad.cs.ubc.ca>
X-Deja-AN: 146550074
references: <JSA.96Feb16135027@organon.com> <dewar.828987544@schonberg>
 <4kbuebINNrho@keats.ugrad.cs.ubc.ca> <dewar.829048603@schonberg>
organization: Computer Science, University of B.C., Vancouver, B.C., Canada
newsgroups: comp.lang.ada,comp.lang.c,comp.lang.c++,comp.edu
Date: 1996-04-09T00:00:00+00:00
List-Id: <comp.lang.ada>

In article <dewar.829048603@schonberg>, Robert Dewar <dewar@cs.nyu.edu> wrote:
>"This is so deeply entrenched in the realm of common sense that it isn't even
>worth mentioning in a standard document! Nevertheless, I have access to the
>POSIX.1 standard and will look into this."
>
>This seems complete nonsense. There are two possible semantics that could
>be defined for read (buffer must be at least size of the read argument,
>or buffer must be at least size of data read). Both are easy to specify,
>both are easy to implement. You cannot rely on common sense (especially
>dubious reasoning about kernels and what not that are totally irrelevant
>to the semantic specification). The idea that specs are derived from

You are right. This has more to do with those unwritten rules that you
mentioned earlier (my wording, not yours).

It is just not reasonable to expect that you only have to supply a buffer large
enough to hold the data that will actually be read, while telling the read
function that the buffer is bigger.

Suppose you don't know whether you may lie in specifying the buffer size, since
no documentation explicitly allows or prohibits it. Which way do you decide?
Which method is safer: giving a buffer that is as large as you promise it is,
or giving a smaller buffer?

Even if you know with 100% certainty how many bytes will be read, there is no
guarantee that the rest of the buffer will not be accessed.
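
To make the safer choice concrete, here is a minimal sketch, assuming a POSIX
system. The names read_record and RECORD_SIZE are made up for illustration. The
only point is that the count handed to read() is the true size of the
destination object, so nothing is promised that the buffer cannot hold.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { RECORD_SIZE = 68 };  /* hypothetical record length */

/* Hypothetical helper: reads at most one record from fd. The count passed
   to read() equals the real size of the destination array, so read() is
   never told that the buffer is bigger than it is. Returns whatever read()
   returns: the byte count, 0 at end of file, or -1 on error. */
ssize_t read_record(int fd, unsigned char record[RECORD_SIZE])
{
    assert(record != NULL);
    return read(fd, record, RECORD_SIZE);
}
```

A caller declares "unsigned char rec[RECORD_SIZE];" and calls
read_record(fd, rec). If fewer bytes arrive, the return value says so, and the
unused tail of the array is simply left alone.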

>implementations (either by looking at the implementation, or reasoning
>about it with "common sense" or otherwise) is completely unacceptable!

You are the one who advocates empirical approaches: in a recent posting you
said that if something works on all the platforms, it is portable regardless
of whether it invokes undefined behavior.

>(though unfortunately very common, especially when people are writing in 
>a language that does not make a big deal about separating spec and
>implementation details).
>
>My only at-hand sources are K&R, which has nothing whatever to say on 
>the subject, the Zortech C++ reference, which also has nothing to say,
>(both describe read, but say nothing about the buffer length), and
>the Microsoft Runtime Reference which talks about "attempting to
>read n bytes", but is otherwise unclear.
>
>We are after all dealing with a language interface where in practice the
>proper check (whichever it is) cannot be made, because the called routine
>does not know the length of the buffer passed. I think a natural default
>assumption, in the absence of any statement to the contrary, is that the
>bytes are blindly read into the buffer, and disaster strikes if the number
>of bytes read is greater than the buffer length, but otherwise all is well.
>Unless there is a VERY clear statement of semantics to the contrary, I
>don't see how anyone can call code that makes this assumption obviously
>broken.

You are right about that, of course. You can't call the code ``obviously
broken'', but I would call the programmer imprudent.

>This is of course a rather trivial detail but is instructive with regard
>to the importance of writing precise specs. Kazimir's claim that the spec
>obviously requires that the buffer length match the requested, rather
>than actual length, based on some dubious reasoning about likely 
>implementation models is just the sort of thing that needs to be
>eliminated from programming practices. Specs need to be made precise,

But here there is a clear lack of precise specs! I'm advocating the _safer_,
more _prudent_ assumption. There is clearly more opportunity to screw up if you
falsely represent your buffer size to a system call or library function.

Even if it were OK to do so on every system, the program may later change such
that the hidden assumption is violated. Suddenly, not 68, but 113 bytes come
from the file, for some reason, and the program fails or behaves strangely.
Even all those UNIXes that check against the actual transfer size rather than
buffer size will not necessarily catch this, since the check is usually only
good to the granularity of a page.

The current maintainer, of course, doesn't know what hack had been perpetrated
and may be faced with tracking down problems that could have been avoided.
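
The 68 versus 113 scenario can be tried out with a small sketch, assuming a
POSIX system where a pipe stands in for the file. Everything named here
(honest_read_demo, BUF_SIZE, GUARD_SIZE) is invented for the illustration. The
buffer is followed by guard bytes, and the count given to read() is the real
buffer size, so the extra data stays queued instead of landing past the end.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { BUF_SIZE = 68, GUARD_SIZE = 16 };  /* hypothetical sizes */

/* Hypothetical demo: pushes `available` bytes (1..256) through a pipe and
   reads them back with an honest count. Returns the number of bytes that
   read() delivered, or -1 on error or if the guard area placed after the
   buffer was modified. */
ssize_t honest_read_demo(size_t available)
{
    struct {
        unsigned char buf[BUF_SIZE];
        unsigned char guard[GUARD_SIZE];
    } s;
    unsigned char src[256];
    int fds[2];
    ssize_t n;
    size_t i;

    assert(available >= 1 && available <= sizeof src);
    memset(src, 'x', sizeof src);
    memset(s.guard, 0xAA, sizeof s.guard);

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], src, available) != (ssize_t)available) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    n = read(fds[0], s.buf, sizeof s.buf);  /* count == true buffer size */
    close(fds[0]);
    close(fds[1]);

    for (i = 0; i < sizeof s.guard; i++)
        if (s.guard[i] != 0xAA)
            return -1;                      /* something wrote past buf */
    return n;
}
```

Whether 68 or 113 bytes are waiting, the call hands back at most 68 and the
guard bytes stay intact. Had the count been 113 with the same 68-byte buffer,
the behavior would be undefined, which is exactly the hidden assumption a later
maintainer would trip over.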

>so that a caller knows EXACTLY what the requirements are without having
>to guess, or, worse still, examine the actual implementation code.

I agree. I was dismayed when I was not able to find a definitive answer in the
POSIX.1 standard itself. These sorts of things should be specified so that
programmers don't have to reason about what is likely to be safer.

Here is my ``dubious'' reasoning laid out step by step, so criticize at will:

1.	In the C language, the size of an object has a specific meaning. If I
	malloc 100 bytes, or declare 100 bytes in static or automatic storage,
	the size of that object is not 101, not 1000, but 100 bytes.

	(Granted, the third argument of read() is not usually referred to as
	the _size_ of the object pointed at by the second argument, but as a
	_count_ of bytes to be read into the buffer. It does have a size_t type
	which is used in ANSI C to hold the sizes of objects, and is the
	return type of the sizeof operator).

2.	No documentation has ever explicitly stated that the argument may be
	greater than the actual size of the object to which the buffer
	pointer refers.

3.	In choosing between two alternatives, choose the safer one, all else
	being equal.
	
4.	Even if the apparently less safe alternative is actually safe, it
	depends on preconditions in the program which may change, namely
	assumptions about how many bytes are left in the particular file, pipe
	or whatever. This could cause problems in the maintenance cycle
	of the software.
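
As a rough illustration of points 3 and 4, here is a sketch of a loop that
makes no assumption about how many bytes are left, again assuming a POSIX
system. The names count_remaining and CHUNK_SIZE are hypothetical. Each call
asks for no more than the chunk really holds, and the return value of read()
decides what happens next.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK_SIZE = 100 };  /* the object is 100 bytes, not 101, not 1000 */

/* Hypothetical helper: counts the bytes remaining on fd by reading until
   end of file. Returns the total, or -1 on a read error. */
long count_remaining(int fd)
{
    unsigned char chunk[CHUNK_SIZE];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);  /* honest count */
        if (n == 0)
            return total;       /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }
        total += n;
    }
}
```

If the input grows from 68 to 113 bytes, this loop just goes around one more
time. No precondition about the file length is buried in the call.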
--