From mboxrd@z Thu Jan  1 00:00:00 1970
From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: What's the best language to start with? [was: Re: Should I learn
 C or Pascal?]
Date: 1996/08/30
Message-ID: <50650h$rek@goanna.cs.rmit.edu.au>
X-Deja-AN: 177386415
references: <31FBC584.4188@ivic.qc.ca>
 <01bb83f5$923391e0$87ee6fce@timpent.airshields.com>
 <4uah1k$b2o@solutions.solon.com>
 <01bb853b$ca4c8e00$87ee6fce@timpent.airshields.com>
 <4udb2o$7io@solutions.solon.com>
 <01bb8569$9910dca0$87ee6fce@timpent.airshields.com>
 <4urqam$r9u@goanna.cs.rmit.edu.au>
 <01bb8b84$200baa80$87ee6fce@timpent.airshields.com>
 <4vbbf6$g0a@goanna.cs.rmit.edu.au> <01bb8f18$713e0e60$32ee6fce@timhome2>
 <4vroh3$17f@goanna.cs.rmit.edu.au> <slrn523tpg.ce.mdw@excessus.demon.co.uk>
organization: Comp Sci, RMIT, Melbourne, Australia
newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada
nntp-posting-user: ok
Date: 1996-08-30T00:00:00+00:00
List-Id: <comp.lang.ada>


I wrote:
>> strlen() is a pure function and its argument does not change, so the
>> strlen(s) computation can be hoisted out of the loop.

mdw@excessus.demon.co.uk (Mark Wooding) writes:

>Erk!  No it isn't.  A pure function is one whose value depends only on
>its arguments.  sin() is pure.  strlen() isn't.

We are in dispute about words, not facts.
MOST annoyingly, my copy of the C standard has walked, so I can't refer
to the official rule book.

>The argument to strlen() is a pointer to a string whose length we want.

This is where I was using sloppy language.
The way I was using language, the argument *is* the "null-terminated
byte string" (to use C++ draft standard terminology), and the pointer
is merely the mechanism used to refer to it.

In the program fragment under discussion, it was not merely the pointer
that was constant, but the NTBS itself.

In what sense is strlen() pure?  In the sense that it depends only on
the NTBS variable which is the intended argument and not on the time
of day, the phase of the moon, the colour of the programmer's socks,
or any storage _other_ than the NTBS.

If I do
	n = strlen(s);
	/* much code that changes neither s */
	/* nor the NTBS that s refers to */
	m = strlen(s);
it is necessarily the case that m and n will be the same size_t value.

>It's the address, not the string itself.  Because the string can change
>between calls to strlen(), it might give different results given the
>same string address.  So the compiler can't just use its general `pure'
>function mechanism for common-subexpression-optimising strlen()

Yes it can:  in the sense in which I intended it (and I apologise for
not using the terminology of the New Testament, aka the ISO C standard),
the argument is NTBS(s).  Detecting whether NTBS(s) has changed is
admittedly very difficult in the presence of aliasing, but it isn't
difficult *all* the time.
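A small sketch of the pointer-versus-NTBS distinction, assuming nothing beyond standard C.  The function name strlen_after_alias_write is hypothetical.  The pointer s is never modified, but a write through a second pointer to the same storage changes NTBS(s), so strlen(s) changes with it.  A compiler may reuse an earlier strlen(s) only when it can prove that no such aliasing write happened in between.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: truncate the string at position pos by writing
   through `alias`, a second pointer to the same storage as `s`, then
   measure `s` again.  The pointer `s` itself is never modified, yet
   NTBS(s) changes, so an earlier strlen(s) cannot simply be reused. */
size_t strlen_after_alias_write(const char *s, char *alias, size_t pos) {
    assert(pos <= strlen(s));   /* only truncate, never extend */
    alias[pos] = '\0';          /* changes NTBS(s) without touching s */
    return strlen(s);
}
```

With char buf[16] = "hello", strlen(buf) is 5 before the call, while strlen_after_alias_write(buf, buf, 2) returns 2.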

>Now, can a compiler do clever things and optimise strlen() all by
>itself?

The answer is unequivocally "Yes it CAN."  The standard allows it to,
and nothing in the code fragment I presented suggested that s
referred to a global buffer.  In fact, most of my uses of strlen()
have been either on local buffers (which I have just filled from a
file) or on storage which I can be damn sure hasn't been touched
since last time.  (And I then save the result of strlen() and never
ever ask for it again.  We're talking about what makes sense, not
what I do.)

>You comment that it might spot assignments to the string.  This
>is true, but not all such assignments are visible to the compiler.  For
>automatic buffers, this /is/ true, but (in my experience) calls to
>strlen() and similar functions are comparitively rarerely used on
>locally allocated buffers.

In my experience, people spell comparatively with only one "i"
and "rarely" with only two "r"s.  Clearly our experience differs.

The question was not about what you do or what I do or what your
experience was or what my experience was, but about whether repeated
calls are ever sensible.

>If the buffer is not local to the function,
[which makes an assumption not grounded in my code fragment]
>there's no guarantee that (in an extreme case) its address hasn't been
>made available to a signal handler which maliciously changes the
>string's length.

Well actually, such a guarantee is easy to come by:
 - if the program contains no calls to signal(), it can't happen.
 - if the buffer is local to a file, and its address doesn't leak
   outside, and no function in the file is passed to signal(),
   again, it can't happen.
 - a program that conforms to the C standard (we were talking about
   standard C, no?) may not legally set any variable external to the
   handler unless that variable is of type volatile sig_atomic_t.  Now
   on _this_ system, sig_atomic_t is int, so a *legal* signal handler,
   however malicious, may not change a char buffer.  (This obviously
   won't apply to systems in which sig_atomic_t is char, but it _does_
   apply to this system, and the compiler is entitled to rely on it.)
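A sketch of the only kind of store the standard permits from a handler for an asynchronous signal: assigning to a static object of type volatile sig_atomic_t.  The names got_signal, on_signal and demo_signal_flag are hypothetical.  raise() is used only to make the demonstration deterministic; strictly, the restriction in the standard applies to signals that do *not* come from raise() or abort().

```c
#include <assert.h>
#include <signal.h>

/* The only kind of static object a conforming handler for an
   asynchronous signal may write: a volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
    (void)sig;
    got_signal = 1;      /* legal: assigns a volatile sig_atomic_t */
    /* Writing into a char buffer here would NOT be allowed. */
}

/* Install the handler, raise the signal synchronously, and report
   whether the handler ran: returns 1 if it did, 0 otherwise. */
int demo_signal_flag(void) {
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return 0;
    raise(SIGINT);       /* handler runs before raise() returns */
    return got_signal != 0;
}
```

The design point is that the handler records *that* something happened and leaves all real work, including any change to a string, to ordinary code that the compiler can see.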

Let's take an example which _doesn't_ involve function-local buffers.
Consider

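	#include <assert.h>
	#include <stdlib.h>
	#include <string.h>

	char *dupcat(char const *a, char const *b) {
	    /* deliberately recomputes strlen(a) and strlen(b) each time */
	    char *result = malloc(strlen(a) + strlen(b) + 1);
	    assert(result != 0);
	    memcpy(result, a, strlen(a));
	    memcpy(result + strlen(a), b, strlen(b));
	    memcpy(result + strlen(a) + strlen(b), "", 1);  /* copy the '\0' */
	    return result;
	}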

I repeat, this is NOT the way I would normally write it.
I contend that it is _inefficient_ but not "stupid".
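For comparison, a sketch of the hand-optimised version, in which each length is computed once and saved.  The name dupcat_saved is hypothetical.  It produces the same result as dupcat(); the only difference is that the programmer, rather than the compiler, does the common-subexpression elimination.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same result as dupcat(), but strlen(a) and strlen(b) are each
   computed once and kept in local variables. */
char *dupcat_saved(char const *a, char const *b) {
    size_t const la = strlen(a);
    size_t const lb = strlen(b);
    char *result = malloc(la + lb + 1);
    assert(result != 0);
    memcpy(result, a, la);
    memcpy(result + la, b, lb);
    result[la + lb] = '\0';      /* terminate the concatenation */
    return result;
}
```

Which version to write is exactly the trade-off at issue: the second saves a few passes over the strings, and the first says the same thing with fewer names to keep track of.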

If a malicious (and not strictly conforming) signal handler smashes
NTBS(a) or NTBS(b), then you have worse problems than just possible
changes in what strlen() returns.  Manually saving the numbers won't
protect you from the fact that the null-terminated byte strings have
changed, so that the function result (as an NTBS) is not the
concatenation of its NTBS arguments.

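My understanding of the standard is that a compiler is entitled to
assume that NTBS(a) and NTBS(b) will not change (because a legal
standard C program _can't_ undergo such changes, and the only stores
in dupcat() go through result, which points to freshly malloc()ed
storage that cannot overlap either argument) and therefore to
optimise the calls to strlen().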
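GCC and Clang (as an extension, not ISO C) let a programmer declare a function "pure": its only effect is its return value, which may depend on its arguments *and* on memory it reads.  This is the sense of "pure" used above, and GCC's documentation gives strlen as an example.  The sketch below assumes such a compiler and falls back to no attribute elsewhere.  my_strlen and twice_length are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* PURE expands to the GCC/Clang pure attribute where available and to
   nothing elsewhere; it is a compiler extension, not ISO C. */
#if defined(__GNUC__)
#define PURE __attribute__((pure))
#else
#define PURE
#endif

/* A hand-written strlen: it reads memory but writes nothing, so it
   qualifies as pure in the GCC sense. */
PURE size_t my_strlen(char const *s);

size_t my_strlen(char const *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Two calls with no intervening store: a compiler that honours PURE
   may evaluate my_strlen(s) once and reuse the result. */
size_t twice_length(char const *s) {
    return my_strlen(s) + my_strlen(s);
}
```

Whether the second call is actually eliminated depends on the compiler and optimisation level; the attribute only grants permission.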

If anyone wants to argue that, take it to comp.std.c and ask the
experts.

The point I am concerned to defend is solely this:
the function dupcat() may be inefficient,
but it clearly and correctly expresses the programmer's intent,
and is therefore not "stupid".

Surely nobody but an assembly hacker would dispute the point
that there are times when it is RATIONAL to write suboptimal
code (in order to devote your resources to activities with a
higher payoff)?

-- 
Australian citizen since 14 August 1996.  *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.