From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: fc89c,97188312486d4578
X-Google-Attributes: gidfc89c,public
X-Google-Thread: 103376,97188312486d4578
X-Google-Attributes: gid103376,public
X-Google-Thread: 1014db,6154de2e240de72a
X-Google-Attributes: gid1014db,public
X-Google-Thread: 109fba,baaf5f793d03d420
X-Google-Attributes: gid109fba,public
From: ok@goanna.cs.rmit.edu.au (Richard A. O'Keefe)
Subject: Re: What's the best language to start with? [was: Re: Should I learn C or Pascal?]
Date: 1996/08/30
Message-ID: <50650h$rek@goanna.cs.rmit.edu.au>
X-Deja-AN: 177386415
references: <31FBC584.4188@ivic.qc.ca>
	<01bb83f5$923391e0$87ee6fce@timpent.airshields.com>
	<4uah1k$b2o@solutions.solon.com>
	<01bb853b$ca4c8e00$87ee6fce@timpent.airshields.com>
	<4udb2o$7io@solutions.solon.com>
	<01bb8569$9910dca0$87ee6fce@timpent.airshields.com>
	<4urqam$r9u@goanna.cs.rmit.edu.au>
	<01bb8b84$200baa80$87ee6fce@timpent.airshields.com>
	<4vbbf6$g0a@goanna.cs.rmit.edu.au>
	<01bb8f18$713e0e60$32ee6fce@timhome2>
	<4vroh3$17f@goanna.cs.rmit.edu.au>
organization: Comp Sci, RMIT, Melbourne, Australia
newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada
nntp-posting-user: ok
Date: 1996-08-30T00:00:00+00:00
List-Id:

I wrote:
>> strlen() is a pure function and its argument does not change, so the
>> strlen(s) computation can be hoisted out of the loop.

mdw@excessus.demon.co.uk (Mark Wooding) writes:
>Erk! No it isn't. A pure function is one whose value depends only on
>its arguments. sin() is pure. strlen() isn't.

We are in dispute about words, not facts.
MOST annoyingly, my copy of the C standard has walked, so I can't
refer to the official rule book.

>The argument to strlen() is a pointer to a string whose length we want.

This is where I was using sloppy language. The way I was using
language, the argument *is* the "null-terminated byte string" (to use
C++ draft standard terminology), and the pointer is merely the
mechanism used to refer to it. In the program fragment under
discussion, it was not merely the pointer that was constant, but the
NTBS itself.

In what sense is strlen() pure? In the sense that it depends only on
the NTBS variable which is the intended argument and not on the time
of day, the phase of the moon, the colour of the programmer's socks,
or any storage _other_ than the NTBS. If I do

	n = strlen(s);
	/* much code that does not change s */
	/* NOR the NTBS that s refers to */
	m = strlen(s);

it is necessarily the case that m and n will be the same size_t value.

>It's the address, not the string itself. Because the string can change
>between calls to strlen(), it might give different results given the
>same string address. So the compiler can't just use its general `pure'
>function mechanism for common-subexpression-optimising strlen()

Yes it can: in the sense in which I intended it (and I apologise for
not using the terminology in the New Testament aka ISO C standard) the
argument is NTBS(s), and detecting whether NTBS(s) has changed is
admittedly very difficult in the presence of aliasing, but it isn't
difficult *all* the time.

>Now, can a compiler do clever things and optimise strlen() all by
>itself?

The answer is unequivocally "Yes it CAN." The standard allows it to,
and nothing in the code fragment I presented suggested that s referred
to a global buffer. In fact, most of my uses of strlen() have been
either to local buffers (which I have just filled from a file) or to
storage which I can be damn sure hasn't been touched since last time.
(And I then save the result of strlen() and never ever ask for it
again. We're talking about what makes sense, not what I do.)

>You comment that it might spot assignments to the string.
>This is true, but not all such assignments are visible to the compiler. For
>automatic buffers, this /is/ true, but (in my experience) calls to
>strlen() and similar functions are comparitively rarerely used on
>locally allocated buffers.

In my experience, people spell "comparatively" with only one "i" and
"rarely" with only two "r"s. Clearly our experience differs.
The question was not about what you do or what I do or what your
experience was or what my experience was but about whether repeated
calls are ever sensible.

>If the buffer is not local to the function,

[which makes an assumption not grounded in my code fragment]

>there's no guarantee that (in an extreme case) its address hasn't been
>made available to a signal handler which maliciously changes the
>string's length.

Well actually, such a guarantee is easy to come by:
 - if the program contains no calls to signal(), it can't happen.
 - if the buffer is local to a file, and its address doesn't leak
   outside, and no function in the file is passed to signal(),
   again, it can't happen.
 - a program that conforms to the C standard (we were talking about
   standard C, no?) may not legally set any variable external to the
   handler if that variable is not of type sig_atomic_t. Now on _this_
   system, sig_atomic_t is int, so a *legal* signal handler, however
   malicious, may not change a char buffer. (This obviously won't
   apply to systems in which sig_atomic_t is char, but it _does_ apply
   to this system, and the compiler is entitled to rely on it.)

Let's take an example which _doesn't_ involve function-local buffers.
Consider

	#include <assert.h>
	#include <stdlib.h>
	#include <string.h>

	char *dupcat(char const *a, char const *b) {
	    char *result = malloc(strlen(a) + strlen(b) + 1);

	    assert(result != 0);
	    memcpy(result, a, strlen(a));
	    memcpy(result + strlen(a), b, strlen(b));
	    memcpy(result + strlen(a) + strlen(b), "", 1);
	    return result;
	}

I repeat, this is NOT the way I would normally write it.
I contend that it is _inefficient_ but not "stupid".
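
For comparison, a minimal sketch of the hand-optimised form, in which
each length is saved once and reused. This is roughly the effect a
compiler would achieve by hoisting the strlen() calls, assuming it can
establish that neither string changes during the call. The name
dupcat_hoisted and the variables alen and blen are illustrative only
and do not come from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hand-hoisted variant of dupcat(): each strlen() is
 * computed once and the saved value is reused.  The function name and
 * the variables alen/blen are illustrations, not part of the original
 * post. */
char *dupcat_hoisted(char const *a, char const *b)
{
    size_t const alen = strlen(a);   /* length of a, computed once */
    size_t const blen = strlen(b);   /* length of b, computed once */
    char *result = malloc(alen + blen + 1);

    assert(result != 0);             /* same allocation check as dupcat() */
    memcpy(result, a, alen);         /* copy a without its terminator */
    memcpy(result + alen, b, blen);  /* append b without its terminator */
    result[alen + blen] = '\0';      /* terminate the new string */
    return result;
}
```

The two versions return the same string for the same inputs; the only
difference is how many times the lengths are computed.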

If a malicious (and not strictly conforming) signal handler smashes
NTBS(a) or NTBS(b), then you have worse problems than just possible
changes to strlen(). Manually saving the numbers won't protect you
from the fact that the null-terminated byte strings have changed, so
that the function result (as an NTBS) is not the concatenation of its
NTBS arguments.

My understanding of the standard is that a compiler is entitled to
assume that NTBS(a) and NTBS(b) will not change (because a legal
standard C program _can't_ undergo such changes) and therefore to
optimise the calls to strlen(). If anyone wants to argue that, take it
to comp.std.c and ask the experts.

The point I am concerned to defend is solely this: the function
dupcat() may be inefficient, but it clearly and correctly expresses
the programmer's intent, and is therefore not "stupid".

Surely nobody but an assembly hacker would dispute the point that
there are times when it is RATIONAL to write suboptimal code (in order
to devote your resources to activities with a higher payoff)?

--
Australian citizen since 14 August 1996. *Now* I can vote the xxxs out!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.