From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,45b47ecb995e7a3
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-08-13 23:15:21 PST
Path: 
 archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feed2.news.rcn.net!rcn!howland.erols.net!news-out.worldnet.att.net.MISMATCH!wn3feed!worldnet.att.net!135.173.83.71!wnfilter1!worldnet-localpost!bgtnsc06-news.ops.worldnet.att.net.POSTED!not-for-mail
Message-ID: <3B78C290.4DD088A8@worldnet.att.net>
From: James Rogers <jimmaureenrogers@worldnet.att.net>
X-Mailer: Mozilla 4.76 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Ada Idioms Progress Preview
References: <3B6F1B2F.4FC3C833@gsde.hou.us.ray.com>
 <SCIb7.37009$Kd7.22894159@news1.rdc1.sfba.home.com>
 <5ee5b646.0108071819.6e84e33d@posting.google.com>
 <3_Xc7.45$NM5.84779@news.pacbell.net> <E8Rd7.282$D4.307@www.newsranger.com>
 <umqr8ug55d9.fsf@maestro.clustra.com> <3B783712.88029BB8@worldnet.att.net>
 <3B787A30.F806DB00@home.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Tue, 14 Aug 2001 06:15:20 GMT
NNTP-Posting-Host: 12.86.33.35
X-Complaints-To: abuse@worldnet.att.net
X-Trace: bgtnsc06-news.ops.worldnet.att.net 997769720 12.86.33.35 (Tue,
 14 Aug 2001 06:15:20 GMT)
NNTP-Posting-Date: Tue, 14 Aug 2001 06:15:20 GMT
Organization: AT&T Worldnet
Xref: archiver1.google.com comp.lang.ada:11889
Date: 2001-08-14T06:15:20+00:00
List-Id: <comp.lang.ada>

"Warren W. Gay VE3WWG" wrote:
> 
> James Rogers wrote:
> > An old saying is "there is no free lunch". In other words, nothing
> > comes for free. In the case of a C string, you do not explicitly
> > carry around the length of a string. Instead, you rely on a convention
> > stating that the logical end of the string is indicated by a null
> > character.
> >
> > The C approach presents two very real costs:
> >
> > 1) You must serially read the string to find the terminating null
> >    character. This operation is very expensive if you only need to
> >    determine the length of the string.
> 
> To be honest, it works reasonably well for C/C++ because most strings in
> a program tend to be short (of course this varies by application!) It would
> be nice to have someone sample some Open Sourced packages and come
> up with an average length, but I suspect that it would
> be short enough. It is true for _some_ C strings/applications, that this
> could be a significant overhead factor.

I would not like to make a claim about the "average" size of strings
in C applications. I suspect the size varies quite a bit.

> > 2) Sometimes the null character is omitted. Since C arrays are
> >    unbounded, this causes your program to read beyond the end of
> >    the string until it finds a null character. The resulting
> >    length will be incorrect.
> 
> I'm not really wanting to support C/C++, but we should be careful
> about what is being said here.. its really only a problem if you
> _need_ a nul byte at the end. There are C programs that work with
> fixed sized strings, like Ada, though this tends to be rarer (it
> sometimes is done with embedded SQL/C). If you then need to pass
> the fixed string to a printf() or other function that expects a
> "C string", then yes, this then becomes a problem (just as it does
> for Ada supplying a string for C).

By definition, a C string is a nul-terminated array of characters.
C functions that do not require the nul terminator do not
actually use strings. They merely use arrays of characters. This may
seem like a subtle point, but it is critical. Functions expecting
a C string for an argument absolutely rely on the existence of the
nul byte at the end of the logical string.
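
To make the distinction concrete, here is a minimal sketch of a check for
whether a character array actually holds a C string. The name
holds_c_string is hypothetical, and the caller has to supply the array
capacity separately, since the array itself cannot report it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first cap bytes of buf contain
   a nul byte (so buf holds a valid C string), 0 otherwise. */
int holds_c_string(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}
```

An array such as {'a','b','c','d'} with cap 4 fails this check. Handing
it to strlen() or printf("%s") would read past the end of the array.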

> > When copying or editing a string
> >    this problem will result in data corruption and undefined
> >    behaviors.
> 
> This is again, not necessarily true, but it does happen if the
> C programmer is not careful. If instead, the user
> uses strncpy() for example, where the maximum size of the destination
> array is given, then this does not happen. However, if you strncpy()
> the maximum # of characters, you don't get a nul byte at the end.
> Novice C programmers often miss this subtle point ;-)

Absolutely true. This problem only occurs when the C programmer
omits the terminating nul byte. Unfortunately this omission is easy
to achieve.

> One way to avoid this is to use a technique with strncpy() :
> 
> #define BUF_LEN 8
> 
> void
> func(const char *in_str_with_maybe_no_null) {
>    char my_buf[BUF_LEN];
> 
>    strncpy(my_buf,in_str_with_maybe_no_null,BUF_LEN-1)[BUF_LEN-1] = 0;
> 
> This restricts the copy to BUF_LEN-1 characters + 1 guaranteed nul byte.
> It works because strncpy() the function, returns the (char *) pointer to
> my_buf, resulting in the final assignment:
> 
>    my_buf[BUF_LEN-1] = 0;
> 
> You can do this in a much less cryptic way, but I have found it useful
> in C programs, and it takes up less screen real-estate this way ;-)

And this approach assumes that copying only BUF_LEN-1 characters will
result in valid data. Sometimes it will. Sometimes BUF_LEN may be
too small, resulting in a truncated string.
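
One way to at least detect the truncation is sketched below. The name
copy_bounded is hypothetical, and the sketch assumes that the source
really is nul-terminated and that the destination capacity is at least 1.
Neither assumption can be checked by the function itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity cap, cap > 0) and
   always nul-terminates dst.
   Returns 1 if the whole string fit, 0 if it had to be truncated.
   Assumes src is a valid nul-terminated string. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src);                 /* serial scan for the nul */
    size_t n = (len < cap) ? len : cap - 1;   /* leave room for the nul  */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < cap;
}
```

The caller still has to decide what to do with a truncated result. The
function can only report it.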

> 
> > Another less common cost occurs when copying C strings. The most
> > efficient copy operation for C arrays is the memcpy function.
> > This function allows you to copy blocks of memory efficiently.
> > If you try to use memcpy to copy strings you will find some
> > real problems. In those cases you want to copy the actual array
> > of characters, not just the logical string contained in it.
> 
> Have you said this in reverse? Normally you don't want to copy
> anything beyond the nul byte, unless you're copying fixed length
> arrays of characters, without treating nul as a special marker.

No, I am saying that strncpy() is less efficient than memcpy(),
because it must test every byte for the nul.
It is possible to make an exact copy of a string using memcpy().
In fact the entire character array will be copied, not just the
data up to the nul.
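
A minimal sketch of the difference, assuming a fixed field size of 8
(FIELD_LEN and both function names are made up for illustration):

```c
#include <string.h>

#define FIELD_LEN 8   /* assumed size of both arrays */

/* Copies only the logical string: stops after the first nul byte.
   Assumes src holds a nul within its first FIELD_LEN bytes. */
void copy_string(char *dst, const char *src)
{
    strcpy(dst, src);
}

/* Copies the entire array, including any bytes after the first nul. */
void copy_array(char *dst, const char *src)
{
    memcpy(dst, src, FIELD_LEN);
}
```

With a source array of {'a','b','\0','x','y','z','\0','\0'}, copy_string
leaves dst[3] untouched, while copy_array reproduces the 'x' and
everything after it.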

> > The problem is that the C sizeof operator does not report the
> > correct size of arrays outside the immediate scope where they
> > are declared.
> 
> OK, you're saying when you pass arrays into a C function, when
> the array is declared external to that function. Something like:
> 
> void func2(char *str) {
>    // what is the array size of str?
> }
> 
> void func1() {
>    char my_array[31];
> 
>    func2(my_array);
> }
> 
> This is a weakness, but if you know that func2() should work
> with fixed length arrays of a certain size, you can use:
> 
> void func2(char str[31]) {
>    // what is the array size of str?
> }
> 
> instead. However, I agree that this is feeble, compared to the
> way Ada passes array bounds information.
> 
> > Instead you will only get the size of the pointer
> > to the first element of the array.
> 
> OK, it sounds like you're suggesting the following:
> 
> void my_func(char *str) {
>    int slen = sizeof str;    // which does not make sense

No, I am suggesting:

void my_func(char str[]) {
   int slen = sizeof str; // yields only the size of a pointer here; sizeof
                          // reports the array size only within the
                          // declaration scope of the actual parameter.
}

> However, if you declared this instead:
> 
> void my_func(char str[31]) {
>    int array_len = sizeof str;  // this comes close to size of array
> 
> (on many platforms : array_len=32 here due to padding)

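Nope, not what I wanted at all. A parameter declared as char str[31]
is still adjusted to char *, so sizeof str yields the size of a pointer,
not 31. sizeof reports the array size only when it is used within the
same scope as the declaration of the array itself.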
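
For reference, here is a small sketch of what sizeof reports in each
scope. The function names are hypothetical. The prototype is written with
an array parameter and the definition with a pointer parameter on
purpose: a C compiler accepts the pair as the same function, because an
array parameter is adjusted to a pointer.

```c
#include <stddef.h>

/* Prototype written with an array parameter of "size" 31. */
size_t size_seen_by_callee(char str[31]);

/* Definition written with a pointer parameter. It matches the prototype
   above because char str[31] in a parameter list means char *str. */
size_t size_seen_by_callee(char *str)
{
    return sizeof str;        /* size of a pointer, not 31 */
}

size_t size_seen_by_owner(void)
{
    char my_array[31] = "";
    return sizeof my_array;   /* 31: the array type is visible here */
}
```

So the 31 in the parameter declaration documents intent but carries no
size information into the function.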

> 
> > Therefore, to efficiently
> > copy C strings using memcpy you must provide a second "length"
> > argument, which may not be readily available.
> >
> > Jim Rogers
> > Colorado Springs, Colorado USA
> 
> I'm not sure what you're pointing to here, but if you were to
> "efficiently" copy the string, you must have the assurance of
> a nul byte (so you can stop copying when you hit it with strcpy()) or a
> specified length (for memcpy()). Or you might need both if you use
> strncpy().

Correct. The point is that C *requires* the nul terminator for a
string but cannot enforce the requirement. Enforcement is left up
to the programmer. C functions cannot determine the size of an
arbitrary-length array passed to them unless the caller also passes
the size as an additional parameter.
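
A minimal sketch of the explicit-length style, with a hypothetical
function name. Because the length travels as its own argument, the
function neither needs nor looks for a nul terminator:

```c
#include <stddef.h>

/* Hypothetical helper: counts how often ch occurs in the n characters
   starting at buf. The length is explicit, so embedded nul bytes are
   treated as ordinary data. */
size_t count_char(const char *buf, size_t n, char ch)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == ch) {
            count++;
        }
    }
    return count;
}
```

The cost is that every caller must track the length and pass it
correctly. The language does not check that n matches the actual array.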

> 
> But if it is a "C-string", then it does have a nul byte, and is
> efficient to copy (copying stops at the nul byte - memcpy() is not
> used to implement strncpy()).
> 
> I'm not wanting to defend C, but we want to be correct about the
> defects when we launch criticism of C/C++.  Otherwise, Ada programmers
> lose respect ;-)

I agree. The problem really revolves around the C attitude that the
programmer must provide all the discipline necessary to produce a
correct program. The language makes no distinction between liberty
and capability. My general rule for C is that almost anything is
allowed, but MANY things are not advised. 

All this aside, I believe I have demonstrated that C strings are not
necessarily more efficient or more self-describing than Ada strings.
This is particularly true when your solutions to the problems I 
pose are to create fixed length C strings, roughly paralleling the
structure of Ada strings. At the same time the C string still needs
the extra nul byte at the end to meet the definition of a string.

Jim Rogers
Colorado Springs, Colorado USA