comp.lang.ada
From: James Rogers <jimmaureenrogers@worldnet.att.net>
Subject: Re: Ada Idioms Progress Preview
Date: Tue, 14 Aug 2001 06:15:20 GMT
Message-ID: <3B78C290.4DD088A8@worldnet.att.net>
In-Reply-To: 3B787A30.F806DB00@home.com

"Warren W. Gay VE3WWG" wrote:
> 
> James Rogers wrote:
> > An old saying is "there is no free lunch". In other words, nothing
> > comes for free. In the case of a C string, you do not explicitly
> > carry around the length of a string. Instead, you rely on a convention
> > stating that the logical end of the string is indicated by a null
> > character.
> >
> > The C approach presents two very real costs:
> >
> > 1) You must serially read the string to find the terminating null
> >    character. This operation is very expensive if you only need to
> >    determine the length of the string.
> 
> To be honest, it works reasonably well for C/C++ because most strings in
> a program tend to be short (of course this varies by application!) It would
> be nice to have someone sample some Open Sourced packages and come
> up with an average length, but I suspect that it would
> be short enough. It is true for _some_ C strings/applications, that this
> could be a significant overhead factor.

I would not like to make a claim about the "average" size of strings
in C applications. I suspect the size varies quite a bit.

> > 2) Sometimes the null character is omitted. Since C arrays are
> >    unbounded, this causes your program to read beyond the end of
> >    the string until it finds a null character. The resulting
> >    length will be incorrect.
> 
> I'm not really wanting to support C/C++, but we should be careful
> about what is being said here. It's really only a problem if you
> _need_ a nul byte at the end. There are C programs that work with
> fixed sized strings, like Ada, though this tends to be rarer (it
> sometimes is done with embedded SQL/C). If you then need to pass
> the fixed string to a printf() or other function that expects a
> "C string", then yes, this then becomes a problem (just as it does
> for Ada supplying a string for C).

The definition of a C string is a null terminated array of characters.
C functions that do not require the null termination do not
actually use strings. They merely use arrays of characters. This may
seem like a subtle point, but it is critical. Functions expecting
a C string for an argument absolutely rely on the existence of the
nul byte at the end of the logical string.
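
As a rough sketch of that distinction (the helper name holds_c_string
is hypothetical, not part of any library), a check like the following
tells an array that holds a C string from one that is only an array
of characters. Note that it needs the array size as a separate
argument:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..size-1] contains a nul byte,
   that is, holds a C string; returns 0 if it is only an array of
   characters with no terminator inside the given bounds. */
int holds_c_string(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

Looking at only the first 3 bytes of "abc" gives 0; looking at all
4 bytes (including the terminator) gives 1.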

> > When copying or editing a string
> >    this problem will result in data corruption and undefined
> >    behaviors.
> 
> This is again, not necessarily true, but it does happen if the
> C programmer is not careful. If instead, the user
> uses strncpy() for example, where the maximum size of the destination
> array is given, then this does not happen. However, if you strncpy()
> the maximum # of characters, you don't get a nul byte at the end.
> Novice C programmers often miss this subtle point ;-)

Absolutely true. This problem only occurs when the C programmer
omits the terminating nul byte. Unfortunately, this omission is easy
to make.
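
A minimal sketch of how easily it happens, assuming an 8 byte
destination (strncpy_leaves_nul is a made-up name for illustration).
When the source has at least as many characters as the count passed
to strncpy(), no nul byte is written:

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8

/* Returns 1 if strncpy() left a nul byte somewhere in the destination,
   0 if the destination ended up as an unterminated array of characters. */
int strncpy_leaves_nul(const char *src)
{
    char dst[BUF_LEN];

    assert(src != NULL);
    memset(dst, 'x', sizeof dst);    /* stale, non-nul contents */
    strncpy(dst, src, sizeof dst);   /* copies at most BUF_LEN characters */
    return memchr(dst, '\0', sizeof dst) != NULL;
}
```

A 7 character source leaves room for the terminator; an 8 character
source fills the buffer and leaves none.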

> One way to avoid this is to use a technique with strncpy() :
> 
> #define BUF_LEN 8
> 
> void
> func(const char *in_str_with_maybe_no_null) {
>    char my_buf[BUF_LEN];
> 
>    strncpy(my_buf,in_str_with_maybe_no_null,BUF_LEN-1)[BUF_LEN-1] = 0;
> 
> This restricts the copy to BUF_LEN-1 characters + 1 guaranteed nul byte.
> It works because strncpy() the function, returns the (char *) pointer to
> my_buf, resulting in the final assignment:
> 
>    my_buf[BUF_LEN-1] = 0;
> 
> You can do this in a much less cryptic way, but I have found it useful
> in C programs, and it takes up less screen real-estate this way ;-)

And this approach assumes that copying only BUF_LEN-1 characters will
result in valid data. Sometimes it will. Sometimes BUF_LEN may be
too small, resulting in a truncated string.
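
To make the truncation concrete, here is a small sketch built around
the idiom quoted above. The function name chars_lost is hypothetical,
and src is assumed to be a properly terminated C string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8

/* Applies the strncpy() idiom to a local buffer and returns how many
   characters of src did not fit. src must be nul terminated. */
size_t chars_lost(const char *src)
{
    char my_buf[BUF_LEN];

    assert(src != NULL);
    strncpy(my_buf, src, BUF_LEN - 1)[BUF_LEN - 1] = 0;
    return strlen(src) - strlen(my_buf);
}
```

The copy is always terminated, but any source longer than BUF_LEN-1
characters is silently cut short, and the caller is not told unless
it checks the lengths itself.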

> 
> > Another less common cost occurs when copying C strings. The most
> > efficient copy operation for C arrays is the memcpy function.
> > This function allows you to copy blocks of memory efficiently.
> > If you try to use memcpy to copy strings you will find some
> > real problems. In those cases you want to copy the actual array
> > of characters, not just the logical string contained in it.
> 
> Have you said this in reverse? Normally you don't want to copy
> anything beyond the nul byte, unless you're copying fixed length
> arrays of characters, without treating nul as a special marker.

No, I am saying that strncpy() is less efficient than memcpy.
It is possible to make an exact copy of a string using memcpy.
In fact the entire character array will be copied, not just the
data up to the null.
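
A sketch of the difference in what gets copied (the names sample,
byte_after_memcpy and byte_after_strncpy are made up for this
example). It only shows which bytes arrive in the destination; it
makes no claim about the relative speed of any particular library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8

/* An array whose logical string is "ab", with more data after the nul. */
static const char sample[BUF_LEN] = { 'a', 'b', '\0', 'c', 'd', '\0', '\0', '\0' };

/* Returns the byte at index after copying the whole array with memcpy(). */
char byte_after_memcpy(size_t index)
{
    char dst[BUF_LEN];

    assert(index < BUF_LEN);
    memcpy(dst, sample, sizeof sample);    /* every byte is copied */
    return dst[index];
}

/* Returns the byte at index after copying with strncpy(). */
char byte_after_strncpy(size_t index)
{
    char dst[BUF_LEN];

    assert(index < BUF_LEN);
    strncpy(dst, sample, sizeof dst);      /* stops at the nul, pads the rest with nul bytes */
    return dst[index];
}
```

After memcpy() the byte at index 3 is still 'c'. After strncpy() it
is a nul byte, because only the logical string was copied.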

> > The problem is that the C sizeof operator does not report the
> > correct size of arrays outside the immediate scope where they
> > are declared.
> 
> OK, you're saying when you pass arrays into a C function, when
> the array is declared external to that function. Something like:
> 
> void func2(char *str) {
>    // what is the array size of str?
> }
> 
> void func1() {
>    char my_array[31];
> 
>    func2(my_array);
> }
> 
> This is a weakness, but if you know that func2() should work
> with fixed length arrays of a certain size, you can use:
> 
> void func2(char str[31]) {
>    // what is the array size of str?
> }
> 
> instead. However, I agree that this is feeble, compared to the
> way Ada passes array bounds information.
> 
> > Instead you will only get the size of the pointer
> > to the first element of the array.
> 
> OK, it sounds like you're suggesting the following:
> 
> void my_func(char *str) {
>    int slen = sizeof str;    // which does not make sense

No, I am suggesting:

void my_func( char str[]) {
   int slen = sizeof str; // which makes sense within the declaration
                          // scope of the actual parameter.

> However, if you declared this instead:
> 
> void my_func(char str[31]) {
>    int array_len = sizeof str;  // this comes close to size of array
> 
> (on many platforms : array_len=32 here due to padding)

Nope, not what I wanted at all. See, this works because sizeof is used
within the same scope as the declaration of str.

> 
> > Therefore, to efficiently
> > copy C strings using memcpy you must provide a second "length"
> > argument, which may not be readily available.
> >
> > Jim Rogers
> > Colorado Springs, Colorado USA
> 
> I'm not sure what you're pointing to here, but if you were to
> "efficiently" copy the string, you must have the assurance of
> a nul byte (so you can stop copying when you hit it with strcpy()) or a
> specified length (for memcpy()). Or you might need both if you use
> strncpy().

Correct. The point is that C *requires* the nul terminator for a
string but cannot enforce the requirement. Enforcement is left up
to the programmer. C functions cannot determine the size of an
arbitrary length array passed to them without also passing an
additional parameter.
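
A short sketch of that last point, with hypothetical function names.
In standard C an array parameter is adjusted to a pointer, so writing
the parameter as char str[31] would give the same result as the
char pointer used here:

```c
#include <assert.h>
#include <stddef.h>

/* Inside the callee only a pointer is visible, so this is the size
   of a pointer, not of the caller's array. */
size_t size_seen_by_callee(const char *str)
{
    return sizeof str;
}

/* The size has to travel as a separate argument. */
size_t callee_given_length(const char *str, size_t len)
{
    assert(str != NULL);
    return len;
}

/* In the scope where the array is declared, sizeof reports all 31 bytes,
   so this is the place where the length can be captured and passed on. */
size_t call_from_declaring_scope(void)
{
    char my_array[31] = "";

    return callee_given_length(my_array, sizeof my_array);
}
```

size_seen_by_callee() returns sizeof(char *) on any platform, while
call_from_declaring_scope() returns 31.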

> 
> But if it is a "C-string", then it does have a nul byte, and is
> efficient to copy (copying stops at the nul byte - memcpy() is not
> used to implement strncpy()).
> 
> I'm not wanting to defend C, but we want to be correct about the
> defects when we launch criticism of C/C++.  Otherwise, Ada programmers
> lose respect ;-)

I agree. The problem really revolves around the C attitude that the
programmer must provide all the discipline necessary to produce a
correct program. The language makes no distinction between liberty
and capability. My general rule for C is that almost anything is
allowed, but MANY things are not advised. 

All this aside, I believe I have demonstrated that C strings are not
necessarily more efficient or more self-describing than Ada strings.
This is particularly true when your solutions to the problems I 
pose are to create fixed length C strings, roughly paralleling the
structure of Ada strings. At the same time the C string still needs
the extra nul byte at the end to meet the definition of a string.

Jim Rogers
Colorado Springs, Colorado USA


