From: James Rogers <jimmaureenrogers@worldnet.att.net>
Subject: Re: Ada Idioms Progress Preview
Date: Tue, 14 Aug 2001 06:15:20 GMT
Message-ID: <3B78C290.4DD088A8@worldnet.att.net>
In-Reply-To: 3B787A30.F806DB00@home.com
"Warren W. Gay VE3WWG" wrote:
>
> James Rogers wrote:
> > An old saying is "there is no free lunch". In other words, nothing
> > comes for free. In the case of a C string, you do not explicitly
> > carry around the length of a string. Instead, you rely on a convention
> > stating that the logical end of the string is indicated by a null
> > character.
> >
> > The C approach presents two very real costs:
> >
> > 1) You must serially read the string to find the terminating null
> > character. This operation is very expensive if you only need to
> > determine the length of the string.
>
> To be honest, it works reasonably well for C/C++ because most strings in
> a program tend to be short (of course this varies by application!) It would
> be nice to have someone sample some Open Sourced packages and come
> up with an average length, but I suspect that it would
> be short enough. It is true for _some_ C strings/applications, that this
> could be a significant overhead factor.
I would not like to make a claim about the "average" size of strings
in C applications. I suspect the size varies quite a bit.
> > 2) Sometimes the null character is omitted. Since C arrays are
> > unbounded, this causes your program to read beyond the end of
> > the string until it finds a null character. The resulting
> > length will be incorrect.
>
> I'm not really wanting to support C/C++, but we should be careful
> about what is being said here. It's really only a problem if you
> _need_ a nul byte at the end. There are C programs that work with
> fixed sized strings, like Ada, though this tends to be rarer (it
> sometimes is done with embedded SQL/C). If you then need to pass
> the fixed string to a printf() or other function that expects a
> "C string", then yes, this then becomes a problem (just as it does
> for Ada supplying a string for C).
The definition of a C string is a null terminated array of characters.
C functions that do not require the null termination do not
actually use strings. They merely use arrays of characters. This may
seem like a subtle point, but it is critical. Functions expecting
a C string for an argument absolutely rely on the existence of the
nul byte at the end of the logical string.
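To make the distinction concrete, here is a minimal sketch. The helper
name has_terminator is made up for illustration; it only checks whether
a character array of known capacity contains a nul byte, which is what
separates a C string from a plain array of characters.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a nul byte occurs within the first cap bytes of buf,
   meaning buf can safely be handed to functions expecting a C string.
   Returns 0 if buf is only an array of characters.
   has_terminator is a hypothetical helper, not a library function. */
int has_terminator(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}
```

Note that the check itself needs the capacity passed in; without it
there is no safe way to ask the question.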
> > When copying or editing a string
> > this problem will result in data corruption and undefined
> > behaviors.
>
> This is again, not necessarily true, but it does happen if the
> C programmer is not careful. If instead, the user
> uses strncpy() for example, where the maximum size of the destination
> array is given, then this does not happen. However, if you strncpy()
> the maximum # of characters, you don't get a nul byte at the end.
> Novice C programmers often miss this subtle point ;-)
Absolutely true. This problem only occurs when the C programmer
omits the terminating nul byte. Unfortunately this omission is easy
to achieve.
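A small sketch of how easily it happens, assuming the source really is
a nul-terminated string. The function name strncpy_terminates is made
up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most n bytes of src into dst with strncpy(), then reports
   whether dst contains a nul byte within those n bytes (1) or not (0).
   strncpy_terminates is a hypothetical name used only for this sketch. */
int strncpy_terminates(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}
```

With an 8-byte destination, a 5-character source leaves a terminated
result, while a source of 8 or more characters fills the buffer and
leaves no nul byte at all.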
> One way to avoid this is to use a technique with strncpy() :
>
> #define BUF_LEN 8
>
> void
> func(const char *in_str_with_maybe_no_null) {
> char my_buf[BUF_LEN];
>
> strncpy(my_buf,in_str_with_maybe_no_null,BUF_LEN-1)[BUF_LEN-1] = 0;
>
> This restricts the copy to BUF_LEN-1 characters + 1 guaranteed nul byte.
> It works because strncpy() the function, returns the (char *) pointer to
> my_buf, resulting in the final assignment:
>
> my_buf[BUF_LEN-1] = 0;
>
> You can do this in a much less cryptic way, but I have found it useful
> in C programs, and it takes up less screen real-estate this way ;-)
And this approach assumes that copying only BUF_LEN-1 characters will
result in valid data. Sometimes it will. Sometimes BUF_LEN may be
too small, resulting in a truncated string.
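One way to at least notice the truncation is to have the copy report
it. This is only a sketch: copy_checked is a made-up name, and it
assumes src is a properly terminated C string, since it calls strlen().

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity dst_size), always nul-terminating.
   Returns 1 if the whole string fit, 0 if it had to be truncated.
   Assumes src is nul-terminated; copy_checked is a hypothetical helper. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    size_t n;

    if (dst_size == 0)
        return 0;                 /* no room even for the nul byte */

    n = (len < dst_size) ? len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < dst_size;
}
```

The caller still has to decide what a truncated result means for the
program; the return value only makes the problem visible.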
>
> > Another less common cost occurs when copying C strings. The most
> > efficient copy operation for C arrays is the memcpy function.
> > This function allows you to copy blocks of memory efficiently.
> > If you try to use memcpy to copy strings you will find some
> > real problems. In those cases you want to copy the actual array
> > of characters, not just the logical string contained in it.
>
> Have you said this in reverse? Normally you don't want to copy
> anything beyond the nul byte, unless you're copying fixed length
> arrays of characters, without treating nul as a special marker.
No, I am saying that strncpy() is less efficient than memcpy.
It is possible to make an exact copy of a string using memcpy.
In fact the entire character array will be copied, not just the
data up to the null.
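As a rough illustration, assuming both buffers are exactly BUF_LEN
bytes (the name copy_array and the value 8 are only for this sketch):

```c
#include <string.h>

#define BUF_LEN 8

/* Copies the whole BUF_LEN-byte array from src to dst, including any
   bytes that follow the first nul. Both pointers must refer to arrays
   of at least BUF_LEN bytes; the function cannot check that itself.
   copy_array is a hypothetical name used only for this sketch. */
void copy_array(char *dst, const char *src)
{
    memcpy(dst, src, BUF_LEN);
}
```

If src holds "ab" followed by a nul and then more data, the copy
contains that trailing data too, whereas strcpy() would have stopped
after the nul.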
> > The problem is that the C sizeof operator does not report the
> > correct size of arrays outside the immediate scope where they
> > are declared.
>
> OK, you're saying when you pass arrays into a C function, when
> the array is declared external to that function. Something like:
>
> void func2(char *str) {
> // what is the array size of str?
> }
>
> void func1() {
> char my_array[31];
>
> func2(my_array);
> }
>
> This is a weakness, but if you know that func2() should work
> with fixed length arrays of a certain size, you can use:
>
> void func2(char str[31]) {
> // what is the array size of str?
> }
>
> instead. However, I agree that this is feeble, compared to the
> way Ada passes array bounds information.
>
> > Instead you will only get the size of the pointer
> > to the first element of the array.
>
> OK, it sounds like you're suggesting the following:
>
> void my_func(char *str) {
> int slen = sizeof str; // which does not make sense
No, I am suggesting:

void my_func(char str[]) {
    int slen = sizeof str; // meaningful only within the scope where
                           // the actual parameter (the array) is
                           // declared
}
> However, if you declared this instead:
>
> void my_func(char str[31]) {
> int array_len = sizeof str; // this comes close to size of array
>
> (on many platforms : array_len=32 here due to padding)
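Nope, not what I wanted at all. Even with the bound written in the
parameter list, str is adjusted to a pointer, so sizeof str still
yields the size of a pointer, not 31 or 32. sizeof reports the array
size only when it is used within the same scope as the declaration of
the array itself.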
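To make the scope point concrete, here is a small sketch. In standard
C a parameter written as char str[] or char str[31] is adjusted to
char *, so sizeof inside the callee gives the size of a pointer, while
sizeof in the scope that declares the array gives the size of the
array. The function names are made up for illustration.

```c
#include <stddef.h>

/* The parameter is written as a pointer here; declaring it as
   char str[] or char str[31] would be adjusted to this same type.
   The result is the size of a pointer, whatever array was passed. */
size_t size_in_callee(char *str)
{
    return sizeof str;
}

/* Here the array itself is in scope, so sizeof reports all 31 bytes. */
size_t size_in_declaring_scope(void)
{
    char my_array[31];
    return sizeof my_array;
}
```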
>
> > Therefore, to efficiently
> > copy C strings using memcpy you must provide a second "length"
> > argument, which may not be readily available.
> >
> > Jim Rogers
> > Colorado Springs, Colorado USA
>
> I'm not sure what you're pointing to here, but if you were to
> "efficiently" copy the string, you must have the assurance of
> a nul byte (so you can stop copying when you hit it with strcpy()) or a
> specified length (for memcpy()). Or you might need both if you use
> strncpy().
Correct. The point is that C *requires* the nul terminator for a
string but cannot enforce the requirement. Enforcement is left up
to the programmer. C functions cannot determine the size of an
arbitrary length array passed to them without also passing an
additional parameter.
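A sketch of what that additional parameter looks like in practice.
The function count_spaces is a made-up example; the point is only that
the caller, who can see the array declaration, must supply the length.

```c
#include <stddef.h>

/* Counts the spaces in an array of len characters. No nul byte is
   needed or examined; the function relies entirely on len being
   correct. count_spaces is a hypothetical example function. */
size_t count_spaces(const char *arr, size_t len)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < len; i++) {
        if (arr[i] == ' ')
            count++;
    }
    return count;
}
```

A caller that declares char my_array[31] can pass sizeof my_array as
the second argument. If it passes the wrong value, nothing in the
language will catch it.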
>
> But if it is a "C-string", then it does have a nul byte, and is
> efficient to copy (copying stops at the nul byte - memcpy() is not
> used to implement strncpy()).
>
> I'm not wanting to defend C, but we want to be correct about the
> defects when we launch criticism of C/C++. Otherwise, Ada programmers
> lose respect ;-)
I agree. The problem really revolves around the C attitude that the
programmer must provide all the discipline necessary to produce a
correct program. The language makes no distinction between liberty
and capability. My general rule for C is that almost anything is
allowed, but MANY things are not advised.
All this aside, I believe I have demonstrated that C strings are not
necessarily more efficient or more self-describing than Ada strings.
This is particularly true when your solutions to the problems I
pose are to create fixed length C strings, roughly paralleling the
structure of Ada strings. At the same time the C string still needs
the extra nul byte at the end to meet the definition of a string.
Jim Rogers
Colorado Springs, Colorado USA