From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,45b47ecb995e7a3
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-08-13 18:09:06 PST
Path: 
 archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!sn-xit-01!supernews.com!newshub2.rdc1.sfba.home.com!news.home.com!news1.rdc2.on.home.com.POSTED!not-for-mail
Message-ID: <3B787A30.F806DB00@home.com>
From: "Warren W. Gay VE3WWG" <ve3wwg@home.com>
X-Mailer: Mozilla 4.75 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Ada Idioms Progress Preview
References: <3B6F1B2F.4FC3C833@gsde.hou.us.ray.com>
 <SCIb7.37009$Kd7.22894159@news1.rdc1.sfba.home.com>
 <5ee5b646.0108071819.6e84e33d@posting.google.com>
 <3_Xc7.45$NM5.84779@news.pacbell.net> <E8Rd7.282$D4.307@www.newsranger.com>
 <umqr8ug55d9.fsf@maestro.clustra.com> <3B783712.88029BB8@worldnet.att.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Tue, 14 Aug 2001 01:09:06 GMT
NNTP-Posting-Host: 24.141.193.224
X-Complaints-To: abuse@home.net
X-Trace: news1.rdc2.on.home.com 997751346 24.141.193.224 (Mon,
 13 Aug 2001 18:09:06 PDT)
NNTP-Posting-Date: Mon, 13 Aug 2001 18:09:06 PDT
Organization: Excite@Home - The Leader in Broadband http://home.com/faster
Xref: archiver1.google.com comp.lang.ada:11881
Date: 2001-08-14T01:09:06+00:00
List-Id: <comp.lang.ada>

James Rogers wrote:
> Ole-Hjalmar Kristensen wrote:
> > One thing which can be said in favour of having a terminator character
> > is that it frees you from having to store the length explicitly. The
> > length of a string is usually different from the size of the array
> > used to store the string.
> > So, in a sense a C string is more self-describing than a plain Ada
> > string.
> > Of course, as soon as you call a procedure, you can use a slice, but
> > you still need the actual length to decide which slice.
> >
> > On the balance, I would rather have Ada strings.
> 
> An old saying is "there is no free lunch". In other words, nothing
> comes for free. In the case of a C string, you do not explicitly
> carry around the length of a string. Instead, you rely on a convention
> stating that the logical end of the string is indicated by a null
> character.
> 
> The C approach presents two very real costs:
> 
> 1) You must serially read the string to find the terminating null
>    character. This operation is very expensive if you only need to
>    determine the length of the string.

To be honest, it works reasonably well for C/C++ because most strings in
a program tend to be short (of course this varies by application!). It would
be nice to have someone sample some open source packages and come
up with an average length, but I suspect that it would
be short enough. It is true that for _some_ C strings/applications this
could be a significant overhead factor.

> 2) Sometimes the null character is omitted. Since C arrays are
>    unbounded, this causes your program to read beyond the end of
>    the string until it finds a null character. The resulting
>    length will be incorrect. 

I don't really want to support C/C++, but we should be careful
about what is being said here: it's really only a problem if you
_need_ a nul byte at the end. There are C programs that work with
fixed-size strings, like Ada, though this tends to be rarer (it
is sometimes done with embedded SQL/C). If you then need to pass
the fixed string to printf() or another function that expects a
"C string", then yes, this becomes a problem (just as it does
for Ada supplying a string to C).

> When copying or editing a string
>    this problem will result in data corruption and undefined
>    behaviors.

Again, this is not necessarily true, but it does happen if the
C programmer is not careful. If instead the programmer
uses strncpy(), for example, where the maximum size of the destination
array is given, then this does not happen. However, if strncpy()
copies the maximum number of characters, you don't get a nul byte at the end.
Novice C programmers often miss this subtle point ;-)
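
To make the missing nul byte concrete, here is a minimal sketch. The
helper names has_terminator() and strncpy_terminates() and the 8-byte
buffer are my own assumptions for illustration, not anything from the
thread:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a nul byte, 0 otherwise. */
int has_terminator(const char *buf, size_t size) {
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}

/* Copies src into an 8-byte scratch buffer with strncpy() and reports
   whether the result ended up nul-terminated. */
int strncpy_terminates(const char *src) {
    char buf[8];
    memset(buf, 'x', sizeof buf);    /* make a missing nul visible */
    strncpy(buf, src, sizeof buf);   /* no slot reserved for the nul */
    return has_terminator(buf, sizeof buf);
}
```

A source shorter than the buffer is padded with nul bytes, so the result
is terminated; a source of 8 or more characters fills the buffer and
leaves no terminator at all.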

One way to avoid this is the following strncpy() idiom:

#include <string.h>

#define BUF_LEN 8

void
func(const char *in_str_with_maybe_no_null) {
   char my_buf[BUF_LEN];

   /* copy at most BUF_LEN-1 characters, then force a nul into the last slot */
   strncpy(my_buf,in_str_with_maybe_no_null,BUF_LEN-1)[BUF_LEN-1] = 0;
}

This restricts the copy to BUF_LEN-1 characters + 1 guaranteed nul byte.
It works because strncpy() returns the (char *) pointer to
my_buf, so the trailing subscript performs the final assignment:

   my_buf[BUF_LEN-1] = 0;

You can do this in a much less cryptic way, but I have found it useful
in C programs, and it takes up less screen real-estate this way ;-)
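
For comparison, the less cryptic form could be wrapped in a small helper.
This is only a sketch: copy_bounded() is a hypothetical name of mine, not
a standard library function, and it assumes the destination size is at
least 1:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size-1 characters from src
   and always terminate dst. Returns dst. */
char *copy_bounded(char *dst, const char *src, size_t dst_size) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated */
    dst[dst_size - 1] = '\0';          /* so terminate it explicitly */
    return dst;
}
```

Called as copy_bounded(my_buf, str, sizeof my_buf), it truncates long
input to fit and behaves like a plain copy for short input.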

> Another less common cost occurs when copying C strings. The most
> efficient copy operation for C arrays is the memcpy function.
> This function allows you to copy blocks of memory efficiently.
> If you try to use memcpy to copy strings you will find some
> real problems. In those cases you want to copy the actual array
> of characters, not just the logical string contained in it.

Have you said this in reverse? Normally you don't want to copy
anything beyond the nul byte, unless you're copying fixed length
arrays of characters, without treating nul as a special marker.

> The problem is that the C sizeof operator does not report the
> correct size of arrays outside the immediate scope where they
> are declared. 

OK, you're referring to passing an array into a C function when
the array is declared outside that function. Something like:

void func2(char *str) {
   // what is the array size of str?
}

void func1() {
   char my_array[31];

   func2(my_array);
}

This is a weakness, but if you know that func2() should work
with fixed length arrays of a certain size, you can use:

void func2(char str[31]) {
   // what is the array size of str?
}

instead. However, this only documents the intent (the compiler still
treats str as a plain pointer), and I agree that it is feeble compared
to the way Ada passes array bounds information.
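
Since the bound does not travel with the pointer, the usual C workaround
is to pass the size as a separate argument, taken with sizeof where the
array is still in scope. A minimal sketch, with fill_buf() and
filled_length() as hypothetical names of my own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The callee cannot recover the array size from the pointer, so the
   caller supplies it. Fills buf with c and terminates it. */
void fill_buf(char *buf, size_t buf_size, char c) {
    assert(buf != NULL && buf_size > 0);
    memset(buf, c, buf_size - 1);
    buf[buf_size - 1] = '\0';
}

/* In the declaring scope, sizeof reports the full array size. */
size_t filled_length(void) {
    char my_array[31];
    fill_buf(my_array, sizeof my_array, '-');  /* sizeof my_array == 31 */
    return strlen(my_array);                   /* 30 characters + nul */
}
```

This is what Ada does for you automatically; in C the programmer has to
keep the pointer and the length in agreement by hand.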

> Instead you will only get the size of the pointer
> to the first element of the array. 

OK, it sounds like you're suggesting the following:

void my_func(char *str) {
   int slen = sizeof str;    // which does not make sense

But this is nonsense anyway: no self-respecting C programmer would
do this, because you are obviously asking for the size of the pointer ;-)

Note that declaring the parameter as an array does not change this:

void my_func(char str[31]) {
   int array_len = sizeof str;  // still the size of a pointer, not 31

(in a parameter list, char str[31] is just another way of writing char *str)

> Therefore, to efficiently
> copy C strings using memcpy you must provide a second "length"
> argument, which may not be readily available.
> 
> Jim Rogers
> Colorado Springs, Colorado USA

I'm not sure what you're pointing to here, but to copy the string
"efficiently", you must have either the assurance of
a nul byte (so you can stop copying when you hit it with strcpy()) or a
specified length (for memcpy()). Or you might need both if you use
strncpy().

But if it is a "C-string", then it does have a nul byte, and is 
efficient to copy (copying stops at the nul byte - memcpy() is not
used to implement strncpy()).
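
The two cases could be sketched as follows. The function names
copy_known_length() and copy_until_nul() are hypothetical, and both
assume the destination is large enough for the result:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a string whose length the caller already knows.
   dst must have room for len + 1 bytes. */
char *copy_known_length(char *dst, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);   /* block copy, never looks for a nul */
    dst[len] = '\0';
    return dst;
}

/* Copy a nul-terminated string of unknown length: strcpy() finds
   the terminator as it copies. */
char *copy_until_nul(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    return strcpy(dst, src);
}
```

Both produce the same result for a proper C string; the memcpy() variant
only pays off when the length is already at hand.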

I don't want to defend C, but we need to be correct about the
defects when we criticize C/C++.  Otherwise, Ada programmers
lose respect ;-)
-- 
Warren W. Gay VE3WWG
http://members.home.net/ve3wwg