From: James Rogers
Newsgroups: comp.lang.ada
Subject: Re: Ada Idioms Progress Preview
Date: Tue, 14 Aug 2001 06:15:20 GMT
Organization: AT&T Worldnet
Message-ID: <3B78C290.4DD088A8@worldnet.att.net>
References: <3B6F1B2F.4FC3C833@gsde.hou.us.ray.com>
 <5ee5b646.0108071819.6e84e33d@posting.google.com>
 <3_Xc7.45$NM5.84779@news.pacbell.net>
 <3B783712.88029BB8@worldnet.att.net>
 <3B787A30.F806DB00@home.com>

"Warren W. Gay VE3WWG" wrote:
>
> James Rogers wrote:
> > An old saying is "there is no free lunch". In other words, nothing
> > comes for free. In the case of a C string, you do not explicitly
> > carry around the length of a string. Instead, you rely on a convention
> > stating that the logical end of the string is indicated by a null
> > character.
> >
> > The C approach presents two very real costs:
> >
> > 1) You must serially read the string to find the terminating null
> >    character. This operation is very expensive if you only need to
> >    determine the length of the string.
>
> To be honest, it works reasonably well for C/C++ because most strings in
> a program tend to be short (of course this varies by application!) It would
> be nice to have someone sample some Open Sourced packages and come
> up with an average length, but I suspect that it would
> be short enough. It is true for _some_ C strings/applications, that this
> could be a significant overhead factor.

I would not like to make a claim about the "average" size of strings
in C applications. I suspect the size varies quite a bit.

> > 2) Sometimes the null character is omitted. Since C arrays are
> >    unbounded, this causes your program to read beyond the end of
> >    the string until it finds a null character. The resulting
> >    length will be incorrect.
>
> I'm not really wanting to support C/C++, but we should be careful
> about what is being said here. It's really only a problem if you
> _need_ a nul byte at the end. There are C programs that work with
> fixed sized strings, like Ada, though this tends to be rarer (it
> sometimes is done with embedded SQL/C). If you then need to pass
> the fixed string to a printf() or other function that expects a
> "C string", then yes, this then becomes a problem (just as it does
> for Ada supplying a string for C).

The definition of a C string is a null-terminated array of characters.
C functions that do not require the null termination do not actually
use strings. They merely use arrays of characters. This may seem like
a subtle point, but it is critical. Functions expecting a C string for
an argument absolutely rely on the existence of the nul byte at the
end of the logical string.

> >    When copying or editing a string
> >    this problem will result in data corruption and undefined
> >    behaviors.
>
> This is again, not necessarily true, but it does happen if the
> C programmer is not careful. If instead, the user
> uses strncpy() for example, where the maximum size of the destination
> array is given, then this does not happen. However, if you strncpy()
> the maximum # of characters, you don't get a nul byte at the end.
> Novice C programmers often miss this subtle point ;-)

Absolutely true. This problem only occurs when the C programmer omits
the terminating nul byte. Unfortunately this omission is easy to
achieve.

> One way to avoid this is to use a technique with strncpy() :
>
> #define BUF_LEN 8
>
> void
> func(const char *in_str_with_maybe_no_null) {
>     char my_buf[BUF_LEN];
>
>     strncpy(my_buf,in_str_with_maybe_no_null,BUF_LEN-1)[BUF_LEN-1] = 0;
>
> This restricts the copy to BUF_LEN-1 characters + 1 guaranteed nul byte.
> It works because strncpy() the function, returns the (char *) pointer to
> my_buf, resulting in the final assignment:
>
>     my_buf[BUF_LEN-1] = 0;
>
> You can do this in a much less cryptic way, but I have found it useful
> in C programs, and it takes up less screen real-estate this way ;-)

And this approach assumes that copying only BUF_LEN-1 characters will
result in valid data. Sometimes it will. Sometimes BUF_LEN may be too
small, resulting in a truncated string.

>
> > Another less common cost occurs when copying C strings. The most
> > efficient copy operation for C arrays is the memcpy function.
> > This function allows you to copy blocks of memory efficiently.
> > If you try to use memcpy to copy strings you will find some
> > real problems. In those cases you want to copy the actual array
> > of characters, not just the logical string contained in it.
>
> Have you said this in reverse? Normally you don't want to copy
> anything beyond the nul byte, unless you're copying fixed length
> arrays of characters, without treating nul as a special marker.

No, I am saying that strncpy() is less efficient than memcpy.
It is possible to make an exact copy of a string using memcpy. In fact
the entire character array will be copied, not just the data up to the
null.

> > The problem is that the C sizeof operator does not report the
> > correct size of arrays outside the immediate scope where they
> > are declared.
>
> OK, you're saying when you pass arrays into a C function, when
> the array is declared external to that function. Something like:
>
> void func2(char *str) {
>     // what is the array size of str?
> }
>
> void func1() {
>     char my_array[31];
>
>     func2(my_array);
> }
>
> This is a weakness, but if you know that func2() should work
> with fixed length arrays of a certain size, you can use:
>
> void func2(char str[31]) {
>     // what is the array size of str?
> }
>
> instead. However, I agree that this is feeble, compared to the
> way Ada passes array bounds information.
>
> > Instead you will only get the size of the pointer
> > to the first element of the array.
>
> OK, it sounds like you're suggesting the following:
>
> void my_func(char *str) {
>     int slen = sizeof str; // which does not make sense

No, I am suggesting:

void my_func( char str[]) {
    int slen = sizeof str; // which makes sense within the declaration
                           // scope of the actual parameter.

> However, if you declared this instead:
>
> void my_func(char str[31]) {
>     int array_len = sizeof str; // this comes close to size of array
>
> (on many platforms : array_len=32 here due to padding)

Nope, not what I wanted at all. See, this works because sizeof is
used within the same scope as the declaration of str.

>
> > Therefore, to efficiently
> > copy C strings using memcpy you must provide a second "length"
> > argument, which may not be readily available.
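
The sizeof question in this exchange can be checked with a small sketch, assuming a hosted C99 or later compiler. The names size_seen_by_callee and copy_block are hypothetical, chosen only for this illustration. In standard C a parameter written as char str[] or char str[31] is adjusted to char *, so sizeof inside the callee yields the size of a pointer. Only in the scope where the array itself is declared does sizeof yield the array size, which is why memcpy needs the length as a separate argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prototype written with an array parameter. The 31 is not enforced:
   the parameter type is adjusted to char *. */
size_t size_seen_by_callee(char str[31]);

/* Definition written with a pointer parameter. It is compatible with
   the prototype above because both declare the same adjusted type. */
size_t size_seen_by_callee(char *str)
{
    return sizeof str;          /* size of a pointer, not 31 */
}

/* Hypothetical helper: copies exactly n bytes. The caller must supply
   n because the callee cannot recover the array size from the pointer. */
void copy_block(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

A caller that declares char a[31] can pass sizeof a to copy_block, since sizeof a is 31 in the declaring scope, while size_seen_by_callee(a) returns sizeof(char *).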
> >
> > Jim Rogers
> > Colorado Springs, Colorado USA
>
> I'm not sure what you're pointing to here, but if you were to
> "efficiently" copy the string, you must have the assurance of
> a nul byte (so you can stop copying when you hit it with strcpy()) or a
> specified length (for memcpy()). Or you might need both if you use
> strncpy().

Correct. The point is that C *requires* the nul terminator for a
string but cannot enforce the requirement. Enforcement is left up to
the programmer. C functions cannot determine the size of an arbitrary
length array passed to them without also passing an additional
parameter.

>
> But if it is a "C-string", then it does have a nul byte, and is
> efficient to copy (copying stops at the nul byte - memcpy() is not
> used to implement strncpy()).
>
> I'm not wanting to defend C, but we want to be correct about the
> defects when we launch criticism of C/C++. Otherwise, Ada programmers
> lose respect ;-)

I agree. The problem really revolves around the C attitude that the
programmer must provide all the discipline necessary to produce a
correct program. The language makes no distinction between liberty
and capability. My general rule for C is that almost anything is
allowed, but MANY things are not advised.

All this aside, I believe I have demonstrated that C strings are not
necessarily more efficient or more self-describing than Ada strings.
This is particularly true when your solutions to the problems I pose
are to create fixed length C strings, roughly paralleling the
structure of Ada strings. At the same time the C string still needs
the extra nul byte at the end to meet the definition of a string.

Jim Rogers
Colorado Springs, Colorado USA
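
The truncation concern raised in the strncpy() exchange can be sketched as follows. This is a minimal illustration, not a standard library routine: the name copy_bounded is hypothetical, and the sketch assumes src really is nul-terminated. It always terminates the destination and also reports whether the source fit, so a too-small buffer is detected rather than silently accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function.
   Copies at most dst_size - 1 characters from src into dst and always
   writes a terminating nul. Returns 1 if the whole string fit, 0 if
   it was truncated. Assumes src is nul-terminated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);

    len = strlen(src);                  /* linear scan for the nul */
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1); /* keep only what fits */
        dst[dst_size - 1] = '\0';
        return 0;                       /* truncated */
    }
    memcpy(dst, src, len + 1);          /* copy includes the nul */
    return 1;
}
```

The caller still has to pass the destination size explicitly, which is the extra "length" argument the discussion keeps returning to.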