From: Florian Weimer
Subject: Re: C vs. Ada - strings
Date: 2000-05-05T08:34:13+00:00
Message-ID: <87k8h9v1iy.fsf@deneb.cygnus.argh.org>
References: <390F0D93.F835FAD9@ftw.rsc.raytheon.com>
Newsgroups: comp.lang.ada

Wes Groleau writes:

> Two offices adjoining mine are occupied by persons
> fond of saying "Ada strings suck"

I've just written a parser for a small, regular language, in C. For this kind of job, C strings are quite handy, and I even think that the code is easy to read. It reuses the same idiom many times: keep a pointer to the beginning of the token, iterate over the token, replace the character delimiting it from the next token with a '\0', and finally skip any additional delimiters. Of course, this only works if you're dealing with text strings; if you are dealing with binary data, you can't use in-band signalling of string terminators.

A direct Ada translation would be a bit more complicated because you would have to keep track of both the start and the end of each token. Copying the token to a separate, unbounded string variable is perhaps a more appropriate translation, and even more readable than the C solution.
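The in-place idiom described above might look roughly like the following. This is only a sketch, not the parser from the post: the function name `split_tokens`, its signature, and the use of `strspn`/`strcspn` for the scanning steps are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `buf` in place into at most `max` tokens.
   Every delimiter that ends a token is overwritten with '\0', so each
   entry in `tokens` points at an ordinary C string inside `buf`.
   Returns the number of tokens found. */
size_t split_tokens(char *buf, const char *delims, char **tokens, size_t max)
{
    assert(buf != NULL && delims != NULL && tokens != NULL);

    size_t n = 0;
    char *p = buf + strspn(buf, delims);   /* skip leading delimiters */

    while (*p != '\0' && n < max) {
        tokens[n++] = p;                   /* pointer to start of token */
        p += strcspn(p, delims);           /* iterate over the token */
        if (*p == '\0')
            break;                         /* token ran to end of input */
        *p++ = '\0';                       /* replace delimiter by '\0' */
        p += strspn(p, delims);            /* skip additional delimiters */
    }
    return n;
}
```

Called on a writable buffer holding "  set  width = 80 " with the delimiter set " =", this yields the three tokens "set", "width" and "80". Only the token starts are stored; the token ends are implied by the '\0' bytes, which is exactly the bookkeeping a direct Ada translation would have to carry explicitly.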
Obviously, you can't copy tokens around like that in standard C, because it has no unbounded strings. Each time you have to create a string whose maximum length is not known at compile time, you have to use heap allocation and worry about all the consequences. I find this rather unacceptable because unbounded strings are quite common.

> I've had to twice write packages similar to the Ada 95
> string packages to avoid imitating other folks who
> continually re-invent the same string handling logic
> over and over.

Most probably, I'll write my own string package some day, but entirely due to efficiency considerations. In fact, I'm going to mimic the standard Ada interface as closely as possible. I have only worked with the GNAT implementation of the standard Ada strings, and two things annoy me particularly:

First, the bounded strings tend to increase code size and compile time considerably. The string package Gautier mentioned could be used as a replacement in places where this is a concern, and perhaps the bounded strings can be implemented on top of it, reducing code bloat.

Second, the unbounded strings are inefficient to a degree that starts to irritate people. (There even was a thread on this topic a few weeks ago.) A reference-counted implementation could be considerably faster: you could preallocate storage so that you don't have to use an allocator and copy the entire string each time you add a character to it; you could take into account that storage is allocated in chunks of certain sizes (for example, the smallest data chunk allocated by GNU malloc is 12 bytes large on 32 bit platforms); you don't have to use allocators constantly if you pass strings around; and so on. I'm sure such an implementation would greatly increase overall performance, although it is much more complicated than the current one.

Unfortunately, it is difficult to exactly duplicate the semantics of the standard Ada unbounded strings.
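To make the preallocation idea concrete, here is a minimal C sketch of a growable string that rounds its capacity up in chunks instead of reallocating on every append. Everything in it is assumed for illustration: the `strbuf` type, the function names, the 16-byte floor and the doubling policy are hypothetical, and the `reallocs` counter exists only to make the effect visible. It is not how GNAT or any particular library does it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable string: `len` bytes used out of `cap` allocated. */
typedef struct {
    char    *data;      /* NUL-terminated contents, NULL while empty */
    size_t   len;
    size_t   cap;
    unsigned reallocs;  /* demonstration only: allocator calls so far */
} strbuf;

/* Round a request up to a power of two, starting at 16 bytes. Both
   numbers are illustrative; real allocator chunk sizes differ. */
static size_t round_capacity(size_t need)
{
    size_t cap = 16;
    while (cap < need) {
        if (cap > SIZE_MAX / 2)
            return need;            /* cannot double without overflow */
        cap *= 2;
    }
    return cap;
}

/* Append one character; returns 0 on success, -1 if allocation fails. */
int strbuf_append_char(strbuf *s, char c)
{
    assert(s != NULL);
    size_t need = s->len + 2;       /* new character plus trailing '\0' */
    if (need > s->cap) {
        size_t cap = round_capacity(need);
        char *p = realloc(s->data, cap);   /* realloc(NULL, n) allocates */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
        s->reallocs++;
    }
    s->data[s->len++] = c;
    s->data[s->len] = '\0';
    return 0;
}

void strbuf_free(strbuf *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Starting from a zero-initialized `strbuf`, appending 1000 characters one at a time goes through the allocator 7 times in this sketch (capacities 16, 32, ..., 1024) rather than 1000 times. The caller still has to remember `strbuf_free`, which is the heap-allocation worry mentioned earlier.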
Standard Ada strings are immutable, but with some reference count tricks, you can even do in-place modification without losing immutability: as long as a string has only a single reference, nobody else can observe the change.

Another issue is perhaps more problematic: to begin with, reference counts are not task-safe at all. But a task-safe implementation does not require extensive locking if the hardware provides atomic load-increment-store and load-decrement-store-compare-to-zero operations on integers of a suitable size (e.g., 32 bits and more). The x86 architecture does have these operations (and for this kind of application, they are even completely SMP-safe), but these instructions look very CISCy, so they are probably not available on other architectures.

At the moment, I'm not sure if mimicking the Ada semantics of unbounded strings is worth the trouble at all. Perhaps it's better to make the strings with reference counts mutable. (Immutable strings which aren't safe for tasking aren't an option. I'm sure people tend to forget that they're unsafe, and the resulting bugs are horrible to debug.)
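For readers who want to see the two reference count operations spelled out, here is a rough sketch in C using C11 `<stdatomic.h>`, which postdates this discussion. The `rcstr` type and all function names are hypothetical, and the memory orderings are one reasonable choice, not the only one. The sketch assumes that every task calling these functions holds its own counted reference to the string.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string with the text stored inline. */
typedef struct {
    atomic_uint refs;
    size_t      len;
    char        data[];     /* flexible array member, NUL-terminated */
} rcstr;

rcstr *rcstr_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    rcstr *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    atomic_init(&s->refs, 1);
    s->len = len;
    memcpy(s->data, text, len + 1);
    return s;
}

/* Atomic load-increment-store: take another reference. */
rcstr *rcstr_retain(rcstr *s)
{
    assert(s != NULL);
    atomic_fetch_add_explicit(&s->refs, 1, memory_order_relaxed);
    return s;
}

/* Atomic load-decrement-store-compare-to-zero: drop a reference.
   Returns 1 if this was the last one and the storage was freed. */
int rcstr_release(rcstr *s)
{
    assert(s != NULL);
    if (atomic_fetch_sub_explicit(&s->refs, 1, memory_order_acq_rel) == 1) {
        free(s);
        return 1;
    }
    return 0;
}

/* The "reference count trick": a sole owner may modify in place,
   a sharer first gets a private copy and gives up the shared one.
   Returns NULL (leaving `s` untouched) if the copy cannot be made. */
rcstr *rcstr_make_unique(rcstr *s)
{
    assert(s != NULL);
    if (atomic_load_explicit(&s->refs, memory_order_acquire) == 1)
        return s;
    rcstr *copy = rcstr_new(s->data);
    if (copy != NULL)
        rcstr_release(s);
    return copy;
}
```

A caller that wants to change a string calls `rcstr_make_unique` first and then writes through the returned pointer; other holders of the old value keep seeing the old text, so the strings still behave as immutable values from the outside.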