comp.lang.ada
From: Florian Weimer <fw@deneb.cygnus.argh.org>
Subject: Re: C vs. Ada - strings
Date: 2000/05/05
Date: 2000-05-05T08:34:13+00:00
Message-ID: <87k8h9v1iy.fsf@deneb.cygnus.argh.org>
In-Reply-To: 390F0D93.F835FAD9@ftw.rsc.raytheon.com

Wes Groleau <wwgrol@ftw.rsc.raytheon.com> writes:

> Two offices adjoining mine are occupied by persons
> fond of saying "Ada strings suck"

I've just written a parser for a small, regular language -- in C.  For
this kind of job, C strings are quite handy, and I even think that
the code is easy to read (it reuses the same idiom many times: keep
a pointer to the beginning of the token, iterate over the token and
replace the character delimiting it from the next token with a '\0',
and finally skip additional delimiters).  Of course, this only works
if you're dealing with text strings; if you are dealing with binary
data, you can't use in-band signalling of string terminators.
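
For illustration, the idiom looks roughly like the sketch below.  It
is not the parser mentioned above; the function name split_in_place
and its parameters are made up for this example.  Note that it
overwrites delimiters in the caller's buffer, so the buffer must be
writable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `buf` in place on the characters in
   `delims`.  Stores up to `max` token pointers in `out` and returns
   the number of tokens found. */
static size_t split_in_place(char *buf, const char *delims,
                             char **out, size_t max)
{
    assert(buf != NULL && delims != NULL && out != NULL);

    size_t n = 0;
    char *p = buf + strspn(buf, delims);   /* skip leading delimiters */

    while (*p != '\0' && n < max) {
        out[n++] = p;                      /* beginning of the token */
        p += strcspn(p, delims);           /* iterate over the token */
        if (*p == '\0')
            break;                         /* token ends the input */
        *p++ = '\0';                       /* overwrite the delimiter */
        p += strspn(p, delims);            /* skip additional delimiters */
    }
    return n;
}
```

Each out[i] is then an ordinary C string pointing into buf, so no
allocation or copying is needed.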

A direct Ada translation would be a bit more complicated because you
would have to keep track both of the start and the end of the tokens.
Copying the token to a different, unbounded string variable is perhaps
a translation which is more appropriate and even more readable than
the C solution.  Obviously, you can't do this in standard C because
there are no unbounded strings.  Each time you have to create strings
whose maximum length is not given at compile time, you have to use
heap allocation and worry about all the consequences.  I find this
rather unacceptable because unbounded strings are quite common.
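
To make the comparison concrete, here is a sketch in C of the two
alternatives: tracking the start and the extent of each token without
touching the input, and copying a token to the heap.  The names
(struct slice, next_token, slice_dup) are hypothetical, and the
caller has to free() every copy, which is exactly the bookkeeping an
unbounded string type would take care of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slice type: a token described by its start and its
   length, so the input buffer is never modified. */
struct slice {
    const char *start;
    size_t len;
};

/* Find the next token at or after *pos and advance *pos past it.
   Returns 1 if a token was found, 0 if the input is exhausted. */
static int next_token(const char **pos, const char *delims,
                      struct slice *tok)
{
    assert(pos != NULL && *pos != NULL && tok != NULL);

    const char *p = *pos + strspn(*pos, delims);
    if (*p == '\0') {
        *pos = p;
        return 0;
    }
    tok->start = p;
    tok->len = strcspn(p, delims);
    *pos = p + tok->len;
    return 1;
}

/* Copy a slice into a fresh heap string.  Returns NULL if malloc
   fails; otherwise the caller must free() the result. */
static char *slice_dup(struct slice s)
{
    char *copy = malloc(s.len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s.start, s.len);
    copy[s.len] = '\0';
    return copy;
}
```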

> I've had to twice write packages similar to the Ada 95
> string packages to avoid imitating other folks who
> continually re-invent the same string handling logic
> over and over.

Most probably, I'll write my own string package some day, but entirely
due to efficiency considerations.  In fact, I'm going to mimic the
standard Ada interface as closely as possible.

I have only worked with the GNAT implementation of the standard Ada
strings, and two things annoy me particularly: First, the bounded
strings tend to increase code size and compile time considerably.  The
string package Gautier mentioned could be used as a replacement in
places where this is a concern, and perhaps the bounded strings can be
implemented on top of it, reducing code bloat.

Second, the unbounded strings are inefficient to such a degree that
it starts to irritate people.  (There even was a thread on this topic
a few weeks ago.)  A reference-counted implementation could be
considerably faster: you could preallocate storage so that you
don't have to use an allocator and copy the entire string each time
you add a character to the string, you could take into account that
storage is allocated in chunks of certain sizes (for example, the
smallest data chunk allocated by GNU malloc is 12 bytes large on
32 bit platforms), you don't have to use allocators constantly if
you pass around strings, and so on.  I'm sure such an implementation
would greatly increase overall performance, although it is much more
complicated than the current one.
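
A rough sketch of the idea, in C rather than Ada, and not a drop-in
replacement for anything in GNAT.  All names (struct rcstr,
rcstr_new, rcstr_share, rcstr_release, rcstr_append) are made up, and
the doubling policy with an initial capacity of 8 is just one
assumption; a real implementation would tune it to the allocator's
chunk sizes.  The counter here is a plain integer, so this version is
not task-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header and characters in one allocation. */
struct rcstr {
    unsigned refs;  /* handles sharing this buffer (not task-safe) */
    size_t len;     /* characters in use */
    size_t cap;     /* characters allocated, excluding the final '\0' */
    char data[];    /* C99 flexible array member */
};

static struct rcstr *rcstr_new(size_t cap)
{
    struct rcstr *s = malloc(sizeof *s + cap + 1);
    if (s == NULL)
        return NULL;
    s->refs = 1;
    s->len = 0;
    s->cap = cap;
    s->data[0] = '\0';
    return s;
}

/* Passing a string around only bumps the count: no allocator call
   and no copy of the characters. */
static struct rcstr *rcstr_share(struct rcstr *s)
{
    s->refs++;
    return s;
}

static void rcstr_release(struct rcstr *s)
{
    if (--s->refs == 0)
        free(s);
}

/* Append one character to a uniquely owned string.  The capacity
   doubles when it is exhausted, so most calls need no reallocation.
   Returns the (possibly moved) string, or NULL on allocation
   failure, in which case the original string is left untouched. */
static struct rcstr *rcstr_append(struct rcstr *s, char c)
{
    assert(s != NULL && s->refs == 1);
    if (s->len == s->cap) {
        size_t cap = s->cap ? 2 * s->cap : 8;
        struct rcstr *t = realloc(s, sizeof *t + cap + 1);
        if (t == NULL)
            return NULL;
        s = t;
        s->cap = cap;
    }
    s->data[s->len++] = c;
    s->data[s->len] = '\0';
    return s;
}
```

Appending to a string that is shared (refs > 1) would first need a
private copy; the sketch simply asserts that the caller owns the
string exclusively.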

Unfortunately, it is difficult to exactly duplicate the semantics
of the standard Ada unbounded strings.  Standard Ada strings are
immutable, but with some reference count tricks, you can even do
in-place modification without losing immutability.  Another issue is
perhaps more problematic: At first, reference counts are not task-safe
at all.  But a task-safe implementation does not require extensive
locking if the hardware provides atomic load-increment-store and
load-decrement-store-compare-to-zero operations on integers of a
suitable size (e.g., 32 bits or more).  The x86 architecture does
have these operations (and for this kind of application, they are even
completely SMP-safe), but these instructions look very CISCy, so they
are probably not available on other architectures.
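
As a sketch of what the two operations amount to, here they are
written with the C11 <stdatomic.h> interface.  That interface is an
assumption of this example (it requires a C11 compiler), and the
names struct shared, shared_new, shared_retain, shared_release and
shared_is_unique are made up.  On x86, compilers typically turn these
calls into lock-prefixed instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared header; the character data would follow it. */
struct shared {
    atomic_uint refs;
};

static struct shared *shared_new(void)
{
    struct shared *s = malloc(sizeof *s);
    if (s != NULL)
        atomic_init(&s->refs, 1);
    return s;
}

/* Atomic load-increment-store. */
static void shared_retain(struct shared *s)
{
    assert(s != NULL);
    atomic_fetch_add_explicit(&s->refs, 1, memory_order_relaxed);
}

/* Atomic load-decrement-store plus the test for zero.  Returns true
   if this call dropped the last reference and freed the object. */
static bool shared_release(struct shared *s)
{
    assert(s != NULL);
    /* fetch_sub returns the old value, so 1 means "now zero". */
    if (atomic_fetch_sub_explicit(&s->refs, 1,
                                  memory_order_acq_rel) == 1) {
        free(s);
        return true;
    }
    return false;
}

/* The reference count trick for in-place modification: a buffer
   with exactly one reference cannot be observed by anyone else, so
   it may be changed in place; otherwise it must be copied first. */
static bool shared_is_unique(struct shared *s)
{
    assert(s != NULL);
    return atomic_load_explicit(&s->refs, memory_order_acquire) == 1;
}
```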

At the moment, I'm not sure if mimicking the Ada semantics of
unbounded strings is worth the trouble at all.  Perhaps it's better to
make the strings with reference counts mutable.  (Immutable strings
which aren't safe for tasking aren't an option.  I'm sure people tend
to forget that they're unsafe, and the resulting bugs are horrible to
debug.)



