comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: String filtering
Date: Tue, 27 Sep 2005 15:21:12 +0200
Message-ID: <1b54lwg8s1gk8.1t3jp1cmc2x32$.dlg@40tude.net>
In-Reply-To: m3ek7a69to.fsf@rr.trudgett

On Tue, 27 Sep 2005 21:15:15 +1000, David Trudgett wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
> OK, now I have:
> 
>    Alpha_Num_Space_Set : constant Character_Set
>      := Alphanumeric_Set or To_Set(' ');
> 
> since I realised I also need space.

And HT? And VT? LF, CR,... (:-))
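A minimal sketch of such a set, assuming "whitespace" means space plus the Latin-1 controls HT, LF, VT, FF and CR. The package Text_Sets and the constant Alpha_Num_White_Set are hypothetical names; only the Ada.Strings.Maps and Ada.Characters.Latin_1 declarations are standard:

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;

package Text_Sets is
   package L1 renames Ada.Characters.Latin_1;

   --  Hypothetical: alphanumerics plus space and the usual whitespace controls
   Alpha_Num_White_Set : constant Character_Set :=
     Alphanumeric_Set
       or To_Set (L1.Space & L1.HT & L1.LF & L1.VT & L1.FF & L1.CR);
end Text_Sets;
```

Note that Alphanumeric_Set also contains the Latin-1 accented letters (positions 192..255 except 215 and 247), which may or may not be what the protocol allows.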

>>>     function Strip_Non_Alphanumeric
>>>       (Str : in Unbounded_String) return Unbounded_String
>>>     is
>>>        New_Str : Unbounded_String
>>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>>
>> If you do this, then use String (1..Count (...));
> 
> If I did that then I would need to convert back to unbounded_string
> when I return the function result. Would that be significantly faster
> than working on a pre-allocated unbounded string?

It cannot be slower, because you already call To_Unbounded_String once to
initialize New_Str. With a String you call it once too, at the end.
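A minimal sketch of the String-buffer variant of David's function, assuming the Unbounded_String interface is kept: the exact length comes from Ada.Strings.Unbounded.Count, the characters are copied into a local String, and To_Unbounded_String is called only once, on return.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;

function Strip_Non_Alphanumeric
  (Str : Unbounded_String) return Unbounded_String
is
   Source : constant String := To_String (Str);
   --  Exact size of the result, known before any character is copied
   Result : String (1 .. Count (Str, Alphanumeric_Set));
   Last   : Natural := 0;
begin
   for I in Source'Range loop
      if Is_In (Source (I), Alphanumeric_Set) then
         Last := Last + 1;
         Result (Last) := Source (I);
      end if;
   end loop;
   --  The only conversion back to Unbounded_String
   return To_Unbounded_String (Result);
end Strip_Non_Alphanumeric;
```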

>>
>>>     begin
>>>        New_Str := To_Unbounded_String("");
>>
>> No need for that, it is initially an empty string.
> 
> I at first thought so myself, until I discovered that New_Str was
> uninitialised, as it says in the ARM. Hence, I added that line.

It is: an Unbounded_String object without an explicit initial value starts
out as Null_Unbounded_String.
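In the quoted code, Count returns a Natural, so To_Unbounded_String (Count (...)) resolves to the Length overload, which yields a string of that many characters with unspecified contents; that is probably what looked "uninitialised". A small sketch contrasting the two (Show_Default_Init is a hypothetical name):

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Show_Default_Init is
   --  No initial value: starts out as Null_Unbounded_String
   Empty : Unbounded_String;
   --  Length overload of To_Unbounded_String: five characters whose
   --  values are unspecified (the overload Count (...) selects)
   Sized : constant Unbounded_String := To_Unbounded_String (5);
begin
   Put_Line ("Empty length:" & Natural'Image (Length (Empty)));
   --  prints "Empty length: 0"
   Put_Line ("Sized length:" & Natural'Image (Length (Sized)));
   --  prints "Sized length: 5"
   Put_Line ("Empty is null: "
             & Boolean'Image (Empty = Null_Unbounded_String));
   --  prints "Empty is null: TRUE"
end Show_Default_Init;
```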

>> Firstly it is not clear why characters need to be filtered out. Or
>> better to say, how did it happen, that you get garbage in a string?
> 
> I am sanitising data received over a socket, which may be of any
> length. Hence my use of unbounded_string, and my desire to strip out
> non-alphanumeric characters.

But sockets normally work either as a stream or with an array of storage
elements. So you don't get an Unbounded_String from the socket; you make one
later. Make a String instead.
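A minimal sketch of building a String directly from received data, assuming the socket binding hands you an Ada.Streams.Stream_Element_Array (as GNAT.Sockets.Receive_Socket does) and that the payload is 8-bit Latin-1 text. Stream_To_String is a hypothetical helper:

```ada
with Ada.Streams; use Ada.Streams;

--  Hypothetical helper: copies raw bytes into a String, one Character per
--  Stream_Element. Assumes 8-bit stream elements (true for GNAT); a value
--  above 255 would raise Constraint_Error in Character'Val.
function Stream_To_String (Data : Stream_Element_Array) return String is
   Result : String (1 .. Data'Length);
   Last   : Natural := 0;
begin
   for I in Data'Range loop
      Last := Last + 1;
      Result (Last) := Character'Val (Data (I));
   end loop;
   return Result;
end Stream_To_String;
```

With GNAT.Sockets you would pass only the filled part, e.g. Stream_To_String (Buffer (Buffer'First .. Last)) after Receive_Socket (Socket, Buffer, Last).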

>> Either, you need a character *stream* filtering, 
> 
> Possibly, but I'm not using a socket stream interface at the current
> time. The socket library I'm using right now doesn't do streams.

Anyway, you have some protocol, and non-alphanumeric characters seem to
violate it. So what your filter does is invent meaning out of meaningless
rubbish. That is usually a bad idea; see PL/1 and HTML. Errors should be
reported as early as possible.
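A sketch of reporting the error instead of filtering, assuming alphanumeric tokens are the protocol's rule. Protocol_Checks, Check_Token and Protocol_Error are hypothetical; Index with Test => Outside is the standard Ada.Strings.Fixed search for the first character not in a set (it returns 0 if there is none):

```ada
with Ada.Strings;                use Ada.Strings;
with Ada.Strings.Fixed;          use Ada.Strings.Fixed;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;

package Protocol_Checks is
   Protocol_Error : exception;

   --  Raises Protocol_Error if Token contains any character outside
   --  Allowed; the message gives the index of the first such character.
   procedure Check_Token
     (Token   : String;
      Allowed : Character_Set := Alphanumeric_Set);
end Protocol_Checks;

package body Protocol_Checks is
   procedure Check_Token
     (Token   : String;
      Allowed : Character_Set := Alphanumeric_Set)
   is
      Bad : constant Natural := Index (Token, Allowed, Test => Outside);
   begin
      if Bad /= 0 then
         raise Protocol_Error with
           "illegal character at index" & Natural'Image (Bad);
      end if;
   end Check_Token;
end Protocol_Checks;
```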

>> long before you get a string token out of it, or, more realistically
>> an error message (exception), should have happened, for example if
>> you take some text from a GUI widget.
>>
>> Secondly, unbounded strings are rarely needed. 
> 
> For some definition of 'rarely', I suppose. :-) I'm sure some people
> must use them all the time, so it wouldn't be rare for them.

You need them only under certain conditions, namely mutability and a length
that is "sufficiently" unknown in advance. In your case these aren't
satisfied, so you don't have to use them if you don't want to... (:-))

> Ada does make it a pain to use unbounded_strings, so it can seem like
> a virtue to avoid them, but other languages use them by default, with
> no ill-effects to show for it ;-)

That is a different story. Unbounded_String is a nasty kludge. But that does
not mean they would be needed more often if they were designed properly!
(:-))

>> Especially in text parsing etc. It is quite uncommon to change a
>> string content there. In your example you don't do it either. You
>> create a new string. 
> 
> Yes, well, functions work that way in Ada (fortunately, or
> unfortunately, I don't know). I could have made it a procedure with an
> "in out" parameter, but I like functional programming better.
> Unfortunately, I haven't been able to do proper functional style
> programming in Ada so far, having been thwarted by strong typing and
> lack of "out" parameters in functions.

Well, out parameters in functions are desired by almost everybody except the
ARG members. (:-)) But they won't help here. Try access parameters instead
and you will see. The problem is that an out parameter cannot "return"
constraints the way a proper result can: the bounds of the actual are fixed
by the caller. So functional style is only possible through the result. And
you can perfectly well create a local string of the needed length and return
it as the result.
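To illustrate the constraint problem: a procedure can filter a String in place, but since the bounds of the actual cannot change, the new logical end has to come back separately, the same convention as Ada.Text_IO.Get_Line (Item, Last). A minimal sketch; Strip_In_Place is a hypothetical name:

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;

--  Moves the alphanumeric characters of Str to its front. Str'Length
--  cannot change, so the new logical end is reported through Last;
--  Str (Str'First .. Last) is the filtered text.
procedure Strip_In_Place (Str : in out String; Last : out Natural) is
begin
   Last := Str'First - 1;
   for I in Str'Range loop
      if Is_In (Str (I), Alphanumeric_Set) then
         Last := Last + 1;
         Str (Last) := Str (I);  --  Last <= I, so nothing unread is overwritten
      end if;
   end loop;
end Strip_In_Place;
```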

>> Also both the source and the result strings have *known* length.
> 
> Known but variable, with no particular bounds.

That doesn't matter. Every string operation can be implemented this way:

function Op (...) return String is
   Result_Length : Natural;
begin
   -- evaluate Result_Length
   declare
      Result : String (1..Result_Length);
   begin
      -- Fill Result
      return Result;
   end;
end Op;

Such operations can always be used like this:

declare
   X : String renames Op (...);
begin
   -- Using X
end;
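A concrete instance of the pattern above, as a sketch: Filter is a hypothetical function that keeps the characters of a given set, evaluating Result_Length in a first pass and filling Result in a second. Demo_Filter shows the renames usage.

```ada
with Ada.Strings.Maps; use Ada.Strings.Maps;

function Filter (Source : String; Keep : Character_Set) return String is
   Result_Length : Natural := 0;
begin
   --  Evaluate Result_Length
   for I in Source'Range loop
      if Is_In (Source (I), Keep) then
         Result_Length := Result_Length + 1;
      end if;
   end loop;
   declare
      Result : String (1 .. Result_Length);
      Last   : Natural := 0;
   begin
      --  Fill Result
      for I in Source'Range loop
         if Is_In (Source (I), Keep) then
            Last := Last + 1;
            Result (Last) := Source (I);
         end if;
      end loop;
      return Result;
   end;
end Filter;

with Ada.Text_IO;                use Ada.Text_IO;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Filter;

procedure Demo_Filter is
begin
   declare
      X : String renames Filter ("id=42; name=Bob", Alphanumeric_Set);
   begin
      Put_Line (X);  --  prints id42nameBob
   end;
end Demo_Filter;
```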

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


