comp.lang.ada
From: David Trudgett <wpower@zeta.org.au.nospamplease>
Subject: Re: String filtering
Date: Tue, 27 Sep 2005 21:15:15 +1000
Message-ID: <m3ek7a69to.fsf@rr.trudgett>
In-Reply-To: 1j92wa9843ylq.16j89wuqatbaj$.dlg@40tude.net

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:
>
>>     with Ada.Strings.Maps, Ada.Strings.Unbounded;
>>     use  Ada.Strings.Maps, Ada.Strings.Unbounded;
>> 
>>     Lower_Chars : constant Character_Range := ('a', 'z');
>>     Upper_Chars : constant Character_Range := ('A', 'Z');
>>     Numer_Chars : constant Character_Range := ('0', '9');
>>     Alphanumeric : constant Character_Ranges
>>       := (Lower_Chars, Upper_Chars, Numer_Chars);
>>     Alphanumeric_Set : constant Character_Set := To_Set(Alphanumeric);
>
> with Strings.Maps.Constants;
> use  Strings.Maps.Constants;
>
> -- use defined there Alphanumeric_Set

OK, now I have:

   Alpha_Num_Space_Set : constant Character_Set
     := Alphanumeric_Set or To_Set(' ');

since I realised I also need space.
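
For anyone following along, here is a minimal sketch of how the
combined set behaves. The procedure name Set_Demo is only for
illustration; the set itself is the declaration above.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Set_Demo is
   --  Union of the predefined alphanumeric set and a one-character set.
   Alpha_Num_Space_Set : constant Character_Set
     := Alphanumeric_Set or To_Set (' ');
begin
   --  Letters, digits and the space are members; punctuation is not.
   Ada.Text_IO.Put_Line
     ("'a' : " & Boolean'Image (Is_In ('a', Alpha_Num_Space_Set)));
   Ada.Text_IO.Put_Line
     ("' ' : " & Boolean'Image (Is_In (' ', Alpha_Num_Space_Set)));
   Ada.Text_IO.Put_Line
     ("'-' : " & Boolean'Image (Is_In ('-', Alpha_Num_Space_Set)));
end Set_Demo;
```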


>
>>     function Strip_Non_Alphanumeric
>>       (Str : in Unbounded_String) return Unbounded_String
>>     is
>>        New_Str : Unbounded_String
>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>
> If you do this, then use String (1..Count (...));

If I did that, I would need to convert back to Unbounded_String when
I return the function result. Would that be significantly faster than
working on a pre-allocated unbounded string?
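
To make the question concrete, this is roughly what I understand the
suggestion to be. It is only a sketch: the names Strip_Via_Fixed and
Filter_Demo are mine, and I have not measured whether it is faster.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Filter_Demo is

   Alpha_Num_Space_Set : constant Character_Set
     := Alphanumeric_Set or To_Set (' ');

   --  Hypothetical variant: fill a fixed String whose length is known
   --  from Count, then convert to Unbounded_String once on return.
   function Strip_Via_Fixed
     (Str : in Unbounded_String) return Unbounded_String
   is
      Result : String (1 .. Count (Str, Alpha_Num_Space_Set));
      Last   : Natural := 0;
      Ch     : Character;
   begin
      for Pos in 1 .. Length (Str) loop
         Ch := Element (Str, Pos);
         if Is_In (Ch, Alpha_Num_Space_Set) then
            Last := Last + 1;
            Result (Last) := Ch;
         end if;
      end loop;
      return To_Unbounded_String (Result);
   end Strip_Via_Fixed;

begin
   --  Prints "abc 12": the '-' and '!' are dropped.
   Ada.Text_IO.Put_Line
     (To_String (Strip_Via_Fixed (To_Unbounded_String ("ab-c 12!"))));
end Filter_Demo;
```

The source is still read through Element here, so only the result
goes through a fixed string.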


>
>>     begin
>>        New_Str := To_Unbounded_String("");
>
> No need for that, it is initially an empty string.

I thought so myself at first, until I discovered that New_Str was
uninitialised, as it says in the ARM: To_Unbounded_String(Length)
returns a string of that length with undefined contents. Hence, I
added that line.
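
A small sketch of the difference, assuming I am reading ARM A.4.5
correctly. The names Init_Demo, Defaulted and Sized are made up for
the example.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Init_Demo is
   --  No initialiser: starts out equal to Null_Unbounded_String.
   Defaulted : Unbounded_String;

   --  Length overload: three characters whose contents are undefined.
   Sized : Unbounded_String := To_Unbounded_String (3);
begin
   Ada.Text_IO.Put_Line
     ("Defaulted length:" & Natural'Image (Length (Defaulted)));
   Ada.Text_IO.Put_Line
     ("Sized length:    " & Natural'Image (Length (Sized)));

   --  Appending grows Sized beyond the three undefined characters,
   --  which is why the function resets New_Str before its loop.
   Append (Sized, "ab");
   Ada.Text_IO.Put_Line
     ("After Append:    " & Natural'Image (Length (Sized)));
end Init_Demo;
```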


>
>>        for Char in 1 .. Length(Str) loop
>>           if Is_In(Element(Str, Char), Alphanumeric_Set) then
>>              Append(New_Str, Element(Str, Char));
>>           end if;
>>        end loop;
>>        return New_Str;
>>     end Strip_Non_Alphanumeric;
>> 
>> Is something like that what y'all do in situations like this?
>
> I don't.
>

> Firstly it is not clear why characters need to be filtered out. Or
> better to say, how did it happen, that you get garbage in a string?

I am sanitising data received over a socket, which may be of any
length. Hence my use of Unbounded_String, and my desire to strip out
non-alphanumeric characters.


> Either, you need a character *stream* filtering, 

Possibly, but I'm not using a socket stream interface at the current
time. The socket library I'm using right now doesn't do streams.


> long before you get a string token out of it, or, more realistically
> an error message (exception), should have happened, for example if
> you take some text from a GUI widget.
>
> Secondly, unbounded strings are rarely needed. 

For some definition of 'rarely', I suppose. :-) I'm sure some people
must use them all the time, so it wouldn't be rare for them.

Ada does make it a pain to use Unbounded_Strings, so it can seem like
a virtue to avoid them, but other languages use them by default, with
no ill effects to show for it ;-)

Still, in Ada, I do try to use plain fixed strings where they are
sufficient for the purpose.


> Especially in text parsing etc. It is quite uncommon to change a
> string content there. In your example you don't do it either. You
> create a new string. 

Yes, well, functions work that way in Ada (fortunately, or
unfortunately, I don't know). I could have made it a procedure with an
"in out" parameter, but I like functional programming better.
Unfortunately, I haven't been able to do proper functional style
programming in Ada so far, having been thwarted by strong typing and
lack of "out" parameters in functions.
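
For comparison, the procedure version I decided against might look
something like this. It is a sketch only; Strip_In_Place, Keep and
In_Out_Demo are names invented for the example.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure In_Out_Demo is

   Keep : constant Character_Set := Alphanumeric_Set or To_Set (' ');

   --  Hypothetical in-place variant: deletes unwanted characters from
   --  the caller's string instead of building a new one.
   procedure Strip_In_Place (Str : in out Unbounded_String) is
      Pos : Positive := 1;
   begin
      while Pos <= Length (Str) loop
         if Is_In (Element (Str, Pos), Keep) then
            Pos := Pos + 1;          --  keep it, move on
         else
            Delete (Str, Pos, Pos);  --  drop it, stay at the same index
         end if;
      end loop;
   end Strip_In_Place;

   Data : Unbounded_String := To_Unbounded_String ("GET /index 1.0");
begin
   Strip_In_Place (Data);
   --  Prints "GET index 10": the '/' and '.' are removed.
   Ada.Text_IO.Put_Line (To_String (Data));
end In_Out_Demo;
```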


> Also both the source and the result strings have *known* length.

Known but variable, with no particular bounds.


> So you don't need unbounded strings here. Usually, after making some
> trivial analysis like that you'll find out that only 2% or so really
> need to be unbounded.

It seems to me that to use fixed strings here, I would have to convert
the source to a fixed string, do my work on the fixed string, then
convert the result to an unbounded string. It sounds like unnecessary
work to me... ;-)

Thanks for your tips, though, Dmitry, and I'll definitely keep an eye
out for abuse of Unbounded_Strings.

Cheers,

David



-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Equally, our immoral person must get away with any crimes he
undertakes in the proper fashion, if he is to be outstandingly
immoral; getting caught must be taken to be a sign of incompetence,
since the acme of immorality is to give an impression of morality
while actually being immoral. So we must attribute consummate
immorality to our consummate criminal, and if we are to leave it
intact, we should have him equipped with a colossal reputation for
morality even though he is a colossal criminal. He should be capable
of correcting any mistakes he makes. He must have the ability to argue
plausibly, in case any of his crimes are ever found out, and to use
force wherever necessary, by making use of his courage and strength and
by drawing on his fund of friends and his financial resources.

  -- Plato, in "Republic", 361a-361b, the words of Glaucon.
  


