From: David Trudgett
Newsgroups: comp.lang.ada
Subject: Re: String filtering
Date: Tue, 27 Sep 2005 21:15:15 +1000

"Dmitry A. Kazakov" writes:

> On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:
>
>> with Ada.Strings.Maps, Ada.Strings.Unbounded;
>> use Ada.Strings.Maps, Ada.Strings.Unbounded;
>>
>> Lower_Chars : constant Character_Range := ('a', 'z');
>> Upper_Chars : constant Character_Range := ('A', 'Z');
>> Numer_Chars : constant Character_Range := ('0', '9');
>> Alphanumeric : constant Character_Ranges
>>   := (Lower_Chars, Upper_Chars, Numer_Chars);
>> Alphanumeric_Set : constant Character_Set := To_Set(Alphanumeric);
>
> with Strings.Maps.Constants;
> use Strings.Maps.Constants;
>
> -- use defined there Alphanumeric_Set

OK, now I have:

   Alpha_Num_Space_Set : constant Character_Set
     := Alphanumeric_Set or To_Set(' ');

since I realised I also need space.
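
A minimal sketch of that declaration as a stand-alone program, in case it helps to try it out. The procedure name Demo_Set is made up for illustration. The full library name of the package is Ada.Strings.Maps.Constants. Its Alphanumeric_Set also contains the accented Latin-1 letters, so it is wider than the three hand-written ranges 'a'..'z', 'A'..'Z' and '0'..'9'.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Demo_Set is
   --  Same declaration as in the post: the predefined alphanumeric set
   --  combined with a one-character set holding the space.
   Alpha_Num_Space_Set : constant Character_Set :=
     Alphanumeric_Set or To_Set (' ');

   --  Hypothetical helper, only here to print one membership test per line.
   procedure Show (C : Character) is
   begin
      Ada.Text_IO.Put_Line
        ("'" & C & "' in set: "
         & Boolean'Image (Is_In (C, Alpha_Num_Space_Set)));
   end Show;
begin
   Show ('a');
   Show ('7');
   Show (' ');
   Show ('!');
end Demo_Set;
```

If only the ASCII letters and digits are wanted, the original hand-built ranges are the stricter choice.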
>
>> function Strip_Non_Alphanumeric
>>   (Str : in Unbounded_String) return Unbounded_String
>> is
>>    New_Str : Unbounded_String
>>      := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>
> If you do this, then use String (1..Count (...));

If I did that then I would need to convert back to unbounded_string
when I return the function result. Would that be significantly faster
than working on a pre-allocated unbounded string?

>
>> begin
>>    New_Str := To_Unbounded_String("");
>
> No need for that, it is initially an empty string.

I at first thought so myself, until I discovered that New_Str was
uninitialised, as it says in the ARM. Hence, I added that line.

>
>>    for Char in 1 .. Length(Str) loop
>>       if Is_In(Element(Str, Char), Alphanumeric_Set) then
>>          Append(New_Str, Element(Str, Char));
>>       end if;
>>    end loop;
>>    return New_Str;
>> end Strip_Non_Alphanumeric;
>>
>> Is something like that what y'all do in situations like this?
>
> I don't.
>
> Firstly it is not clear why characters need to be filtered out. Or
> better to say, how did it happen, that you get garbage in a string?

I am sanitising data received over a socket, which may be of any
length. Hence my use of unbounded_string, and my desire to strip out
non-alphanumeric characters.

> Either, you need a character *stream* filtering,

Possibly, but I'm not using a socket stream interface at the current
time. The socket library I'm using right now doesn't do streams.

> long before you get a string token out of it, or, more realistically
> an error message (exception), should have happened, for example if
> you take some text from a GUI widget.
>
> Secondly, unbounded strings are rarely needed.

For some definition of 'rarely', I suppose. :-) I'm sure some people
must use them all the time, so it wouldn't be rare for them.
Ada does make it a pain to use unbounded_strings, so it can seem like
a virtue to avoid them, but other languages use them by default, with
no ill-effects to show for it ;-) Still, in Ada, I do try to use plain
fixed strings where they are sufficient for the purpose.

> Especially in text parsing etc. It is quite uncommon to change a
> string content there. In your example you don't do it either. You
> create a new string.

Yes, well, functions work that way in Ada (fortunately, or
unfortunately, I don't know). I could have made it a procedure with an
"in out" parameter, but I like functional programming better.
Unfortunately, I haven't been able to do proper functional style
programming in Ada so far, having been thwarted by strong typing and
lack of "out" parameters in functions.

> Also both the source and the result strings have *known* length.

Known but variable, with no particular bounds.

> So you don't need unbounded strings here. Usually, after making some
> trivial analysis like that you'll find out that only 2% or so really
> need to be unbounded.

It seems to me that to use fixed strings here, I would have to convert
the source to a fixed string, do my working on fixed string, then
convert the result to an unbounded string. It sounds like unnecessary
work to me... ;-)

Thanks for your tips, though, Dmitry, and I'll definitely keep an eye
out for abuse of unbounded_strings.

Cheers,

David

-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Equally, our immoral person must get away with any crimes he
undertakes in the proper fashion, if he is to be outstandingly
immoral; getting caught must be taken to be a sign of incompetence,
since the acme of immorality is to give an impression of morality
while actually being immoral. So we must attribute consummate
immorality to our consummate criminal, and if we are to leave it
intact, we should have him equipped with a colossal reputation for
morality even though he is a colossal criminal.
He should be capable of correcting any mistakes he makes. He must have
the ability to argue plausibly, in case any of his crimes are ever
found out, and to use force wherever necessary, by making use of his
courage and strength and by drawing on his fund of friends and his
financial resources.

    -- Plato, in "Republic", 361a-361b, the words of Glaucon.
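
For comparison with the exchange above, here is a rough sketch of the fixed-String route: convert to String, filter into a result sized with Count, then convert back. The names Demo_Filter, Keep_Only, Strip and Keep_Set are made up for illustration and appear nowhere in the thread. Nothing here measures whether this is faster than appending to an Unbounded_String; it only shows how much conversion is involved.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Demo_Filter is

   --  Characters to keep: alphanumerics plus the space.
   Keep_Set : constant Character_Set := Alphanumeric_Set or To_Set (' ');

   --  Filter a fixed String. Count gives the exact result length up
   --  front, so the result needs no resizing while it is filled.
   function Keep_Only (Str : String; Set : Character_Set) return String is
      Result : String (1 .. Ada.Strings.Fixed.Count (Str, Set));
      Last   : Natural := 0;
   begin
      for I in Str'Range loop
         if Is_In (Str (I), Set) then
            Last := Last + 1;
            Result (Last) := Str (I);
         end if;
      end loop;
      return Result;
   end Keep_Only;

   --  Thin wrapper for callers that hold an Unbounded_String: one
   --  conversion on the way in and one on the way out.
   function Strip (Str : Unbounded_String) return Unbounded_String is
   begin
      return To_Unbounded_String (Keep_Only (To_String (Str), Keep_Set));
   end Strip;

   Input : constant Unbounded_String :=
     To_Unbounded_String ("Hi, there! 42");
begin
   Ada.Text_IO.Put_Line (To_String (Strip (Input)));
end Demo_Filter;
```

The wrapper keeps the function-style interface from the original post, so callers that already hold an Unbounded_String do not change.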