From: "Dmitry A. Kazakov"
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
Newsgroups: comp.lang.ada
Subject: Re: String filtering
Date: Tue, 27 Sep 2005 15:21:12 +0200
Message-ID: <1b54lwg8s1gk8.1t3jp1cmc2x32$.dlg@40tude.net>
References: <1j92wa9843ylq.16j89wuqatbaj$.dlg@40tude.net>

On Tue, 27 Sep 2005 21:15:15 +1000, David Trudgett wrote:

> "Dmitry A. Kazakov" writes:
>
> OK, now I have:
>
> Alpha_Num_Space_Set : constant Character_Set
>   := Alphanumeric_Set or To_Set(' ');
>
> since I realised I also need space.

And HT? And VT? LF, CR,... (:-))

>>> function Strip_Non_Alphanumeric
>>>   (Str : in Unbounded_String) return Unbounded_String
>>> is
>>>    New_Str : Unbounded_String
>>>      := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>>
>> If you do this, then use String (1..Count (...));
>
> If I did that then I would need to convert back to unbounded_string
> when I return the function result.
> Would that be significantly faster
> than working on a pre-allocated unbounded string?

It cannot be slower: you already pay for one To_Unbounded_String just to
initialize it.

>>
>>> begin
>>>    New_Str := To_Unbounded_String("");
>>
>> No need for that, it is initially an empty string.
>
> I at first thought so myself, until I discovered that New_Str was
> uninitialised, as it says in the ARM. Hence, I added that line.

It is initialized, to Null_Unbounded_String.

>> Firstly it is not clear why characters need to be filtered out. Or
>> better to say, how did it happen, that you get garbage in a string?
>
> I am sanitising data received over a socket, which may be of any
> length. Hence my use of unbounded_string, and my desire to strip out
> non-alphanumeric characters.

But sockets normally work either as a stream or with an array of
Storage_Elements. So you don't get an Unbounded_String from the socket;
you make one later. Make a String instead.

>> Either, you need a character *stream* filtering,
>
> Possibly, but I'm not using a socket stream interface at the current
> time. The socket library I'm using right now doesn't do streams.

Anyway, you have some protocol, and non-alphanumeric characters seem to
violate it. So what your filter does is invent meaning out of
meaningless rubbish. Usually that is a bad idea; see PL/1 and HTML.
Errors should be reported as early as possible.

>> long before you get a string token out of it, or, more realistically
>> an error message (exception), should have happened, for example if
>> you take some text from a GUI widget.
>>
>> Secondly, unbounded strings are rarely needed.
>
> For some definition of 'rarely', I suppose. :-) I'm sure some people
> must use them all the time, so it wouldn't be rare for them.

You need them only under certain conditions, namely mutability and a
length that is "sufficiently" unknown in advance. In your case these
conditions are not satisfied, so you don't have to use them if you
don't want to...
(:-))

> Ada does make it a pain to use unbounded_strings, so it can seem like
> a virtue to avoid them, but other languages use them by default, with
> no ill-effects to show for it ;-)

That is a different story. Unbounded_String is a nasty kludge. But that
does not mean that, if they were designed properly, they would be needed
more! (:-))

>> Especially in text parsing etc. It is quite uncommon to change a
>> string content there. In your example you don't do it either. You
>> create a new string.
>
> Yes, well, functions work that way in Ada (fortunately, or
> unfortunately, I don't know). I could have made it a procedure with an
> "in out" parameter, but I like functional programming better.
> Unfortunately, I haven't been able to do proper functional style
> programming in Ada so far, having been thwarted by strong typing and
> lack of "out" parameters in functions.

Well, out parameters in functions are much desired by almost everybody,
except the ARG members. (:-)) But that won't help. Try access parameters
instead and you will see. The problem is that an out parameter cannot
"return" constraints, as a proper result can. So functional style is only
possible through the result. And you can perfectly well create a local
string of the needed length and return it as the result.

>> Also both the source and the result strings have *known* length.
>
> Known but variable, with no particular bounds.

That doesn't matter. All string operations can be implemented this way:

   function Op (...) return String is
      Result_Length : Natural;
   begin
      -- evaluate Result_Length
      declare
         Result : String (1..Result_Length);
      begin
         -- Fill Result
         return Result;
      end;
   end Op;

Such operations can always be used as:

   declare
      X : String renames Op (...);
   begin
      -- Use X here
   end;

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
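On the "And HT? And VT? LF, CR,..." point: if whitespace is meant to pass the filter, it is easiest to define the set once, in one place. What follows is a minimal sketch that assumes only the standard Ada.Strings.Maps, Ada.Strings.Maps.Constants and Ada.Characters.Latin_1 packages. The package name Char_Sets and the constant Alpha_Num_White_Set are made up for illustration.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants;

--  Hypothetical package collecting the character sets used for filtering
package Char_Sets is

   package Latin_1 renames Ada.Characters.Latin_1;

   --  Letters and digits, plus space and the usual control "whitespace":
   --  HT, VT, LF, CR and FF
   Alpha_Num_White_Set : constant Character_Set :=
     Ada.Strings.Maps.Constants.Alphanumeric_Set
       or To_Set (Latin_1.Space & Latin_1.HT & Latin_1.VT
                  & Latin_1.LF & Latin_1.CR & Latin_1.FF);

end Char_Sets;
```

Note that Alphanumeric_Set already contains the accented Latin-1 letters, not just A..Z, a..z and 0..9. That may or may not be what a given protocol allows.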
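The hint "use String (1..Count (...))" works out roughly as follows when written in full. This is a sketch, not David's code. Filters and Filter are hypothetical names, and the set of characters to keep is passed in, so the same function works with Alphanumeric_Set or with any extended set.

```ada
with Ada.Strings.Maps;

--  Hypothetical package: Filter keeps only the characters found in Keep
package Filters is
   function Filter
     (Source : String;
      Keep   : Ada.Strings.Maps.Character_Set) return String;
end Filters;

with Ada.Strings.Fixed;

package body Filters is

   function Filter
     (Source : String;
      Keep   : Ada.Strings.Maps.Character_Set) return String
   is
      --  The result length is known before anything is copied,
      --  so the result is allocated exactly once
      Result : String (1 .. Ada.Strings.Fixed.Count (Source, Keep));
      Last   : Natural := 0;
   begin
      for I in Source'Range loop
         if Ada.Strings.Maps.Is_In (Source (I), Keep) then
            Last := Last + 1;
            Result (Last) := Source (I);
         end if;
      end loop;
      return Result;
   end Filter;

end Filters;
```

If the caller really does hold an Unbounded_String, the conversion happens once at each end: To_Unbounded_String (Filter (To_String (Str), Alphanumeric_Set)), using To_String and To_Unbounded_String from Ada.Strings.Unbounded.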
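On the initialization question: the reference manual's section on Ada.Strings.Unbounded (A.4.5) says that an Unbounded_String object that is not otherwise initialized gets the same value as Null_Unbounded_String. The small self-checking program below uses a made-up name, Default_Init_Demo. It shows that an explicit New_Str := To_Unbounded_String("") adds nothing.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Default_Init_Demo is
   --  No explicit initialization here
   New_Str : Unbounded_String;
begin
   --  Fails loudly if the default value were anything other than empty
   if New_Str /= Null_Unbounded_String or else Length (New_Str) /= 0 then
      raise Program_Error;
   end if;

   --  Appending to the default value just works
   Append (New_Str, "abc");
   if To_String (New_Str) /= "abc" then
      raise Program_Error;
   end if;
end Default_Init_Demo;
```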
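For the socket case, if the library hands over raw bytes, the natural first step is a String rather than an Unbounded_String. This sketch assumes the data arrives as an Ada.Streams.Stream_Element_Array, which is what GNAT.Sockets.Receive_Socket fills in, together with the index of the last element received. Bytes_To_String is a hypothetical helper, and it assumes 8-bit stream elements that carry Latin-1 or plain ASCII text.

```ada
with Ada.Streams; use Ada.Streams;

--  Hypothetical helper: one Character per Stream_Element, assuming
--  8-bit elements that hold Latin-1 (or plain ASCII) text
function Bytes_To_String (Data : Stream_Element_Array) return String is
   Result : String (1 .. Data'Length);
   Next   : Positive := Result'First;
begin
   for I in Data'Range loop
      --  Character'Val raises Constraint_Error for values above 255
      Result (Next) := Character'Val (Data (I));
      Next := Next + 1;
   end loop;
   return Result;
end Bytes_To_String;
```

With a receive call that reports Last, you would pass only the part actually received, for example Bytes_To_String (Buffer (Buffer'First .. Last)), and then check or filter the resulting String.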
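In the spirit of "errors should be reported as early as possible", the input can be rejected at the first unexpected character instead of having characters silently dropped. This is only a sketch. The package Validation, the procedure Check and the exception Protocol_Error are hypothetical names, and "raise ... with" requires Ada 2005 or later.

```ada
with Ada.Strings.Maps;

--  Hypothetical package: reject bad input instead of repairing it
package Validation is

   Protocol_Error : exception;

   --  Raises Protocol_Error if Source contains any character
   --  that is not in Allowed
   procedure Check
     (Source  : String;
      Allowed : Ada.Strings.Maps.Character_Set);

end Validation;

with Ada.Strings.Fixed;

package body Validation is

   procedure Check
     (Source  : String;
      Allowed : Ada.Strings.Maps.Character_Set)
   is
      --  Index of the first character outside Allowed, 0 if there is none
      Bad : constant Natural :=
        Ada.Strings.Fixed.Index (Source, Allowed, Test => Ada.Strings.Outside);
   begin
      if Bad /= 0 then
         raise Protocol_Error
           with "unexpected character at position" & Natural'Image (Bad);
      end if;
   end Check;

end Validation;
```

The caller then decides what a protocol violation means, for example closing the connection, instead of continuing with text the filter has "repaired".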
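Here is the point about constraints made concrete. A procedure with an out String parameter has its bounds fixed by the caller before the call, so it can only report how much of the buffer it used. This is the same Item/Last convention that Ada.Text_IO.Get_Line follows. A function result, by contrast, carries its own bounds. Digit_Extraction is a made-up example package that offers both forms.

```ada
--  Hypothetical example offering both a procedure and a function form
package Digit_Extraction is

   --  Procedure form: the bounds of Target are chosen by the caller
   --  before the call; only Last tells how much of it was filled.
   --  Raises Constraint_Error if Target is too short.
   procedure Extract
     (Source : String;
      Target : out String;
      Last   : out Natural);

   --  Function form: the result carries its own bounds
   function Extract (Source : String) return String;

end Digit_Extraction;

package body Digit_Extraction is

   procedure Extract
     (Source : String;
      Target : out String;
      Last   : out Natural) is
   begin
      Last := Target'First - 1;
      for I in Source'Range loop
         if Source (I) in '0' .. '9' then
            Last := Last + 1;
            Target (Last) := Source (I);
         end if;
      end loop;
   end Extract;

   function Extract (Source : String) return String is
      --  The digits can never outnumber the source characters
      Buffer : String (1 .. Source'Length);
      Last   : Natural;
   begin
      Extract (Source, Buffer, Last);
      return Buffer (1 .. Last);
   end Extract;

end Digit_Extraction;
```

With the function form, S : constant String := Digit_Extraction.Extract ("a1b22c"); gets exactly the bounds it needs. With the procedure form, the caller has to guess a buffer size up front and then work with Buffer (1 .. Last).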
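This is a concrete instance of the Op pattern, for an operation whose result is longer than its source. Double_Quotes is a hypothetical example that doubles every quotation mark, the way Ada itself does inside string literals. The length is computed first, and the result is then declared with exactly that length.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Maps;

--  Hypothetical operation following the two-phase Op shape:
--  first evaluate Result_Length, then declare and fill Result
function Double_Quotes (Source : String) return String is
   Result_Length : Natural;
begin
   --  Every '"' in Source takes two characters in the result
   Result_Length :=
     Source'Length
       + Ada.Strings.Fixed.Count (Source, Ada.Strings.Maps.To_Set ('"'));
   declare
      Result : String (1 .. Result_Length);
      Last   : Natural := 0;
   begin
      --  Fill Result
      for I in Source'Range loop
         Last := Last + 1;
         Result (Last) := Source (I);
         if Source (I) = '"' then
            Last := Last + 1;
            Result (Last) := '"';
         end if;
      end loop;
      return Result;
   end;
end Double_Quotes;
```

When used via renaming, X : String renames Double_Quotes ("say ""hi"""); gives X the value say ""hi"" with bounds 1 .. 10. No Unbounded_String is involved anywhere.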