From: Jacob Sparre Andersen
Newsgroups: comp.lang.ada
Subject: Re: String filtering
Date: Wed, 28 Sep 2005 11:08:20 +0200

David Trudgett wrote:

> I've made some revisions based on various comments, and this is what
> I have at the moment (incorporating both a string and
> unbounded_string version):
>
>    Space_Char  : constant Character_Range := (' ', ' ');
>    Lower_Chars : constant Character_Range := ('a', 'z');
>    Upper_Chars : constant Character_Range := ('A', 'Z');
>    Numer_Chars : constant Character_Range := ('0', '9');
>    Alpha_Num_Space : constant Character_Ranges
>      := (Space_Char, Lower_Chars, Upper_Chars, Numer_Chars);
>    Alpha_Num_Space_Set : constant Character_Set
>      := To_Set(Alpha_Num_Space);
>
>
>    function Strip_Non_Alphanumeric
>      (Str : in
>       Unbounded_String) return Unbounded_String
>    is
>       Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);
>       New_Str   : Unbounded_String := Null_Unbounded_String;
>       Dest_Char : Natural := 0;
>    begin
>       if Dest_Size > 0 then
>          New_Str := To_Unbounded_String(Dest_Size);
>          for Src_Char in 1 .. Length(Str) loop
>             if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                Replace_Element
>                  (New_Str, Dest_Char, Element(Str, Src_Char));
>             end if;
>          end loop;
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;

   procedure Strip_Non_Alphanumeric (Str : in out Unbounded_String) is
      --  This version does in-place modification of the unbounded
      --  string, and is thus actually making use of Str being an
      --  unbounded string and not a fixed length string.
      Position : Natural := 1;
   begin
      while Position <= Length (Str) loop
         if Is_In (Element (Str, Position), Alpha_Num_Space_Set) then
            Position := Position + 1;
         else
            Delete (Source  => Str,
                    From    => Position,
                    Through => Position);
         end if;
      end loop;
   end Strip_Non_Alphanumeric;

>    function Strip_Non_Alphanumeric
>      (Str : in String) return String
>    is
>       New_Str   : String(1 .. Count(Str, Alpha_Num_Space_Set));
>       Dest_Char : Natural := 0;
>    begin
>       if New_Str'Last > 0 then
>          for Src_Char in Str'Range loop
>             if Is_In(Str(Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                New_Str(Dest_Char) := Str(Src_Char);
>             end if;
>          end loop;
>       else
>          New_Str := "";
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;
>
> In the unbounded version, I decided to use replace_element instead of
> append (with its assignment to "", which might perhaps unallocate
> memory, depending on implementation??, thus potentially undoing the
> purpose of the preallocation).

Good.  It makes the String and Unbounded_String versions practically
equivalent - probably both in CPU and memory use.
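
For anybody who wants to try the in-place procedure, here is a
minimal, self-contained sketch of a test program.  The program name
Strip_Demo and the sample text are made up for illustration; the set
is built from the same four ranges as in the quoted code, and the
loop condition uses Ada's "<=" operator.

```ada
with Ada.Strings.Maps;      use Ada.Strings.Maps;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Strip_Demo is

   --  Same ranges as in the quoted code: space, letters and digits.
   Alpha_Num_Space_Set : constant Character_Set :=
     To_Set (Character_Ranges'((' ', ' '),
                               ('a', 'z'),
                               ('A', 'Z'),
                               ('0', '9')));

   procedure Strip_Non_Alphanumeric (Str : in out Unbounded_String) is
      Position : Natural := 1;
   begin
      while Position <= Length (Str) loop
         if Is_In (Element (Str, Position), Alpha_Num_Space_Set) then
            --  Keep this character and move on to the next one.
            Position := Position + 1;
         else
            --  Remove it; the next character slides into Position.
            Delete (Source  => Str,
                    From    => Position,
                    Through => Position);
         end if;
      end loop;
   end Strip_Non_Alphanumeric;

   --  Hypothetical sample input, chosen only for this demonstration.
   Text : Unbounded_String := To_Unbounded_String ("Hello, World! (2005)");

begin
   Strip_Non_Alphanumeric (Text);
   Ada.Text_IO.Put_Line (To_String (Text));  --  prints: Hello World 2005
end Strip_Demo;
```

Note that each Delete may shift the rest of the string, so for input
with many unwanted characters the in-place version could well do more
copying than the two functions, which write each kept character once.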

> Profiling would show no difference in performance between the two
> for my current purposes, but in a different situation, involving
> large amounts of data, for instance, the fixed string version would
> no doubt out-perform speed-wise. Space-wise, the unbounded strings
> would probably win out in many situations.

I don't expect that your two functions would show significant
differences in space or CPU use, however long the strings you throw
at them are.

>> Because, assignment might reclaim the memory allocated by
>> To_Unbounded_String (Count).
>
> I assume this is left up to the compiler implementation?

Exactly.

Jacob
-- 
"There is nothing worse than having only one drunk head."