From: Oleg Goodyckov
Newsgroups: comp.lang.ada
Subject: Re: FAQ and string functions
Date: Thu, 1 Aug 2002 16:10:52 +0300
Message-ID: <20020801161052.M1080@videoproject.kiev.ua>
References: <20020730093206.A8550@videoproject.kiev.ua> <20020731104643.C1083@videoproject.kiev.ua> <20020731182308.K1083@videoproject.kiev.ua>
Reply-To: og@videoproject.kiev.ua

On Thu, Aug 01, 2002 at 11:57:04PM +0200, Dmitry A.Kazakov wrote:
> > Ok! How about write-once-use-always? For text data analysis applications.
>
> Then, maybe it is worth considering more advanced parsing techniques than
> split? There are numerous Ada implementations of pattern matching. There
> are also Ada subprograms to recognize data types in a string stream. It is
> relatively easy to parse and evaluate expressions with brackets and
> prioritized operations in Ada (an implementation of the twin-stack
> algorithm is quite short), and note that nothing like split is involved.

Maybe.
In dreams, almost anything is possible.

> >> > While for splitting a string like
> >> > "x=2*3" people must write a program instead of
> >> > split("=","x=2*3"), people will write in Perl, not Ada.
> >>
> >> And what would you do in the case "x=/* An error, should be := */ 2*" and
> >> "3" continues on the next line?
> >
> > Nothing. I know: I have data as described. If not, the data is corrupted
> > and must be thrown out. It's simple.
>
> To do so you should have an ability to recognize errors.

Within the next couple of steps of the program the error will be
recognized and an exception raised. No problem.

> > But what would you do in the case when the data is still correct?
>
> In the given example data are correct. /*...*/ was a comment containing a
> symbol supposed to be a delimiter. My point was that for almost any
> real-life text parsing application, split is useless.

Then why are most of my tasks much easier to solve with split than with
substring and the like? Maybe they are not from real life?

> > You'll build a PROGRAMMMM, instead of writing "split(/=/,"x=2*3")".
>
> You still need a program to process the output of split. Is its output the
> final outcome? I suppose it is not. So there should be a loop to iterate
> through the list returned by split. Where is then a difference between
> split + loop, and loop with Get_Next_Word inside?

The difference is like the difference between RANDOM and SEQUENTIAL
access to data. In many cases it is not necessary to analyze the whole
string: it is enough to know the number of tokens, or to pick out a few
of them at well-known positions.

> IMO, the difference is
> that the second is faster and easier to understand.

Really? Have you seen that program (bcwords.ada)? And you will assert
that that Ada program is easier to understand? :-)))))) I have nothing
to say...

Sorry for the long quotation. But look, everybody: who will risk saying
that the Ada program below is simpler and easier to understand than the
equivalent Perl program (which is more than 10 times shorter)?
Don't think about how much time it takes to write and debug that
program. Think about how much time it takes simply to type it into a
text editor correctly. :-)))))

-- This demonstration is a response to a suggestion by John English
-- about assessing different component libraries, in
-- the context of the Ada Standard Component Library WG
-- (http://www.suffix.com/Ada/SCL/). It uses part of Corey Minyard's
-- solution (why reinvent that wheel?!)

-- John said:

-- As a way of objectively assessing the merits of different approaches,
-- perhaps the way to do this is to code some examples; one of my
-- favourites for this is a program to list the 10 most common words in a
-- file with the number of occurrences of each, where the length of words
-- and the size of the file can be arbitrarily large. In Perl it might look
-- something like this:

--   while (<>) {                     # for each line in the input file(s)
--     chomp;                         # trim the end of the line
--     tr/A-Z/a-z/;                   # fold uppercase to lowercase
--     @words = split /\W+/;          # break the line into words
--     foreach (@words) {
--       if (/^\w+$/) {               # ignore non-words
--         $wordlist{$_}++;           # increment count in associative array
--       }                            # (key = word, val = no. of occurrences)
--     }
--   }
--   $times = 0;
--   foreach (sort {$wordlist{$b} <=> $wordlist{$a}} (keys %wordlist)) {
--     last if (++$times > 10);       # exit loop after 10 iterations
--     print "$_ : $wordlist{$_}\n";  # process array in descending order
--   }                                # of value, printing keys and values

-- What would this look like in Ada using each of the libraries you've
-- listed? Does anyone else have favourite examples like this?
-- $Id: bcwords.ada,v 1.6 2001/09/23 15:25:10 simon Exp $

with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Word_Parser;
with Word_Count_Support;
procedure Word_Count is

   Word_Found : Boolean;
   File_Done : Boolean;
   Word : Ada.Strings.Unbounded.Unbounded_String;

   Word_Bag : Word_Count_Support.BU.Bag;
   Word_Tree : Word_Count_Support.ST.AVL_Tree;

   Word_Bag_Iter : Word_Count_Support.Containers.Iterator'Class
     := Word_Count_Support.BU.New_Iterator (Word_Bag);

   procedure Word_Processor
     (Item : Ada.Strings.Unbounded.Unbounded_String;
      Ok : out Boolean);
   procedure Word_Processor
     (Item : Ada.Strings.Unbounded.Unbounded_String;
      Ok : out Boolean) is
      Dummy : Boolean;
   begin
      Word_Count_Support.ST.Insert
        (Word_Tree,
         Word_Count_Support.Word_Stat'
           (Word => Item,
            Count => Word_Count_Support.BU.Count (Word_Bag, Item)),
         Dummy);
      Ok := True;
   end Word_Processor;

   procedure Word_Bag_Visitor
   is new Word_Count_Support.Containers.Visit (Word_Processor);

   Number_Output : Natural := 0;

   procedure Tree_Processor
     (Item : Word_Count_Support.Word_Stat;
      OK : out Boolean);
   procedure Tree_Processor
     (Item : Word_Count_Support.Word_Stat;
      OK : out Boolean) is
   begin
      Ada.Text_IO.Put_Line
        (Ada.Strings.Unbounded.To_String (Item.Word)
         & " =>"
         & Positive'Image (Item.Count));
      Number_Output := Number_Output + 1;
      OK := Number_Output < 10;  -- this is where we select the top 10
   end Tree_Processor;

   procedure Tree_Visitor
   is new Word_Count_Support.ST.Visit (Tree_Processor);

begin

   loop
      Word_Parser.Get_Next_Word
        (Ada.Text_IO.Standard_Input, Word, Word_Found, File_Done);
      exit when not Word_Found;
      Word_Count_Support.Bags.Add (Word_Bag, Word);
   end loop;

   Word_Count_Support.Containers.Reset (Word_Bag_Iter);
   Word_Bag_Visitor (Word_Bag_Iter);

   Tree_Visitor (Word_Tree);

end Word_Count;


with Ada.Strings.Unbounded;
with BC.Containers;
with BC.Containers.Bags;
with BC.Containers.Bags.Unbounded;
with BC.Containers.Trees;
with BC.Containers.Trees.AVL;
with Global_Heap;
package Word_Count_Support is

   package Containers is new BC.Containers
     (Item => Ada.Strings.Unbounded.Unbounded_String,
      "=" => Ada.Strings.Unbounded."=");

   package Bags is new Containers.Bags;

   function Hash (S : Ada.Strings.Unbounded.Unbounded_String)
                 return Positive;

   package BU is new Bags.Unbounded
     (Hash => Hash,
      Buckets => 1,
      Storage => Global_Heap.Storage);

   type Word_Stat is record
      Word : Ada.Strings.Unbounded.Unbounded_String;
      Count : Positive;
   end record;

   function ">" (L, R : Word_Stat) return Boolean;
   function "=" (L, R : Word_Stat) return Boolean;

   package Stat_Containers is new BC.Containers (Word_Stat);

   package Trees is new Stat_Containers.Trees;

   package ST is new Trees.AVL
     ("<" => ">",  -- we need the most popular first
      Storage => Global_Heap.Storage);

end Word_Count_Support;


package body Word_Count_Support is

   -- This is extraordinarily lazy, of course we should really invent
   -- some better hash function!
   function Hash (S : Ada.Strings.Unbounded.Unbounded_String)
                 return Positive is
   begin
      return 1;
   end Hash;

   function ">" (L, R : Word_Stat) return Boolean is
      use type Ada.Strings.Unbounded.Unbounded_String;
   begin
      return L.Count > R.Count
        or else (L.Count = R.Count and then L.Word > R.Word);
   end ">";

   function "=" (L, R : Word_Stat) return Boolean is
      use type Ada.Strings.Unbounded.Unbounded_String;
   begin
      return L.Count = R.Count and then L.Word = R.Word;
   end "=";

end Word_Count_Support;


-- by Corey Minyard
package body Word_Parser is

   Big_A_Pos : Integer := Character'Pos ('A');
   Small_A_Pos : Integer := Character'Pos ('a');

   procedure Xlat_To_Lower_Case (C : in out Character);
   procedure Xlat_To_Lower_Case (C : in out Character) is
   begin
      if (C in 'A' .. 'Z') then
         C := Character'Val (Character'Pos (C) - Big_A_Pos + Small_A_Pos);
      end if;
   end Xlat_To_Lower_Case;

   procedure Get_Next_Word
     (File : in File_Type;
      Word : out Ada.Strings.Unbounded.Unbounded_String;
      Word_Found : out Boolean;
      File_Done : out Boolean) is

      Tmp_Str : String (1 .. 10);
      Word_Pos : Positive := Tmp_Str'First;
      Input_Char : Character;
      In_Word : Boolean := False;
   begin
      -- Start with an empty word.
      Word := Ada.Strings.Unbounded.To_Unbounded_String ("");
      File_Done := False;
      Word_Found := False;
      if (End_Of_File (File)) then
         Word_Found := False;
         File_Done := True;
      else
         loop
            Get (File, Input_Char);
            Xlat_To_Lower_Case (Input_Char);
            if (not In_Word) then
               if (Input_Char in 'a' .. 'z') then
                  In_Word := True;
                  Word_Found := True;
                  Tmp_Str (Word_Pos) := Input_Char;
                  Word_Pos := Word_Pos + 1;
               end if;
            elsif (Input_Char in 'a' .. 'z') then
               Tmp_Str (Word_Pos) := Input_Char;
               if (Word_Pos = Tmp_Str'Last) then
                  Word := Word & Tmp_Str;
                  Word_Pos := Tmp_Str'First;
               else
                  Word_Pos := Word_Pos + 1;
               end if;
            else
               exit;
            end if;

            if (End_Of_File (File)) then
               File_Done := True;
               exit;
            elsif (End_Of_Line (File) and In_Word) then
               exit;
            end if;
         end loop;

         if (Word_Pos /= Tmp_Str'First) then
            -- If we have some stuff left in the temporary string, put it into
            -- the word.
            Word := Word & Tmp_Str (Tmp_Str'First .. Word_Pos - 1);
         end if;
      end if;
   end Get_Next_Word;

end Word_Parser;


-- by Corey Minyard
with Ada.Strings.Unbounded;
use type Ada.Strings.Unbounded.Unbounded_String;
with Ada.Text_IO; use Ada.Text_IO;
package Word_Parser is

   procedure Get_Next_Word
     (File : in File_Type;
      Word : out Ada.Strings.Unbounded.Unbounded_String;
      Word_Found : out Boolean;
      File_Done : out Boolean);

end Word_Parser;
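
To make the random-access point from the discussion above concrete, here
is a minimal sketch of what a split-like helper could look like using only
the standard Ada.Strings.Fixed package. The names Split_Demo, Field and
Field_Count are hypothetical and do not come from any library mentioned in
this thread. The sketch assumes a non-empty separator and makes no attempt
to handle the comment-with-delimiter case raised earlier.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Split_Demo is

   --  Hypothetical helper: number of fields obtained when Source is cut
   --  at every non-overlapping occurrence of Separator.
   function Field_Count (Source, Separator : String) return Positive is
   begin
      return Ada.Strings.Fixed.Count (Source, Separator) + 1;
   end Field_Count;

   --  Hypothetical helper: the N-th field of Source (counting from 1).
   --  Returns "" when Source has fewer than N fields, so a missing field
   --  and an empty field are not distinguished here.
   function Field
     (Source    : String;
      Separator : String;
      N         : Positive) return String
   is
      First : Positive := Source'First;
      Cut   : Natural;
   begin
      --  Skip over the first N - 1 separators.
      for I in 1 .. N - 1 loop
         Cut := Ada.Strings.Fixed.Index
                  (Source (First .. Source'Last), Separator);
         if Cut = 0 then
            return "";
         end if;
         First := Cut + Separator'Length;
      end loop;

      --  The field runs up to the next separator, or to the end.
      Cut := Ada.Strings.Fixed.Index
               (Source (First .. Source'Last), Separator);
      if Cut = 0 then
         return Source (First .. Source'Last);
      else
         return Source (First .. Cut - 1);
      end if;
   end Field;

   Line : constant String := "x=2*3";

begin
   Ada.Text_IO.Put_Line (Positive'Image (Field_Count (Line, "=")));
   Ada.Text_IO.Put_Line (Field (Line, "=", 1));  --  prints: x
   Ada.Text_IO.Put_Line (Field (Line, "=", 2));  --  prints: 2*3
end Split_Demo;
```

Each call to Field rescans the string from the start, so this trades speed
for direct access by position; a loop built around Get_Next_Word reads each
token once, in order.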