From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,4f316de357ae35e9
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2002-08-01 06:10:40 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!newsfeed.gamma.ru!Gamma.RU!carrier.kiev.ua!news.lucky.net!not-for-mail
From: Oleg Goodyckov <og@videoproject.kiev.ua>
Newsgroups: comp.lang.ada
Subject: Re: FAQ and string functions
Date: Thu, 1 Aug 2002 16:10:52 +0300
Organization: unknown
Distribution: world
Message-ID: <20020801161052.M1080@videoproject.kiev.ua>
References: <20020730093206.A8550@videoproject.kiev.ua>
 <20020731104643.C1083@videoproject.kiev.ua>
 <ai8ccb$124r7q$1@ID-77047.news.dfncis.de>
 <20020731182308.K1083@videoproject.kiev.ua>
 <aib0a6$139lkn$1@ID-77047.news.dfncis.de>
Reply-To: og@videoproject.kiev.ua
NNTP-Posting-Host: news.lucky.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: news.lucky.net 1028207434 26224 193.193.193.102 (1 Aug 2002 13:10:34
 GMT)
X-Complaints-To: usenet@news.lucky.net
NNTP-Posting-Date: Thu, 1 Aug 2002 13:10:34 +0000 (UTC)
Keywords: 265282490
X-Return-Path: oleg@videoproject.kiev.ua
Xref: archiver1.google.com comp.lang.ada:27567
Date: 2002-08-01T16:10:52+03:00
List-Id: <comp.lang.ada>

On Thu, Aug 01, 2002 at 11:57:04PM +0200, Dmitry A.Kazakov wrote:
> > Ok! How about write-once-use-always? For text data analysis applications.
> 
> Then maybe it is worth considering more advanced parsing techniques than
> split? There are numerous Ada implementations of pattern matching. There
> are also Ada subprograms to recognize data types in a string stream. It is
> relatively easy to parse and evaluate expressions with brackets and
> prioritized operations in Ada (an implementation of the twin-stack
> algorithm is quite short), and note that nothing like split is involved.

Maybe. In dreams almost anything is possible.
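To make the twin-stack claim concrete, here is a minimal Ada 95 sketch of one common form of the algorithm: one stack of operands, one stack of pending operators, reducing while the stacked operator has equal or higher priority. It assumes non-negative integer literals, the operators + - * / with integer division, brackets, and a fixed stack depth of 100. Twin_Stack_Demo, Evaluate, Priority and Reduce are illustrative names, not from any existing library, and malformed input simply ends in Constraint_Error.

```ada
with Ada.Text_IO;

procedure Twin_Stack_Demo is

   --  Evaluate an integer expression containing + - * / and brackets,
   --  using one stack of operands and one stack of pending operators.
   function Evaluate (Expr : String) return Integer is
      Max       : constant := 100;  --  assumed maximum stack depth
      Operands  : array (1 .. Max) of Integer;
      Operators : array (1 .. Max) of Character;
      N_Val     : Natural := 0;     --  operands currently stacked
      N_Op      : Natural := 0;     --  operators currently stacked
      I         : Positive := Expr'First;

      --  Binding strength; '(' gets 0 so that no operator reduces past it.
      function Priority (Op : Character) return Natural is
      begin
         case Op is
            when '+' | '-' => return 1;
            when '*' | '/' => return 2;
            when others    => return 0;
         end case;
      end Priority;

      --  Pop one operator and two operands, then push the result.
      procedure Reduce is
         R  : constant Integer   := Operands (N_Val);
         L  : constant Integer   := Operands (N_Val - 1);
         Op : constant Character := Operators (N_Op);
      begin
         N_Val := N_Val - 1;
         N_Op  := N_Op - 1;
         case Op is
            when '+'    => Operands (N_Val) := L + R;
            when '-'    => Operands (N_Val) := L - R;
            when '*'    => Operands (N_Val) := L * R;
            when '/'    => Operands (N_Val) := L / R;
            when others => raise Constraint_Error;  --  unmatched '('
         end case;
      end Reduce;

   begin
      while I <= Expr'Last loop
         case Expr (I) is
            when ' ' =>
               I := I + 1;
            when '0' .. '9' =>
               --  Read a whole (possibly multi-digit) number.
               N_Val := N_Val + 1;
               Operands (N_Val) := 0;
               while I <= Expr'Last and then Expr (I) in '0' .. '9' loop
                  Operands (N_Val) := Operands (N_Val) * 10
                    + (Character'Pos (Expr (I)) - Character'Pos ('0'));
                  I := I + 1;
               end loop;
            when '(' =>
               N_Op := N_Op + 1;
               Operators (N_Op) := '(';
               I := I + 1;
            when ')' =>
               while Operators (N_Op) /= '(' loop
                  Reduce;
               end loop;
               N_Op := N_Op - 1;  --  drop the matching '('
               I := I + 1;
            when '+' | '-' | '*' | '/' =>
               --  Reduce while the stacked operator binds at least as
               --  tightly; ">=" makes the operators left-associative.
               while N_Op > 0
                 and then Priority (Operators (N_Op)) >= Priority (Expr (I))
               loop
                  Reduce;
               end loop;
               N_Op := N_Op + 1;
               Operators (N_Op) := Expr (I);
               I := I + 1;
            when others =>
               raise Constraint_Error;  --  unexpected character
         end case;
      end loop;
      while N_Op > 0 loop
         Reduce;
      end loop;
      if N_Val /= 1 then
         raise Constraint_Error;  --  empty or malformed expression
      end if;
      return Operands (1);
   end Evaluate;

begin
   Ada.Text_IO.Put_Line (Integer'Image (Evaluate ("2*3")));             --  prints " 6"
   Ada.Text_IO.Put_Line (Integer'Image (Evaluate ("(1+2)*(10-4)/3")));  --  prints " 6"
end Twin_Stack_Demo;
```

Left associativity comes from reducing on ">=" rather than ">"; a right-associative operator such as "**" would need the strict comparison instead.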

> >> > As long as splitting a string like
> >> > "x=2*3" requires writing a program instead of
> >> > split("=","x=2*3"), people will write in Perl, not Ada.
> >> 
> >> And what would you do in the case "x=/* An error, should be := */ 2*" and
> >> "3" continues on the next line?
> > 
> > Nothing. I know my data has the format described. If it doesn't, the data
> > is corrupted and must be thrown out. It's simple.
> 
> To do so you must be able to recognize errors.

Within the next couple of steps the program will recognize the error and
raise an exception. No problem.
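A split like the one asked for here needs only the standard Ada.Strings.Fixed package. The sketch below is an Ada 95 illustration under stated assumptions: the separator is a single Character rather than Perl's regular expression, empty fields are kept (Perl's split drops trailing empty ones), and the result is an array of Unbounded_String. Split_Demo, Split and String_Array are hypothetical names, not part of any library.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Split_Demo is

   type String_Array is array (Positive range <>) of Unbounded_String;

   --  Break S into the fields separated by Sep. The number of fields is
   --  one more than the number of separators, so the result can be sized
   --  up front with Ada.Strings.Fixed.Count.
   function Split (Sep : Character; S : String) return String_Array is
      Pattern : constant String := (1 => Sep);
      Result  : String_Array (1 .. Ada.Strings.Fixed.Count (S, Pattern) + 1);
      First   : Positive := S'First;  --  start of the current field
   begin
      for N in Result'Range loop
         declare
            --  Index returns a position within S (slices keep S's bounds).
            Stop : Natural :=
              Ada.Strings.Fixed.Index (S (First .. S'Last), Pattern);
         begin
            if Stop = 0 then
               Stop := S'Last + 1;  --  the last field runs to the end of S
            end if;
            Result (N) := To_Unbounded_String (S (First .. Stop - 1));
            First := Stop + 1;
         end;
      end loop;
      return Result;
   end Split;

   Fields : constant String_Array := Split ('=', "x=2*3");

begin
   for I in Fields'Range loop
      Ada.Text_IO.Put_Line (To_String (Fields (I)));  --  "x", then "2*3"
   end loop;
end Split_Demo;
```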

> > But what would you do in the case when the data is correct?
> 
> In the given example the data are correct: /*...*/ was a comment containing
> the symbol that is supposed to be the delimiter. My point was that for almost
> any real-life text parsing application, split is useless.

Why, then, are most of my tasks much easier to solve with split than with
substring and the like? Maybe they are not from real life?

> > You'll build a PROGRAMMMM instead of writing "split(/=/,"x=2*3")".
> 
> You still need a program to process the output of split. Is its output the
> final outcome? I suppose it is not. So there should be a loop to iterate
> through the list returned by split. Where, then, is the difference between
> split + loop and a loop with Get_Next_Word inside?

The difference is like the difference between RANDOM and SEQUENTIAL access
to data.

In many cases it is not necessary to analyze the whole string; it is enough
to know the number of tokens, or a few of them at well-known positions.
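For the access pattern described here (the number of fields, or one field at a known position), a minimal Ada 95 sketch using only Ada.Strings.Fixed.Count and Ada.Strings.Fixed.Index might look like this. It assumes single-character separators; Field_Demo, Field_Count and Field are hypothetical helper names.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Field_Demo is

   --  Number of Sep-separated fields in S: one more than the number of
   --  separators, counted without extracting any field.
   function Field_Count (S : String; Sep : Character) return Positive is
      Pattern : constant String := (1 => Sep);
   begin
      return Ada.Strings.Fixed.Count (S, Pattern) + 1;
   end Field_Count;

   --  The N-th field (1-based); raises Constraint_Error if S has fewer
   --  than N fields.
   function Field (S : String; Sep : Character; N : Positive) return String is
      Pattern : constant String := (1 => Sep);
      First   : Positive := S'First;
      Stop    : Natural;
   begin
      --  Skip the first N - 1 fields.
      for K in 1 .. N - 1 loop
         Stop := Ada.Strings.Fixed.Index (S (First .. S'Last), Pattern);
         if Stop = 0 then
            raise Constraint_Error;  --  fewer than N fields
         end if;
         First := Stop + 1;
      end loop;
      --  The wanted field ends at the next separator or at the end of S.
      Stop := Ada.Strings.Fixed.Index (S (First .. S'Last), Pattern);
      if Stop = 0 then
         return S (First .. S'Last);
      else
         return S (First .. Stop - 1);
      end if;
   end Field;

   Line : constant String := "alpha:beta:gamma:delta";

begin
   Ada.Text_IO.Put_Line (Positive'Image (Field_Count (Line, ':')));  --  prints " 4"
   Ada.Text_IO.Put_Line (Field (Line, ':', 3));                      --  prints "gamma"
end Field_Demo;
```

Each call to Field rescans from the start of the line, so when many positions are read from the same long line, splitting once into an array and indexing it is the cheaper choice.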

> IMO, the difference is 
> that the second is faster and easier to understand.

Really? Have you seen that program (bcwords.ada)? And you'll claim that
that Ada program is easier to understand? :-))))))
I have nothing to say...


Sorry for the long quotation. But look, everybody: who will dare to say that
the Ada program below is simpler and easier to understand than the
equivalent Perl program (which is more than 10 times smaller)? Don't think
about how much time it takes to write and debug that program. Think about
how much time it takes simply to type it into a text editor correctly. :-)))))


--  This demonstration is a response to a suggestion by John English
--  <J.English@bton.ac.uk> about assessing different component libraries, in
--  the context of the Ada Standard Component Library WG
--  (http://www.suffix.com/Ada/SCL/). It uses part of Corey Minyard
--  <minyard@acm.org>'s solution (why reinvent that wheel?!)

--  John said:

--  As a way of objectively assessing the merits of different approaches,
--  perhaps the way to do this is to code some examples; one of my
--  favourites for this is a program to list the 10 most common words in a
--  file with the number of occurrences of each, where the length of words
--  and the size of the file can be arbitrarily large. In Perl it might look
--  something like this:

--   while (<>) {                     # for each line in the input file(s)
--     chomp;                         # trim the end of the line
--     tr/A-Z/a-z/;                   # fold uppercase to lowercase
--     @words = split /\W+/;          # break the line into words
--     foreach (@words) {
--       if (/^\w+$/) {               # ignore non-words
--         $wordlist{$_}++;           # increment count in associative array
--       }                            # (key = word, val = no. of occurrences)
--     }
--   }
--   $times = 0;
--   foreach (sort {$wordlist{$b} <=> $wordlist{$a}} (keys %wordlist)) {
--     last if (++$times > 10);       # exit loop after 10 iterations
--     print "$_ : $wordlist{$_}\n";  # process array in descending order
--   }                                # of value, printing keys and values

--  What would this look like in Ada using each of the libraries you've
--  listed?  Does anyone else have favourite examples like this?

--  $Id: bcwords.ada,v 1.6 2001/09/23 15:25:10 simon Exp $

with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Word_Parser;
with Word_Count_Support;

procedure Word_Count is
   Word_Found : Boolean;
   File_Done : Boolean;
   Word : Ada.Strings.Unbounded.Unbounded_String;
   Word_Bag : Word_Count_Support.BU.Bag;
   Word_Tree : Word_Count_Support.ST.AVL_Tree;
   Word_Bag_Iter : Word_Count_Support.Containers.Iterator'Class
     := Word_Count_Support.BU.New_Iterator (Word_Bag);
   procedure Word_Processor (Item : Ada.Strings.Unbounded.Unbounded_String;
                             Ok : out Boolean);
   procedure Word_Processor (Item : Ada.Strings.Unbounded.Unbounded_String;
                             Ok : out Boolean) is
      Dummy : Boolean;
   begin
      Word_Count_Support.ST.Insert
        (Word_Tree,
         Word_Count_Support.Word_Stat'
         (Word => Item,
          Count => Word_Count_Support.BU.Count (Word_Bag, Item)),
         Dummy);
      Ok := True;
   end Word_Processor;
   procedure Word_Bag_Visitor
   is new Word_Count_Support.Containers.Visit (Word_Processor);
   Number_Output : Natural := 0;
   procedure Tree_Processor (Item : Word_Count_Support.Word_Stat;
                             OK : out Boolean);
   procedure Tree_Processor (Item : Word_Count_Support.Word_Stat;
                             OK : out Boolean) is
   begin
      Ada.Text_IO.Put_Line
        (Ada.Strings.Unbounded.To_String (Item.Word)
         & " =>"
         & Positive'Image (Item.Count));
      Number_Output := Number_Output + 1;
      OK := Number_Output < 10;          --  this is where we select the top 10
   end Tree_Processor;
   procedure Tree_Visitor is new Word_Count_Support.ST.Visit (Tree_Processor);
begin
   loop
      Word_Parser.Get_Next_Word
        (Ada.Text_IO.Standard_Input, Word, Word_Found, File_Done);
      exit when not Word_Found;
      Word_Count_Support.Bags.Add (Word_Bag, Word);
   end loop;
   Word_Count_Support.Containers.Reset (Word_Bag_Iter);
   Word_Bag_Visitor (Word_Bag_Iter);
   Tree_Visitor (Word_Tree);
end Word_Count;
with Ada.Strings.Unbounded;
with BC.Containers;
with BC.Containers.Bags;
with BC.Containers.Bags.Unbounded;
with BC.Containers.Trees;
with BC.Containers.Trees.AVL;
with Global_Heap;

package Word_Count_Support is

   package Containers is new BC.Containers
     (Item => Ada.Strings.Unbounded.Unbounded_String,
      "=" => Ada.Strings.Unbounded."=");

   package Bags is new Containers.Bags;

   function Hash (S : Ada.Strings.Unbounded.Unbounded_String) return Positive;

   package BU is new Bags.Unbounded (Hash => Hash,
                                     Buckets => 1,
                                     Storage => Global_Heap.Storage);

   type Word_Stat is record
      Word : Ada.Strings.Unbounded.Unbounded_String;
      Count : Positive;
   end record;

   function ">" (L, R : Word_Stat) return Boolean;
   function "=" (L, R : Word_Stat) return Boolean;

   package Stat_Containers is new BC.Containers (Word_Stat);

   package Trees is new Stat_Containers.Trees;

   package ST is new Trees.AVL
     ("<" => ">",     --  we need the most popular first
      Storage => Global_Heap.Storage);

end Word_Count_Support;
package body Word_Count_Support is

   --  This is extraordinarily lazy; of course we should really invent
   --  a better hash function!
   function Hash
     (S : Ada.Strings.Unbounded.Unbounded_String) return Positive is
   begin
      return 1;
   end Hash;

   function ">" (L, R : Word_Stat) return Boolean is
      use type Ada.Strings.Unbounded.Unbounded_String;
   begin
      return L.Count > R.Count
        or else (L.Count = R.Count
                 and then L.Word > R.Word);
   end ">";

   function "=" (L, R : Word_Stat) return Boolean is
      use type Ada.Strings.Unbounded.Unbounded_String;
   begin
      return L.Count = R.Count
        and then L.Word = R.Word;
   end "=";

end Word_Count_Support;
--  by Corey Minyard
package body Word_Parser is

   Big_A_Pos   : Integer := Character'Pos ('A');
   Small_A_Pos : Integer := Character'Pos ('a');

   procedure Xlat_To_Lower_Case (C : in out Character);
   procedure Xlat_To_Lower_Case (C : in out Character) is
   begin
      if (C in 'A' .. 'Z') then
         C := Character'Val (Character'Pos (C) - Big_A_Pos + Small_A_Pos);
      end if;
   end Xlat_To_Lower_Case;

   procedure Get_Next_Word
     (File       : in File_Type;
      Word       : out Ada.Strings.Unbounded.Unbounded_String;
      Word_Found : out Boolean;
      File_Done  : out Boolean) is

      Tmp_Str    : String (1 .. 10);
      Word_Pos   : Positive := Tmp_Str'First;
      Input_Char : Character;
      In_Word    : Boolean := False;
   begin
      --  Start with an empty word.
      Word := Ada.Strings.Unbounded.To_Unbounded_String ("");

      File_Done := False;
      Word_Found := False;

      if (End_Of_File (File)) then
         Word_Found := False;
         File_Done := True;
      else
         loop
            Get (File, Input_Char);
            Xlat_To_Lower_Case (Input_Char);

            if (not In_Word) then
               if (Input_Char in 'a' .. 'z') then
                  In_Word := True;
                  Word_Found := True;
                  Tmp_Str (Word_Pos) := Input_Char;
                  Word_Pos := Word_Pos + 1;
               end if;
            elsif (Input_Char in 'a' .. 'z') then
               Tmp_Str (Word_Pos) := Input_Char;
               if (Word_Pos = Tmp_Str'Last) then
                  Word := Word & Tmp_Str;
                  Word_Pos := Tmp_Str'First;
               else
                  Word_Pos := Word_Pos + 1;
               end if;
            else
               exit;
            end if;

            if (End_Of_File (File)) then
               File_Done := True;
               exit;
            elsif (End_Of_Line (File) and In_Word) then
               exit;
            end if;
         end loop;

         if (Word_Pos /= Tmp_Str'First) then
            --  If we have some stuff left in the temporary string, put it into
            --  the word.
            Word := Word & Tmp_Str (Tmp_Str'First .. Word_Pos - 1);
         end if;
      end if;
   end Get_Next_Word;

end Word_Parser;
--  by Corey Minyard
with Ada.Strings.Unbounded; use type Ada.Strings.Unbounded.Unbounded_String;
with Ada.Text_IO; use Ada.Text_IO;
package Word_Parser is

   procedure Get_Next_Word
     (File       : in File_Type;
      Word       : out Ada.Strings.Unbounded.Unbounded_String;
      Word_Found : out Boolean;
      File_Done  : out Boolean);

end Word_Parser;
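In Word_Count_Support, Hash returns 1 for every word and BU is instantiated with Buckets => 1, so all words share a single bucket. Below is a sketch of a less lazy Hash with the same profile (Unbounded_String to Positive). It uses a djb2-style multiply-by-33 hash, a technique substituted here rather than taken from the original, and it would only help if Buckets were also raised above 1. Hash_Demo is an illustrative name, and the sketch assumes a 32-bit Integer (as in GNAT) so that Positive'Last fits in the modular type.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Hash_Demo is

   --  djb2-style string hash (h := h * 33 + c), computed in a modular type
   --  so that overflow wraps around instead of raising Constraint_Error.
   function Hash (S : Unbounded_String) return Positive is
      type Hash_Value is mod 2**32;
      H : Hash_Value := 5381;
   begin
      for I in 1 .. Length (S) loop
         H := H * 33 + Hash_Value (Character'Pos (Element (S, I)));
      end loop;
      --  Fold into 1 .. Positive'Last (assumes a 32-bit Integer).
      return Natural (H mod Hash_Value (Positive'Last)) + 1;
   end Hash;

begin
   Ada.Text_IO.Put_Line (Positive'Image (Hash (To_Unbounded_String (""))));   --  prints " 5382"
   Ada.Text_IO.Put_Line (Positive'Image (Hash (To_Unbounded_String ("a"))));  --  prints " 177671"
end Hash_Demo;
```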