From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,8de7eedad50552f1
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!news4.google.com!news.glorb.com!newsfeed00.sul.t-online.de!t-online.de!newsfeed.freenet.de!151.189.20.20.MISMATCH!newsfeed.arcor.de!news.arcor.de!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Ada bench : count words
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.14.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <ur7ib38sg.fsf@obry.net>
 <pan.2005.03.19.16.57.03.525542@linuxchip.demon.co.uk.uk.uk>
 <87vf7n5njs.fsf@code-hal.de> <umzsy3c8j.fsf@obry.net>
 <423f5813$0$9224$9b4e6d93@newsread4.arcor-online.net>
 <mailman.47.1111454204.23655.comp.lang.ada@ada-france.org>
 <18arnvu705ly4$.1wz6ybz1jt70y$.dlg@40tude.net>
 <mailman.48.1111492666.23655.comp.lang.ada@ada-france.org>
Date: Tue, 22 Mar 2005 13:17:43 +0100
Message-ID: <1q9cx4jt7802s.k45m6mcntl87$.dlg@40tude.net>
NNTP-Posting-Date: 22 Mar 2005 13:17:41 MET
NNTP-Posting-Host: 6d8b4431.newsread2.arcor-online.net
X-Trace: 
 DXC=MK=3D0E]RfDUhhl_USDNiOQ5U85hF6f;DjW\KbG]kaMH]kI_X=5KeaFimM]nd@:0iN[6LHn;2LCVNCOgUkn_?_YOi7m9b:SlmjO
X-Complaints-To: abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:9723
Date: 2005-03-22T13:17:41+01:00
List-Id: <comp.lang.ada>

On Tue, 22 Mar 2005 11:57:22 +0000, Marius Amado Alves wrote:

>>> ... To implement buffering, I have resorted to
>>> Ada.Direct_IO, which I think cannot apply to standard input.
>>
>> Is Text_IO that bad?
> 
> No, if you can solve The Get_Line puzzle :-)

What about Get (Item : out Character)?

I wonder if calling C-lib's read would qualify! (:-))
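
A minimal sketch of the character-at-a-time idea, reading standard input with Ada.Text_IO. Note that Get skips line terminators, so line ends are detected here with End_Of_Line and consumed with Skip_Line. The procedure name Count_Chars and the decision to stop on End_Error are assumptions of this sketch, not a tested benchmark entry, and the character count excludes line terminators.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Count_Chars is
   C     : Character;
   Chars : Natural := 0;  --  characters read, line terminators excluded
   Lines : Natural := 0;  --  line terminators consumed
begin
   begin
      loop
         if End_Of_Line then
            --  At a line terminator (or at end of file, where Skip_Line
            --  raises End_Error and ends the loop).
            Skip_Line;
            Lines := Lines + 1;
         else
            Get (C);
            Chars := Chars + 1;
         end if;
      end loop;
   exception
      when End_Error =>
         null;  --  end of input reached
   end;
   Put_Line ("lines:" & Natural'Image (Lines) &
             " chars:" & Natural'Image (Chars));
end Count_Chars;
```

Whether this is fast enough is a separate question; each Get goes through Text_IO's line and column bookkeeping.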

>>>     procedure Process (S : in String) is
>>>     begin
>>>        Lines := Lines + Ada.Strings.Fixed.Count (S, EOL);
>>
>> Isn't it an extra pass? I think you should do parsing using FSM. Character
>> classes are: EOL, delimiter, letter. It is either two character map tests
>> or one case statement. I don't know what is faster. Probably you should
>> test both.
>>
>>>        for I in S'Range loop
>>>           if Is_Separator (S (I)) then
>>>              if In_Word then Finish_Word; end if;
>>>           else
>>>              if not In_Word then Start_Word; end if;
>>>           end if;
>>>        end loop;
>>>     end;
> 
> Note EOL is not a character, but a string, because in some environments 
> the thing is a combination of two characters.

I think Text_IO should translate it into LF. But anyway you can always
assume LF = new line, CR = space in the FSM.
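
As a sketch of that assumption, the classification could be one case statement. The names Char_Class and Class_Of are made up for illustration, and treating every non-blank character as a "letter" is an assumption of this sketch.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Classify_Demo is
   package L1 renames Ada.Characters.Latin_1;

   type Char_Class is (EOL, Blank, Letter);

   --  LF ends a line; CR is treated as an ordinary blank, so a CR LF
   --  pair counts as exactly one line end.
   function Class_Of (C : Character) return Char_Class is
   begin
      case C is
         when L1.LF               => return EOL;
         when L1.CR | L1.HT | ' ' => return Blank;
         when others              => return Letter;
      end case;
   end Class_Of;
begin
   --  Prints BLANK
   Ada.Text_IO.Put_Line (Char_Class'Image (Class_Of (L1.CR)));
end Classify_Demo;
```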

> This one-character 
> version improves speed (but still only to 1/2 of C):
> 
>           for I in S'Range loop
>              if S (I) = EOL then Lines := Lines + 1; end if;
>              if Is_Separator (S (I)) then
>                 if In_Word then Finish_Word; end if;
>              else
>                 if not In_Word then Start_Word; end if;
>              end if;
>           end loop;
> 
> I have not tried with string matching (if that's what you mean with 
> "FSM") because the iteration was already there, and I doubt the 
> standard library implements it more efficiently than that.

No, I meant a finite state machine:

<<Blank>>
   read character;
   case Char is
      when EOL    => Inc line count; goto Blank;
      when Space  => goto Blank;
      when Letter => Inc word count; goto Word;
   end case;

<<Word>>
   read character;
   case Char is
      when EOL    => Inc line count; goto Blank;
      when Space  => goto Blank;
      when Letter => goto Word;
   end case;

An FSM is one of those rare cases where gotos are natural. What a pity that
Ada does not have arrays of labels! (:-))
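
For what it is worth, here is a compilable sketch of the Blank and Word states above, run over a String rather than a file. The names FSM_Demo and Count, the index-based "read character" step, and the choice of blank characters are assumptions of the sketch; counting starts in the Blank state so that the first word is counted.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure FSM_Demo is
   LF : constant Character := Ada.Characters.Latin_1.LF;
   CR : constant Character := Ada.Characters.Latin_1.CR;
   HT : constant Character := Ada.Characters.Latin_1.HT;

   procedure Count (S : String; Lines, Words : out Natural) is
      I : Integer := S'First - 1;  --  index of the last character read
   begin
      Lines := 0;
      Words := 0;

      <<Blank>>
      if I >= S'Last then
         return;                   --  input exhausted
      end if;
      I := I + 1;
      case S (I) is
         when LF            => Lines := Lines + 1; goto Blank;
         when ' ' | HT | CR => goto Blank;
         when others        => Words := Words + 1; goto Word;
      end case;

      <<Word>>
      if I >= S'Last then
         return;
      end if;
      I := I + 1;
      case S (I) is
         when LF            => Lines := Lines + 1; goto Blank;
         when ' ' | HT | CR => goto Blank;
         when others        => goto Word;
      end case;
   end Count;

   L, W : Natural;
begin
   Count ("one two" & LF & "three" & LF, L, W);
   --  Prints "lines: 2 words: 3"
   Ada.Text_IO.Put_Line
     ("lines:" & Natural'Image (L) & " words:" & Natural'Image (W));
end FSM_Demo;
```

A word is counted on the Blank-to-Word transition only, so no In_Word flag is needed; the current state is the flag.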

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de