From: "Dmitry A. Kazakov"
Subject: Re: Ada bench : count words
Newsgroups: comp.lang.ada
Date: Tue, 22 Mar 2005 13:17:43 +0100

On Tue, 22 Mar 2005 11:57:22 +0000, Marius Amado Alves wrote:

>>> ... To implement buffering, I have resorted to
>>> Ada.Direct_IO, which I think cannot apply to standard input.
>>
>> Is Text_IO that bad?
>
> No, if you can solve The Get_Line puzzle :-)

What about Get (Item : out Character)? I wonder if calling C-lib's read
would qualify! (:-))

>>> procedure Process (S : in String) is
>>> begin
>>>    Lines := Lines + Ada.Strings.Fixed.Count (S, EOL);
>>
>> Isn't it an extra pass? I think you should do parsing using FSM.
>> Character classes are: EOL, delimiter, letter.
>> It is either two character map tests or one case statement. I don't
>> know what is faster. Probably you should test both.
>>
>>>    for I in S'Range loop
>>>       if Is_Separator (S (I)) then
>>>          if In_Word then Finish_Word; end if;
>>>       else
>>>          if not In_Word then Start_Word; end if;
>>>       end if;
>>>    end loop;
>>> end;
>
> Note EOL is not a character, but a string, because in some environments
> the thing is a combination of two characters.

I think Text_IO should translate it into LF. But anyway you can always
assume LF = new line, CR = space in the FSM.

> This one-character version improves speed (but still only to 1/2 of C):
>
>    for I in S'Range loop
>       if S (I) = EOL then Lines := Lines + 1; end if;
>       if Is_Separator (S (I)) then
>          if In_Word then Finish_Word; end if;
>       else
>          if not In_Word then Start_Word; end if;
>       end if;
>    end loop;
>
> I have not tried with string matching (if that's what you mean with
> "FSM") because the iteration was already there, and I doubt the
> standard library implements it more efficiently than that.

No, I meant a finite state machine:

<<...>>
   read character;
   case Char is
      when EOL    => Inc line count; goto Blank;
      when Space  => goto Blank;
      when Letter => goto Word;
   end case;
<<Blank>>
   read character;
   case Char is
      when EOL    => Inc line count; goto Blank;
      when Space  => goto Blank;
      when Letter => Inc word count; goto Word;
   end case;
<<Word>>
   read character;
   case Char is
      when EOL    => Inc line count; goto Blank;
      when Space  => goto Blank;
      when Letter => goto Word;
   end case;

FSM is one of those rare cases where gotos are natural. What a pity that
Ada does not have arrays of labels! (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
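For illustration, here is a minimal Ada sketch of the goto-based state machine above, assuming the text is already in a String buffer (filled, say, by Get_Line or by a C-style read). The names Count_FSM and Count, and the character classes (LF = new line; space, HT and CR = blank; any other character = letter), are hypothetical choices made for this example, not part of the original benchmark. It starts in the Blank state so that a letter at the very start of the buffer is counted as a word, so only the Blank and Word labels are needed. Lines and words are counted in the same single pass, with one case statement per character.

```ada
with Ada.Text_IO;

procedure Count_FSM is

   L, W : Natural;

   --  Hypothetical helper: counts lines and words in Text using a
   --  goto-based finite state machine. Assumed character classes:
   --  LF = new line; space, HT and CR = blank; anything else = letter.
   procedure Count (Text : String; Lines, Words : out Natural) is
      I : Natural := Text'First;  --  index of the next character to read
      C : Character;
   begin
      Lines := 0;
      Words := 0;

      <<Blank>>  --  state: outside a word (also the start state)
      if I > Text'Last then
         return;
      end if;
      C := Text (I);
      I := I + 1;
      case C is
         when ASCII.LF =>
            Lines := Lines + 1;
            goto Blank;
         when ' ' | ASCII.HT | ASCII.CR =>
            goto Blank;
         when others =>
            Words := Words + 1;  --  a letter after a blank starts a word
            goto Word;
      end case;

      <<Word>>  --  state: inside a word
      if I > Text'Last then
         return;
      end if;
      C := Text (I);
      I := I + 1;
      case C is
         when ASCII.LF =>
            Lines := Lines + 1;
            goto Blank;
         when ' ' | ASCII.HT | ASCII.CR =>
            goto Blank;
         when others =>
            goto Word;  --  still inside the same word
      end case;
   end Count;

begin
   Count ("hello world" & ASCII.LF & "  foo" & ASCII.LF, L, W);
   --  prints "lines: 2 words: 3"
   Ada.Text_IO.Put_Line
     ("lines:" & Natural'Image (L) & " words:" & Natural'Image (W));
end Count_FSM;
```

Each state is a label and every case alternative ends in a goto, so the code follows the state diagram directly; no separate counting pass over the buffer is needed.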