From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4 X-Google-Thread: 103376,8de7eedad50552f1 X-Google-Attributes: gid103376,public X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news1.google.com!news1.google.com!proxad.net!proxad.net!newsfeed.arcor.de!news.arcor.de!not-for-mail From: "Dmitry A. Kazakov" Subject: Re: Ada bench : count words Newsgroups: comp.lang.ada User-Agent: 40tude_Dialog/2.0.14.1 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Reply-To: mailbox@dmitry-kazakov.de Organization: cbb software GmbH References: <87vf7n5njs.fsf@code-hal.de> <423f5813$0$9224$9b4e6d93@newsread4.arcor-online.net> Date: Tue, 22 Mar 2005 11:59:43 +0100 Message-ID: <18arnvu705ly4$.1wz6ybz1jt70y$.dlg@40tude.net> NNTP-Posting-Date: 22 Mar 2005 11:59:41 MET NNTP-Posting-Host: f809983c.newsread4.arcor-online.net X-Trace: DXC=@:N?elDKL1KRG:gi]CKmZG:ejgIfPPldDjW\KbG]kaMHdbobRQ:W=WBK=fR<:5ghh@WRXZ37ga[7JncfD5BXcIX@hE?HIfUYM1C X-Complaints-To: abuse@arcor.de Xref: g2news1.google.com comp.lang.ada:9719 Date: 2005-03-22T11:59:41+01:00 List-Id: On Tue, 22 Mar 2005 01:16:09 +0000, Marius Amado Alves wrote: > I took a shot at the count-words benchmark, a program to count lines, > words and characters. The Ada program currently published there is > broken. My program is correct and portable but: > > - the speed is circa 1/3 of the GCC C version > > - it fails to comply with the requirement that the input be taken from > standard input. To implement buffering, I have resorted to > Ada.Direct_IO, which I think cannot apply to standard input. Is Text_IO that bad? > Can you help with any of these points? Thanks. The complete program > follows. > > -- Count words in Ada for the language shootout > -- by Marius Amado Alves > > with Ada.Characters.Handling; > with Ada.Characters.Latin_1; > with Ada.Command_Line; > with Ada.Direct_IO; > with Ada.Strings.Fixed; > with Ada.Text_IO; > > procedure Count_Words is > > use Ada.Characters.Handling; > use Ada.Characters.Latin_1; > use Ada.Command_Line; > > Filename : String := Argument (1); > Buffer_Size : constant := 4096; > EOF : Character := FS; > EOL : String := (1 => LF); > Lines : Natural := 0; > Words : Natural := 0; > Total : Natural := 0; > In_Word : Boolean := False; > > function Is_Separator (C : Character) return Boolean is > begin > return Is_Control (C) or C = ' '; > end; Why don't you use character map here? > procedure Start_Word is > begin > In_Word := True; > end; > > procedure Finish_Word is > begin > Words := Words + 1; > In_Word := False; > end; > > procedure Process (S : in String) is > begin > Lines := Lines + Ada.Strings.Fixed.Count (S, EOL); Isn't it an extra pass? I think you should do parsing using FSM. Character classes are: EOL, delimiter, letter. It is either two character map tests or one case statement. I don't know what is faster. Probably you should test both. > for I in S'Range loop > if Is_Separator (S (I)) then > if In_Word then Finish_Word; end if; > else > if not In_Word then Start_Word; end if; > end if; > end loop; > end; > > begin > declare > package Character_IO is new Ada.Direct_IO (Character); > use Character_IO; > File : File_Type; > begin > Open (File, In_File, Filename); > Total := Natural (Size (File)); > Close (File); > end; > > declare > subtype Buffer_Type is String (1 .. Buffer_Size); > package Buffer_IO is new Ada.Direct_IO (Buffer_Type); > use Buffer_IO; > File : File_Type; > S : Buffer_Type; > begin > Open (File, In_File, Filename); > for I in 1 .. Total / Buffer_Size loop > Read (File, S); > Process (S); > end loop; > Close (File); > end; > > declare > subtype Rest_Type is String (1 .. Total rem Buffer_Size); > package Character_IO is new Ada.Direct_IO (Character); > use Character_IO; > File : File_Type; > S : Rest_Type; > begin > Open (File, In_File, Filename); > Set_Index (File, Count (Total - S'Length)); > for I in 1 .. S'Length loop > Read (File, S (I)); > end loop; > Close (File); > Process (S); > end; > > if In_Word then Finish_Word; end if; > > Ada.Text_IO.Put_Line > (Natural'Image (Lines) & > Natural'Image (Words) & > Natural'Image (Total)); > end; P.S. For word frequencies: gcc version uses hash + sort. I wonder if binary trees could be better here. Or even sorted arrays for simplicity, there is no item deletion, search is more often than insert... Of course, for tree node allocation one could use a stack pool instead of heap. (That's one of the weaknesses of this contest. Actually the method should have been specified) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de