comp.lang.ada
 help / color / mirror / Atom feed
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Ada bench : count words
Date: Tue, 22 Mar 2005 11:59:43 +0100
Date: 2005-03-22T11:59:41+01:00	[thread overview]
Message-ID: <18arnvu705ly4$.1wz6ybz1jt70y$.dlg@40tude.net> (raw)
In-Reply-To: mailman.47.1111454204.23655.comp.lang.ada@ada-france.org

On Tue, 22 Mar 2005 01:16:09 +0000, Marius Amado Alves wrote:

> I took a shot at the count-words benchmark, a program to count lines, 
> words and characters. The Ada program currently published there is 
> broken. My program is correct and portable but:
> 
> - the speed is circa 1/3 of the GCC C version
> 
> - it fails to comply with the requirement that the input be taken from 
> standard input. To implement buffering, I have resorted to 
> Ada.Direct_IO, which I think cannot apply to standard input.

Is Text_IO that bad?

> Can you help with any of these points? Thanks. The complete program 
> follows.
> 
> -- Count words in Ada for the language shootout
> -- by Marius Amado Alves
> 
> with Ada.Characters.Handling;
> with Ada.Characters.Latin_1;
> with Ada.Command_Line;
> with Ada.Direct_IO;
> with Ada.Strings.Fixed;
> with Ada.Text_IO;
> 
> procedure Count_Words is
> 
>     use Ada.Characters.Handling;
>     use Ada.Characters.Latin_1;
>     use Ada.Command_Line;
> 
>     Filename : String := Argument (1);
>     Buffer_Size : constant := 4096;
>     EOF : Character := FS;
>     EOL : String := (1 => LF);
>     Lines : Natural := 0;
>     Words : Natural := 0;
>     Total : Natural := 0;
>     In_Word : Boolean := False;
> 
>     function Is_Separator (C : Character) return Boolean is
>     begin
>        return Is_Control (C) or C = ' ';
>     end;

Why don't you use character map here?
 
>     procedure Start_Word is
>     begin
>        In_Word := True;
>     end;
> 
>     procedure Finish_Word is
>     begin
>        Words := Words + 1;
>        In_Word := False;
>     end;
> 
>     procedure Process (S : in String) is
>     begin
>        Lines := Lines + Ada.Strings.Fixed.Count (S, EOL);

Isn't it an extra pass? I think you should do parsing using FSM. Character
classes are: EOL, delimiter, letter. It is either two character map tests
or one case statement. I don't know what is faster. Probably you should
test both.

>        for I in S'Range loop
>           if Is_Separator (S (I)) then
>              if In_Word then Finish_Word; end if;
>           else
>              if not In_Word then Start_Word; end if;
>           end if;
>        end loop;
>     end;
>
> begin
>     declare
>        package Character_IO is new Ada.Direct_IO (Character);
>        use Character_IO;
>        File : File_Type;
>     begin
>        Open (File, In_File, Filename);
>        Total := Natural (Size (File));
>        Close (File);
>     end;
> 
>     declare
>        subtype Buffer_Type is String (1 .. Buffer_Size);
>        package Buffer_IO is new Ada.Direct_IO (Buffer_Type);
>        use Buffer_IO;
>        File : File_Type;
>        S : Buffer_Type;
>     begin
>        Open (File, In_File, Filename);
>        for I in 1 .. Total / Buffer_Size loop
>           Read (File, S);
>           Process (S);
>        end loop;
>        Close (File);
>     end;
> 
>     declare
>        subtype Rest_Type is String (1 .. Total rem Buffer_Size);
>        package Character_IO is new Ada.Direct_IO (Character);
>        use Character_IO;
>        File : File_Type;
>        S : Rest_Type;
>     begin
>        Open (File, In_File, Filename);
>        Set_Index (File, Count (Total - S'Length));
>        for I in 1 .. S'Length loop
>           Read (File, S (I));
>        end loop;
>        Close (File);
>        Process (S);
>     end;
> 
>     if In_Word then Finish_Word; end if;
> 
>     Ada.Text_IO.Put_Line
>       (Natural'Image (Lines) &
>        Natural'Image (Words) &
>        Natural'Image (Total));
> end;

P.S. For word frequencies: gcc version uses hash + sort. I wonder if binary
trees could be better here. Or even sorted arrays for simplicity, there is
no item deletion, search is more often than insert... Of course, for tree
node allocation one could use a stack pool instead of heap.

(That's one of the weaknesses of this contest. Actually the method should
have been specified)

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



  reply	other threads:[~2005-03-22 10:59 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-03-19 16:22 Ada bench Pascal Obry
2005-03-19 16:55 ` Dr. Adrian Wrigley
2005-03-19 21:32   ` Michael Bode
2005-03-20  9:20     ` Pascal Obry
2005-03-20  9:39       ` Michael Bode
2005-03-20 11:16         ` Pascal Obry
2005-03-20 12:20           ` Dmitry A. Kazakov
2005-03-20 12:32             ` Pascal Obry
2005-03-21  1:42             ` (see below)
2005-03-21  2:24               ` (see below)
2005-03-21 15:00                 ` (see below)
2005-03-21  3:54               ` Ed Falis
2005-03-21 12:20                 ` Jeff C
2005-03-21 15:18                 ` (see below)
2005-03-21 15:24                   ` (see below)
2005-03-21 18:56                   ` Isaac Gouy
2005-03-21 21:31                 ` Randy Brukardt
2005-03-21 22:14                   ` Ed Falis
2005-03-21 18:07               ` Pascal Obry
2005-03-20 10:11       ` Adrian Knoth
2005-03-20 10:30         ` Michael Bode
2005-03-21 23:27       ` Georg Bauhaus
2005-03-22  1:16         ` Ada bench : count words Marius Amado Alves
2005-03-22 10:59           ` Dmitry A. Kazakov [this message]
2005-03-22 11:57             ` Marius Amado Alves
2005-03-22 12:17               ` Dmitry A. Kazakov
2005-03-22 12:47                 ` Marius Amado Alves
2005-03-22 13:08                   ` Dmitry A. Kazakov
2005-03-22 13:28                     ` Marius Amado Alves
2005-03-22 16:48                     ` Marius Amado Alves
2005-03-22 17:34                       ` Dmitry A. Kazakov
2005-03-27 20:14                         ` jtg
2005-03-27 21:22                           ` Dmitry A. Kazakov
2005-03-28 19:54                             ` jtg
2005-03-28 20:56                               ` Dmitry A. Kazakov
2005-03-29 12:40                                 ` jtg
2005-03-29 13:00                                   ` [OT] " Tapio Kelloniemi
2005-03-29 13:47                                     ` Dmitry A. Kazakov
2005-03-29 15:53                                       ` Tapio Kelloniemi
2005-03-29 16:17                                         ` Dmitry A. Kazakov
     [not found]                                         ` <k33j419lgei1ui89s26o1dlr9ccf1qe1hd@4ax.com>
2005-04-11 23:04                                           ` Marius Amado Alves
2005-03-29 13:47                                   ` Dmitry A. Kazakov
2005-04-01 20:58                                   ` Georg Bauhaus
2005-04-01 20:18                                     ` Pascal Obry
2005-03-22 12:53                 ` Marius Amado Alves
     [not found]                 ` <f205219321dd18dba878fab16b7cb50d@netcabo.pt>
2005-03-22 13:12                   ` Marius Amado Alves
2005-03-23 16:58                     ` Isaac Gouy
2005-03-22 13:58                 ` Robert A Duff
2005-03-22 16:30                   ` Marius Amado Alves
2005-03-22 16:41                     ` Tapio Kelloniemi
2005-03-22 17:39                       ` Marius Amado Alves
2005-03-22 18:59                         ` Dmitry A. Kazakov
2005-03-22 19:08                         ` Tapio Kelloniemi
2005-03-22 18:34                     ` Georg Bauhaus
2005-03-22 19:32                     ` Robert A Duff
2005-03-22 20:04                       ` tmoran
2005-03-23 16:55                       ` Isaac Gouy
     [not found]                   ` <1820eab50b57f2fe1c4e8e50bb0f4fe5@netcabo.pt>
2005-03-22 22:49                     ` Stephen Leake
2005-03-22 22:58                       ` Robert A Duff
2005-03-22 23:27                   ` Larry Kilgallen
2005-03-23 22:33                     ` Robert A Duff
2005-03-24  5:02                     ` Larry Kilgallen
     [not found]                     ` <wccpsxqro0c.fsfOrganization: LJK Software <bM40pHW6P2KW@eisner.encompasserve.org>
2005-03-25  1:34                       ` Robert A Duff
2005-03-22 12:22             ` Jeff C
2005-03-23 16:48               ` Isaac Gouy
2005-03-23 17:06             ` Isaac Gouy
2005-03-22 19:49           ` tmoran
2005-03-22 21:51             ` Dmitry A. Kazakov
2005-03-23  0:16               ` tmoran
2005-03-23  7:25                 ` Dmitry A. Kazakov
2005-03-22 22:33             ` Marius Amado Alves
     [not found]             ` <00b362390273e6c04844dd4ff1885ee0@netcabo.pt>
2005-03-23 15:09               ` Marius Amado Alves
2005-03-23 19:00                 ` tmoran
2005-03-23 19:30                   ` tmoran
2005-03-23 21:38                     ` tmoran
2005-03-25  7:30                       ` Simon Wright
2005-03-25  9:38                         ` tmoran
2005-03-25 16:20                           ` Tapio Kelloniemi
2005-03-25 22:18                             ` Ada:The High Performance Language for Hyperthreaded and Multicore CPUs; was " tmoran
2005-03-23 19:54                   ` Tapio Kelloniemi
2005-03-23 20:39                     ` Ada bench : word frequency Marius Amado Alves
2005-03-23 21:26                       ` Isaac Gouy
2005-03-24  1:24                         ` Marius Amado Alves
2005-03-24 17:23                           ` Isaac Gouy
2005-03-24 19:52                           ` Martin Dowie
2005-04-11 15:11                             ` Marius Amado Alves
2005-03-24 20:16                           ` Pascal Obry
2005-03-24 22:54                             ` tmoran
2005-04-11 15:38                               ` Marius Amado Alves
2005-03-24 21:18                           ` Gautier Write-only
2005-04-11 15:32                             ` Marius Amado Alves
2005-04-11 23:56                               ` Robert A Duff
2005-04-28  4:04                       ` Matthew Heaney
2005-04-28 20:40                         ` Matthew Heaney
2005-03-23 21:38                     ` Ada bench : count words tmoran
2005-03-24 20:19                   ` tmoran
2005-03-24 21:00                     ` Pascal Obry
2005-03-24 22:54                       ` tmoran
2005-03-30 16:08                 ` Andre
2005-03-30 16:36                   ` Pascal Obry
2005-03-22 22:27           ` Dmitry A. Kazakov
2005-03-23  7:46             ` Pascal Obry
2005-03-23  7:56               ` Dmitry A. Kazakov
2005-03-23 13:38                 ` Robert A Duff
2005-03-22  7:05         ` Ada bench Pascal Obry
2005-04-07 20:59   ` David Sauvage
2005-04-07 23:40     ` David Sauvage
2005-04-08 17:11     ` Pascal Obry
2005-04-08 17:39       ` tmoran
2005-04-08 18:49       ` David Sauvage
2005-04-18 19:14       ` David Sauvage
2005-04-19 16:43         ` Matthew Heaney
2005-04-19 23:22           ` David Sauvage
2005-04-20  0:49             ` Matthew Heaney
2005-04-20  4:22               ` Georg Bauhaus
2005-04-20 21:24               ` David Sauvage
2005-04-20 23:06                 ` Georg Bauhaus
2005-04-20  9:41             ` Pascal Obry
2005-04-20 11:44               ` Matthew Heaney
2005-04-20 14:47                 ` Pascal Obry
2005-04-20 19:26                   ` Georg Bauhaus
2005-04-20 19:34                     ` Pascal Obry
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox