comp.lang.ada
 help / color / mirror / Atom feed
* [Shootout] Spellcheck.adb
@ 2005-04-25 21:30 albert.bachmann
  2005-04-25 21:57 ` Spellcheck.adb Albert Bachmann
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: albert.bachmann @ 2005-04-25 21:30 UTC (permalink / raw)


Dear newsgroup,

on http://shootout.alioth.debian.org/ I discovered that the Ada entry
for the "spellcheck" benchmark produces an error according to the
information on that website. The specification for this benchmark can
be found here:
http://shootout.alioth.debian.org/benchmark.php?test=spellcheck&lang=all&sort=fullcpu

Since I am new to Ada I decided that it might be a good idea to
implement a solution and see how it performs. Finally I came up with a
naive program that performs not very well in comparison to other
language implementations. I produced a naive implementation in C++ as
well which solves the problem in less than half the time. The entry for
C++ at the Shootout takes half the time of my naive solution.

Now what I'd like to ask you is how I can speed up my Ada solution? As
I said I'm a novice and certainly there is a huge optimization
potential here. Here is the output of 'time':

Ada:

real    0m0.742s
user    0m0.720s
sys     0m0.009s

C++:

real    0m0.217s
user    0m0.192s
sys     0m0.008s


First I post the C++ solution since it is much shorter and easy to
understand:

#include <map>
#include <string>
#include <fstream>
#include <iostream>

using namespace std;

int main()
{
	map<string, bool> dict;
	string key;

	ifstream in("spellcheck-dict.txt");

	while (getline(in, key))
		dict[key] = true;

	while (cin >> key)
	{
		if (!dict[key])
			cout << key << endl;
	}
}


And now spellcheck.adb:


with ada.text_io;
with ada.integer_text_io;
with gnat.htable;
use  ada;

procedure spellcheck is

	subtype word is string(1 .. 128);
	subtype index is natural range 0 .. 2760;

	file        : text_io.file_type;
	item        : word;
	item_offset : natural;
	no_item     : constant boolean := false;

	function hash(item : word) return index is
		a : constant positive := 127;
		h : index := 0;
	begin
		for i in item'range loop
			h := (a * h + character'pos(item(i))) mod index'last;
		end loop;

		return h;
	end hash;

	package dictionary is new gnat.htable.simple_htable(header_num =>
index,
							element => boolean,
							no_element => no_item,
							key => word,
							hash => hash,
							equal => "=");

begin
	text_io.open(file, text_io.in_file, "spellcheck-dict.txt");

	while text_io.end_of_file(file) = false loop
		text_io.get_line(file, item, item_offset);
		strings.fixed.delete(item, item_offset + 1, item'last);
		dictionary.set(item, true);
	end loop;

	while text_io.end_of_file = false loop
		text_io.get_line(item, item_offset);
		strings.fixed.delete(item, item_offset + 1, item'last);
		if dictionary.get(item) = no_item then
			text_io.put_line(item);
			null;
		end if;
	end loop;
	
	text_io.close(file);

end spellcheck;


Regards,
Albert




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
@ 2005-04-25 21:57 ` Albert Bachmann
  2005-04-25 23:27 ` [Shootout] Spellcheck.adb Marius Amado Alves
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Albert Bachmann @ 2005-04-25 21:57 UTC (permalink / raw)


That "null;" statement is a holdover from previous testing and should
be deleted.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Shootout] Spellcheck.adb
  2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
  2005-04-25 21:57 ` Spellcheck.adb Albert Bachmann
@ 2005-04-25 23:27 ` Marius Amado Alves
  2005-04-26 18:04 ` Spellcheck.adb Matthew Heaney
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Marius Amado Alves @ 2005-04-25 23:27 UTC (permalink / raw)
  To: comp.lang.ada

At first sight the thing to improve is the hash function. The current 
function seems too expensive. Some time ago I had this problem. I 
invented a less expensive function with good dispersion. I don't have 
the code with me now, but it was as follows. Treat both the input and 
output values as bit strings X (1 .. M) and Y (1 .. N) respectively. 
(Use representation clauses, packed arrays, and unchecked conversion 
for this.) Make N small, say 16. Set Y bits to the value of X bits as 
follows:

if M > N, Y (J) = X (1 + (J - 1) * M / N), for J in 1 .. N

if M = N, Y = X

if M < N, initialize Y to all zeros,
  then let Y (1 + (I - 1) * N / M) = X (I), for I in 1 .. M

Another thing to improve may be the text input. Buffering with streams 
has proved faster than Text_IO.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
  2005-04-25 21:57 ` Spellcheck.adb Albert Bachmann
  2005-04-25 23:27 ` [Shootout] Spellcheck.adb Marius Amado Alves
@ 2005-04-26 18:04 ` Matthew Heaney
  2005-04-26 19:36 ` [Shootout] Spellcheck.adb David Sauvage
  2005-04-26 20:36 ` [Shootout] Spellcheck.adb Dmitry A. Kazakov
  4 siblings, 0 replies; 23+ messages in thread
From: Matthew Heaney @ 2005-04-26 18:04 UTC (permalink / raw)



albert.bachmann@gmx.de wrote:
>
>
> First I post the C++ solution since it is much shorter and easy to
> understand:

The Ada solution based on the Ada 2005 standard container library isn't
much different from your C++ example:


with Hashed_Word_Sets;  use Hashed_Word_Sets;
with Ada.Text_IO;  use Ada.Text_IO;

procedure Spellcheck is

   Words : Set;

   Line : String (1 .. 132);
   Last : Natural;

   File : File_Type;

begin

   Open (File, In_File, "spellcheck-dict.txt");

   while not End_Of_File (File) loop
      Get_Line (File, Line, Last);
      Insert (Words, New_Item => Line (Line'First .. Last));
   end loop;

   while not End_Of_File loop
      Get_Line (Line, Last);

      if not Contains (Words, Item => Line (Line'First .. Last)) then
         Put_Line (Line (Line'First .. Last));
      end if;
   end loop;

end Spellcheck;


<http://charles.tigris.org/source/browse/charles/src/ai302/examples/spellcheck/>

We did determine that the hash function doesn't perform well for small
strings, but I haven't fixed it yet.  If you're desperate I can fix it
tonight.

-Matt




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Shootout] Spellcheck.adb
  2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
                   ` (2 preceding siblings ...)
  2005-04-26 18:04 ` Spellcheck.adb Matthew Heaney
@ 2005-04-26 19:36 ` David Sauvage
  2005-04-26 20:50   ` Spellcheck.adb Albert Bachmann
  2005-04-26 21:21   ` Spellcheck.adb Albert Bachmann
  2005-04-26 20:36 ` [Shootout] Spellcheck.adb Dmitry A. Kazakov
  4 siblings, 2 replies; 23+ messages in thread
From: David Sauvage @ 2005-04-26 19:36 UTC (permalink / raw)


Hi,

happy to read your post, also because i'm the folk who proposed the 
spellcheck ada implementation :-), i don't undestand yet the shootout 
spellcheck error as it works nicely on my system, does it failed on 
yours too ? may be you could describe the errors here if so ? thanks

albert.bachmann@gmx.de wrote:
> Now what I'd like to ask you is how I can speed up my Ada solution? 
You can find interesting discussion on the spellcheck implementation & 
speed issue in the thread named "Ada Bench" of comp.lang.ada
you could use :
- Ada.Streams.Stream_IO.Read instead of Ada.Text_IO.Get_Line
- the shootout spellcheck C++ hash function as hash function

albert.bachmann@gmx.de wrote:
> 
> real    0m0.742s
> user    0m0.720s
> sys     0m0.009s
> 
Is it for N=1 ? could you describe your system configuration & the 
command line used to launch the program ?

Well done, i was about 5 min on my first implementation ;-)
here is the shootout spellcheck implementation (for N=1) duration, you 
can take a look on my system configuration on the Ada Bench thread :
real    0m0.126s
user    0m0.078s
sys     0m0.013s

To compare (just because we are talking about benchmark and how to speed 
up our programs) here is your posted implementation on my system (i add 
the line with Ada.Strings.Fixed; as "strings.fixed.delete" was not 
handle by my compiler without it)
real    0m1.432s
user    0m1.361s
sys     0m0.014s

Your next version should be far far better ;-)

Hope it could help ;-)
David



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Shootout] Spellcheck.adb
  2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
                   ` (3 preceding siblings ...)
  2005-04-26 19:36 ` [Shootout] Spellcheck.adb David Sauvage
@ 2005-04-26 20:36 ` Dmitry A. Kazakov
  2005-04-26 21:50   ` David Sauvage
  4 siblings, 1 reply; 23+ messages in thread
From: Dmitry A. Kazakov @ 2005-04-26 20:36 UTC (permalink / raw)


On 25 Apr 2005 14:30:42 -0700, albert.bachmann@gmx.de wrote:

> on http://shootout.alioth.debian.org/ I discovered that the Ada entry
> for the "spellcheck" benchmark produces an error according to the
> information on that website. The specification for this benchmark can
> be found here:
> http://shootout.alioth.debian.org/benchmark.php?test=spellcheck&lang=all&sort=fullcpu
> 
> Since I am new to Ada I decided that it might be a good idea to
> implement a solution and see how it performs. Finally I came up with a
> naive program that performs not very well in comparison to other
> language implementations. I produced a naive implementation in C++ as
> well which solves the problem in less than half the time. The entry for
> C++ at the Shootout takes half the time of my naive solution.
> 
> Now what I'd like to ask you is how I can speed up my Ada solution? As
> I said I'm a novice and certainly there is a huge optimization
> potential here. Here is the output of 'time':

-------------- spell_check.adb -----------------
with Ada.Text_IO;        use Ada.Text_IO;
with Dictionary_Tables;  use Dictionary_Tables;

procedure Spell_Check is
   File : File_type;
   Line : String (1..130);
   Last : Natural;
   Dictionary : Table;
begin
   Open (File, In_File, "spellcheck-dict.txt");
   begin -- Reading dictionary
      loop
	   Get_Line (File, Line, Last);
         Add (Dictionary, Line (Line'First..Last), true);
      end loop;
   exception
      when End_Error =>
         null;
   end;
   Close (File);

   begin -- Reading test file
      loop
	   Get_Line (Line, Last);
         begin
   	      if Find (Dictionary, Line (Line'First..Last)) then
               null;
            end if;
         exception
            when End_Error =>
	         Put_Line (Line (Line'First..Last));
         end;
      end loop;
   exception
      when End_Error =>
         null;
   end;
end Spell_Check;
----------- dictionary_tables.adb ----------------------
with Tables;
package Dictionary_Tables is new Tables (Boolean);

The above is approximately five times faster on my computer under FC3.
Probably, because it does not use hash tables?

[ The package tables is here: http://www.dmitry-kazakov.de/ada/tables.htm ]

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-26 19:36 ` [Shootout] Spellcheck.adb David Sauvage
@ 2005-04-26 20:50   ` Albert Bachmann
  2005-04-27 15:11     ` Spellcheck.adb Matthew Heaney
  2005-04-26 21:21   ` Spellcheck.adb Albert Bachmann
  1 sibling, 1 reply; 23+ messages in thread
From: Albert Bachmann @ 2005-04-26 20:50 UTC (permalink / raw)


Hello David,

my system is a Fedora Core 3 Linux box with a Pentium 4 running at 2.4
GHz with 512MB RAM. I use GNAT from GCC-4.0.0. with -O3. All my timings
are for n = 1.

In the meantime I followed the advice from Matthew Heaney (thanks
Matthew) and went on with  Ada.Containers. I again tried a simple
implementation. Unfortunately I recognized that controlled types have
to be instantiated at library level which requires now a total of 3
files since the instantiation of ada.containers.hashed_maps needs a
definite subtype for its key_type.

Anyway the resulting solution runs with the following times:

real    0m0.352s
user    0m0.318s
sys     0m0.015s

Here are the file contents:

types.ads:
----------

package types is
	subtype word is string(1 .. 128);
end types;

hashmap.ads:
------------

with ada.strings.fixed;
with ada.strings.hash;
with ada.containers.hashed_maps;
with types;

package hashmap is new ada.containers.hashed_maps(key_type  =>
types.word,
					element_type    => boolean,
					hash            => ada.strings.hash,
					equivalent_keys => "=");

and finally spellcheck2.adb:
----------------------------

with ada.strings.fixed;
with ada.strings.hash;
with ada.text_io;
with ada.integer_text_io;
with hashmap;
with types;
use  types;
use  ada;

procedure spellcheck2 is

	subtype index is natural range 0 .. 2760;

	file        : text_io.file_type;
	item        : word;
	item_offset : natural;
	dict        : hashmap.map;

begin

	text_io.open(file, text_io.in_file, "spellcheck-dict.txt");

	while not text_io.end_of_file(file) loop
		text_io.get_line(file, item, item_offset);
		strings.fixed.delete(item, item_offset + 1, word'last);
		hashmap.insert(dict, item, true);
	end loop;

	while not text_io.end_of_file loop
		text_io.get_line(item, item_offset);
		strings.fixed.delete(item, item_offset + 1, word'last);
		if not hashmap.contains(dict, item) then
			text_io.put_line(item);
		end if;
	end loop;

	text_io.close(file);

end spellcheck2;

Please note that those attempts are only quick and dirty hacks. If
someone knows how to get rid of the seperate types.ads file I would
welcome his advice.

So far I have not tested your proposed solution but eventually I will
do so and let you know about any possible issues.

Regards,
Albert




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-26 19:36 ` [Shootout] Spellcheck.adb David Sauvage
  2005-04-26 20:50   ` Spellcheck.adb Albert Bachmann
@ 2005-04-26 21:21   ` Albert Bachmann
  2005-04-26 22:04     ` Spellcheck.adb David Sauvage
  1 sibling, 1 reply; 23+ messages in thread
From: Albert Bachmann @ 2005-04-26 21:21 UTC (permalink / raw)


David,

regarding your entry at the Shootout--it compiled and ran fine on my
system. Timing is excellent:

real    0m0.074s
user    0m0.054s
sys     0m0.012s

I don't know the infrastructure used by the Shootout. I think there is
a mailing list for questions like this.

Regards,
Albert.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Shootout] Spellcheck.adb
  2005-04-26 20:36 ` [Shootout] Spellcheck.adb Dmitry A. Kazakov
@ 2005-04-26 21:50   ` David Sauvage
  2005-04-27  8:40     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 23+ messages in thread
From: David Sauvage @ 2005-04-26 21:50 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> ...
> The above is approximately five times faster on my computer under FC3.
> Probably, because it does not use hash tables?
> 

GNAT.Spitbol.Table generic use hash tables (i wonder if it could be 
called an hashed table maps method?), may be people would like to have a 
look on it's hash function (see #1)

The published shootout Ada spellcheck using GNAT.Spitbol.Table :
real    0m0.120s
user    0m0.083s
sys     0m0.007s

(i run all the programs several times before each post ;-), so i get 
fresh time average each post)

I try using the Tables generic package implementation (see #2) instead 
of GNAT.Spitbol.Table generic on the shootout Ada spellcheck 
implementation, which gave me  :
real    0m0.271s
user    0m0.197s
sys     0m0.014s

system configuration :
Debian GNU/Linux Sid - kernel 2.6.10 - 1.2 Ghz
gcc 3-3, GNAT 3.15p-12

#1 the GNAT.Spitbol.Table generic hash function
------------------------------------------------
(see g-spitbo.adb)
       function Hash (Str : String) return Unsigned_32 is
          Result : Unsigned_32 := Str'Length;
       begin
          for J in Str'Range loop
             Result := Rotate_Left (Result, 1) +
                       Unsigned_32 (Character'Pos (Str (J)));
          end loop;

          return Result;
       end Hash;

#2 The Tables generic package :
-------------------------------
(see previous post)
 > [ http://www.dmitry-kazakov.de/ada/tables.htm ]

David



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-26 21:21   ` Spellcheck.adb Albert Bachmann
@ 2005-04-26 22:04     ` David Sauvage
  0 siblings, 0 replies; 23+ messages in thread
From: David Sauvage @ 2005-04-26 22:04 UTC (permalink / raw)


Hello again,

Albert Bachmann wrote:
> regarding your entry at the Shootout--it compiled and ran fine on my
> system. Timing is excellent:
> 
> real    0m0.074s
> user    0m0.054s
> sys     0m0.012s
> 

Surprisely your timing is just twice faster than mine ;-) may be the 
1.2Ghz to 2.4 Ghz ;-)

Excellent to hear that it run well on your system too.

> I don't know the infrastructure used by the Shootout. I think there is
> a mailing list for questions like this.
> 

yes thanks for your advise, i will try posting on the mailing list.

David



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Shootout] Spellcheck.adb
  2005-04-26 21:50   ` David Sauvage
@ 2005-04-27  8:40     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 23+ messages in thread
From: Dmitry A. Kazakov @ 2005-04-27  8:40 UTC (permalink / raw)


On Tue, 26 Apr 2005 23:50:12 +0200, David Sauvage wrote:

> Dmitry A. Kazakov wrote:
>> ...
>> The above is approximately five times faster on my computer under FC3.
>> Probably, because it does not use hash tables?
> 
> GNAT.Spitbol.Table generic use hash tables (i wonder if it could be 
> called an hashed table maps method?), may be people would like to have a 
> look on it's hash function (see #1)
> 
> The published shootout Ada spellcheck using GNAT.Spitbol.Table :
> real    0m0.120s
> user    0m0.083s
> sys     0m0.007s
> 
> (i run all the programs several times before each post ;-), so i get 
> fresh time average each post)
> 
> I try using the Tables generic package implementation (see #2) instead 
> of GNAT.Spitbol.Table generic on the shootout Ada spellcheck 
> implementation, which gave me  :
> real    0m0.271s
> user    0m0.197s
> sys     0m0.014s

Aha, now it is how it must be! (:-))

Hash table should be slower than sorted arrays on natural dictionaries. But
because it is allocated in advance (40_000 items), that should give it an
advantage over a dynamically allocated sorted array, when sizes of the
dictionary and of the text are close:

1 x malloc + n x string compare >> n x (hash + integer compare),
though string compare < hash + integer compare

when n ~ 1 two times looks OK.

[I can't stop wondering how badly wrong the benchmarks were designed]

BTW, you can try to use streams in the second part of your code. That
should bring another 10ms or so.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-26 20:50   ` Spellcheck.adb Albert Bachmann
@ 2005-04-27 15:11     ` Matthew Heaney
  2005-04-27 22:49       ` Spellcheck.adb Albert Bachmann
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Heaney @ 2005-04-27 15:11 UTC (permalink / raw)



"Albert Bachmann" <albert.bachmann@gmx.de> wrote in message 
news:1114548638.851249.246280@f14g2000cwb.googlegroups.com...
>
> In the meantime I followed the advice from Matthew Heaney (thanks
> Matthew) and went on with  Ada.Containers. I again tried a simple
> implementation. Unfortunately I recognized that controlled types have
> to be instantiated at library level

This is no longer true in Ada 2005.  Eventually you'll be able to 
instantiate the containers in your main subprogram.


> which requires now a total of 3
> files since the instantiation of ada.containers.hashed_maps needs a
> definite subtype for its key_type.

True, but that's why we have the Indefinite_Hashed_Maps container package. 
If you're manipulating strings, you should probably be using that.

Also, as I mentioned in an earlier post, you don't really need a map, since 
all you're doing is performing a membership test.  Hence, a (hashed) set 
will do.

> types.ads:
> ----------
>
> package types is
> subtype word is string(1 .. 128);
> end types;

Get rid of this.  Instantiate the Indefinite_Hashed_Sets (or _Maps, if you 
prefer) with type String.


> hashmap.ads:
> ------------
>
> package hashmap is new ada.containers.hashed_maps(key_type  =>
> types.word,

No.  This should just be type String.


> and finally spellcheck2.adb:
> ----------------------------
>
> strings.fixed.delete(item, item_offset + 1, word'last);

You can get rid of this.


> hashmap.insert(dict, item, true);

If you used a (hashed) set, then all you'd have to say is

  insert(dict, item);


> strings.fixed.delete(item, item_offset + 1, word'last);

You can get rid of this.


> if not hashmap.contains(dict, item) then

See, you're not using the map element at all.  You're only using the key.


> Please note that those attempts are only quick and dirty hacks. If
> someone knows how to get rid of the seperate types.ads file I would
> welcome his advice.

You can get rid of the types file, by using the indefinite hashed map (or 
set), and instantiating it with type String.

-Matt






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 15:11     ` Spellcheck.adb Matthew Heaney
@ 2005-04-27 22:49       ` Albert Bachmann
  2005-04-27 23:35         ` Spellcheck.adb Marius Amado Alves
  2005-04-28  0:41         ` Spellcheck.adb Matthew Heaney
  0 siblings, 2 replies; 23+ messages in thread
From: Albert Bachmann @ 2005-04-27 22:49 UTC (permalink / raw)


On Wed, 27 Apr 2005 11:11:06 -0400, Matthew Heaney wrote:
> 
> "Albert Bachmann" <albert.bachmann@gmx.de> wrote in message 
> news:1114548638.851249.246280@f14g2000cwb.googlegroups.com...
>>
>> In the meantime I followed the advice from Matthew Heaney (thanks
>> Matthew) and went on with  Ada.Containers. I again tried a simple
>> implementation. Unfortunately I recognized that controlled types have
>> to be instantiated at library level
> 
> This is no longer true in Ada 2005.  Eventually you'll be able to 
> instantiate the containers in your main subprogram.

That may be true but with GNAT (as of GCC-4.0.0) I get an error:

gcc -c -gnat05 -O3 spellcheck3.adb
spellcheck3.adb:10:09: instantiation error at a-cohata.ads:29
spellcheck3.adb:10:09: instantiation error at a-cihase.ads:214
spellcheck3.adb:10:09: controlled type must be declared at the library level
gnatmake: "spellcheck3.adb" compilation error

> 
> 
>> which requires now a total of 3
>> files since the instantiation of ada.containers.hashed_maps needs a
>> definite subtype for its key_type.
> 
> True, but that's why we have the Indefinite_Hashed_Maps container package. 
> If you're manipulating strings, you should probably be using that.
> 
> Also, as I mentioned in an earlier post, you don't really need a map, since 
> all you're doing is performing a membership test.  Hence, a (hashed) set 
> will do.
> 

When I use and instance of indefinite_hashed_sets or
indefinite_hashed_maps the program compiles fine but I get an exception
during runtime:

raised CONSTRAINT_ERROR : a-cihama.adb:443 explicit raise (for the map)
raised CONSTRAINT_ERROR : a-cihase.adb:375 explicit raise (for the set)


>> types.ads:
>> ----------
>>
>> package types is
>> subtype word is string(1 .. 128);
>> end types;
> 
> Get rid of this.  Instantiate the Indefinite_Hashed_Sets (or _Maps, if you 
> prefer) with type String.
> 

Ok.

> 
>> hashmap.ads:
>> ------------
>>
>> package hashmap is new ada.containers.hashed_maps(key_type  =>
>> types.word,
> 
> No.  This should just be type String.
> 

Ok.

> 
>> and finally spellcheck2.adb:
>> ----------------------------
>>
>> strings.fixed.delete(item, item_offset + 1, word'last);
> 
> You can get rid of this.

I did.
> 
> 
>> hashmap.insert(dict, item, true);
> 
> If you used a (hashed) set, then all you'd have to say is
> 
>   insert(dict, item);
> 
> 
>> strings.fixed.delete(item, item_offset + 1, word'last);
> 
> You can get rid of this.
> 

Here too.

> 
>> if not hashmap.contains(dict, item) then
> 
> See, you're not using the map element at all.  You're only using the key.
> 
> 
>> Please note that those attempts are only quick and dirty hacks. If
>> someone knows how to get rid of the seperate types.ads file I would
>> welcome his advice.
> 
> You can get rid of the types file, by using the indefinite hashed map (or 
> set), and instantiating it with type String.
> 
> -Matt

I post the modified source. Maybe I've overlooked something because as I
said above during runtime I get an exception. However thanks for those
valuable information Matthew!

hashed_set.ads:
--------------

with ada.containers.indefinite_hashed_sets;
with ada.strings.hash;
--with hash_function;

package hashed_set is new ada.containers.indefinite_hashed_set(
				element_type    => string,
				hash            => ada.strings.hash, --hash_function,
				equivalent_keys => "=");

spellcheck3.adb:
----------------

with ada.strings.fixed;
with ada.text_io;
with hashed_set;
use  ada;

procedure spellcheck3 is

	file        : text_io.file_type;
	item        : string(1 .. 128);
	item_offset : natural;
	dict        : hashed_set.set;

begin

	text_io.open(file, text_io.in_file, "spellcheck-dict.txt");
	
	while not text_io.end_of_file(file) loop
		text_io.get_line(file, item, item_offset);
		hashed_set.insert(dict, item);
	end loop;

	while not text_io.end_of_file loop
		text_io.get_line(item, item_offset);
		if not hashed_set.contains(dict, item) then
			text_io.put_line(item);
		end if;
	end loop;
	
	text_io.close(file);

end spellcheck3;


Regards,
Albert



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 22:49       ` Spellcheck.adb Albert Bachmann
@ 2005-04-27 23:35         ` Marius Amado Alves
  2005-04-27 23:58           ` Spellcheck.adb Albert Bachmann
  2005-04-28  0:41         ` Spellcheck.adb Matthew Heaney
  1 sibling, 1 reply; 23+ messages in thread
From: Marius Amado Alves @ 2005-04-27 23:35 UTC (permalink / raw)
  To: Albert Bachmann; +Cc: comp.lang.ada

> 	...
> 	while not text_io.end_of_file(file) loop
> 		text_io.get_line(file, item, item_offset);
> 		hashed_set.insert(dict, item);

Don't pass the whole item, but the slice item (1 .. item_offset).

> ...
> 	while not text_io.end_of_file loop
> 		text_io.get_line(item, item_offset);
> 		if not hashed_set.contains(dict, item) then
> 			text_io.put_line(item);
>

Ditto.

(And item_offset should be called item_length or similar.)




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 23:35         ` Spellcheck.adb Marius Amado Alves
@ 2005-04-27 23:58           ` Albert Bachmann
  2005-04-28  0:19             ` Spellcheck.adb Marius Amado Alves
  2005-04-28  0:58             ` Spellcheck.adb Matthew Heaney
  0 siblings, 2 replies; 23+ messages in thread
From: Albert Bachmann @ 2005-04-27 23:58 UTC (permalink / raw)


On Thu, 28 Apr 2005 00:35:21 +0100, Marius Amado Alves wrote:

>> 	...
>> 	while not text_io.end_of_file(file) loop
>> 		text_io.get_line(file, item, item_offset);
>> 		hashed_set.insert(dict, item);
> 
> Don't pass the whole item, but the slice item (1 .. item_offset).
> 
>> ...
>> 	while not text_io.end_of_file loop
>> 		text_io.get_line(item, item_offset);
>> 		if not hashed_set.contains(dict, item) then
>> 			text_io.put_line(item);
>>
> 
> Ditto.
> 
> (And item_offset should be called item_length or similar.)

Yes. That's it. I should have known. And timing is not bad with hashed_set:

-- with custom hash function:
real    0m0.119s
user    0m0.101s
sys     0m0.009s

-- with ada.strings.hash:
real    0m0.130s
user    0m0.116s
sys     0m0.005s

Using streams makes for another 10ms. But the GNAT.Spitbol.Table
still seems to be the fastest version so far.

Regards,
Albert




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 23:58           ` Spellcheck.adb Albert Bachmann
@ 2005-04-28  0:19             ` Marius Amado Alves
  2005-04-29 10:22               ` Spellcheck.adb Albert Bachmann
  2005-04-28  0:58             ` Spellcheck.adb Matthew Heaney
  1 sibling, 1 reply; 23+ messages in thread
From: Marius Amado Alves @ 2005-04-28  0:19 UTC (permalink / raw)
  To: comp.lang.ada

> -- with custom hash function:
> real    0m0.119s
> user    0m0.101s
> sys     0m0.009s
>
> -- with ada.strings.hash:
> real    0m0.130s
> user    0m0.116s
> sys     0m0.005s

Have you tried my bit-based hash function?




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 22:49       ` Spellcheck.adb Albert Bachmann
  2005-04-27 23:35         ` Spellcheck.adb Marius Amado Alves
@ 2005-04-28  0:41         ` Matthew Heaney
  2005-04-28 21:20           ` Spellcheck.adb Simon Wright
  2005-04-29 10:05           ` Spellcheck.adb Albert Bachmann
  1 sibling, 2 replies; 23+ messages in thread
From: Matthew Heaney @ 2005-04-28  0:41 UTC (permalink / raw)


Albert Bachmann <albert.bachmann@gmx.de> writes:

> When I use and instance of indefinite_hashed_sets or
> indefinite_hashed_maps the program compiles fine but I get an exception
> during runtime:
> 
> raised CONSTRAINT_ERROR : a-cihama.adb:443 explicit raise (for the map)
> raised CONSTRAINT_ERROR : a-cihase.adb:375 explicit raise (for the set)

What version of the container packages are you using?  Either you found
a bug in the sources you're using, or you're doing something wrong.

Can you send me (or just post) the portion of the code that surrounds
the line numbers in the exception message above?

Thanks,
Matt



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-27 23:58           ` Spellcheck.adb Albert Bachmann
  2005-04-28  0:19             ` Spellcheck.adb Marius Amado Alves
@ 2005-04-28  0:58             ` Matthew Heaney
  2005-04-29 10:19               ` Spellcheck.adb Albert Bachmann
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Heaney @ 2005-04-28  0:58 UTC (permalink / raw)


Albert Bachmann <albert.bachmann@gmx.de> writes:

> Yes. That's it. I should have known. And timing is not bad with hashed_set:
> 
> -- with custom hash function:

What is your custom hash algorithm?


> -- with ada.strings.hash:

I just checked in Pascal Obry's improved algorithm for ada.strings.hash.
See if it does any better on your input.


> Using streams makes for another 10ms. But the GNAT.Spitbol.Table still
> seems to be the fastest version so far.

What were those numbers again?

Regards,
Matt



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-28  0:41         ` Spellcheck.adb Matthew Heaney
@ 2005-04-28 21:20           ` Simon Wright
  2005-04-29 10:05           ` Spellcheck.adb Albert Bachmann
  1 sibling, 0 replies; 23+ messages in thread
From: Simon Wright @ 2005-04-28 21:20 UTC (permalink / raw)


Matthew Heaney <matthewjheaney@earthlink.net> writes:

> Albert Bachmann <albert.bachmann@gmx.de> writes:
> 
> > When I use and instance of indefinite_hashed_sets or
> > indefinite_hashed_maps the program compiles fine but I get an exception
> > during runtime:
> > 
> > raised CONSTRAINT_ERROR : a-cihama.adb:443 explicit raise (for the map)
> > raised CONSTRAINT_ERROR : a-cihase.adb:375 explicit raise (for the set)
> 
> What version of the container packages are you using?  Either you found
> a bug in the sources you're using, or you're doing something wrong.

What revision of the tigris.org sources is in GCC 4.0.0? (I was most
impressed to find them there!)

With 4.0.0 on Mac OS X 10.3.9, I have managed (after a few minutes) to
run test_maps with no assertion failures.

I couldn't see instructions for running the tests? If in ai302/testdir,

* to compile with assertions against the 4.0.0 Ada.Containers,

    gnatmake -gnat05 -gnata test_maps

* To compile with assertions against the local Ada.Containers,

    gnatmake -gnat05 -gnata -I.. -a test_maps

(the -a says rebuild runtime components if necessary).

-- 
Simon Wright                               100% Ada, no bugs.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-28  0:41         ` Spellcheck.adb Matthew Heaney
  2005-04-28 21:20           ` Spellcheck.adb Simon Wright
@ 2005-04-29 10:05           ` Albert Bachmann
  2005-04-29 12:35             ` Spellcheck.adb Matthew Heaney
  1 sibling, 1 reply; 23+ messages in thread
From: Albert Bachmann @ 2005-04-29 10:05 UTC (permalink / raw)


On Thu, 28 Apr 2005 00:41:22 +0000, Matthew Heaney wrote:

> Albert Bachmann <albert.bachmann@gmx.de> writes:
> 
>> When I use and instance of indefinite_hashed_sets or
>> indefinite_hashed_maps the program compiles fine but I get an exception
>> during runtime:
>> 
>> raised CONSTRAINT_ERROR : a-cihama.adb:443 explicit raise (for the map)
>> raised CONSTRAINT_ERROR : a-cihase.adb:375 explicit raise (for the set)
> 
> What version of the container packages are you using?  Either you found
> a bug in the sources you're using, or you're doing something wrong.
> 
> Can you send me (or just post) the portion of the code that surrounds
> the line numbers in the exception message above?
> 
> Thanks,
> Matt

Well, Marius Amado Alves advised me not to pass the whole string to the
hashed_set (see the code in my post you replied to). I get no error
anymore then. Do you consider this an error of Ada.Containers despite
this? I assumed it was solely my fault.

Regards,
Albert




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-28  0:58             ` Spellcheck.adb Matthew Heaney
@ 2005-04-29 10:19               ` Albert Bachmann
  0 siblings, 0 replies; 23+ messages in thread
From: Albert Bachmann @ 2005-04-29 10:19 UTC (permalink / raw)


On Thu, 28 Apr 2005 00:58:29 +0000, Matthew Heaney wrote:

> Albert Bachmann <albert.bachmann@gmx.de> writes:
> 
>> Yes. That's it. I should have known. And timing is not bad with hashed_set:
>> 
>> -- with custom hash function:
> 
> What is your custom hash algorithm?
> 

Just the one used in the C++ entry. With "custom" I just wanted to
distinguish it from Ada.Strings.Hash.

with ada.containers;
use  ada.containers;

function hash_function(key : string) return hash_type is
	n : hash_type := 0;
begin
	for i in key'range loop
		n := (n * 5 + character'pos(key(i)));
	end loop;

	return n;
end hash_function;

> 
>> -- with ada.strings.hash:
> 
> I just checked in Pascal Obry's improved algorithm for ada.strings.hash.
> See if it does any better on your input.
> 
> 
>> Using streams makes for another 10ms. But the GNAT.Spitbol.Table still
>> seems to be the fastest version so far.
> 
> What were those numbers again?
> 
> Regards,
> Matt

With David Sauvage's version (with uses GNAT.Spitbol.Table) I get the
following times:

real    0m0.074s
user    0m0.054s
sys     0m0.012s

I have no specific times for the streams version at hand since I deleted
it already. But I remember that back then the difference was in the range
of 10ms. Note that David Sauvage's version uses streams which contribute
to his excellent results.

http://charles.tigris.org/source/browse/charles/src/ai302/examples/spellcheck/
still shows only this initial revisions and the hash function there is the
same as the one I used.

Regards,
Albert




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-28  0:19             ` Spellcheck.adb Marius Amado Alves
@ 2005-04-29 10:22               ` Albert Bachmann
  0 siblings, 0 replies; 23+ messages in thread
From: Albert Bachmann @ 2005-04-29 10:22 UTC (permalink / raw)


On Thu, 28 Apr 2005 01:19:03 +0100, Marius Amado Alves wrote:

>> -- with custom hash function:
>> real    0m0.119s
>> user    0m0.101s
>> sys     0m0.009s
>>
>> -- with ada.strings.hash:
>> real    0m0.130s
>> user    0m0.116s
>> sys     0m0.005s
> 
> Have you tried my bit-based hash function?

Not yet ;-) But if time permits I will try it during the next days.

Regards,
Albert



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Spellcheck.adb
  2005-04-29 10:05           ` Spellcheck.adb Albert Bachmann
@ 2005-04-29 12:35             ` Matthew Heaney
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Heaney @ 2005-04-29 12:35 UTC (permalink / raw)


Albert Bachmann <albert.bachmann@gmx.de> writes:

> Well, Marius Amado Alves advised me not to pass the whole string to
> the hashed_set (see the code in my post you replied to). I get no
> error anymore then. Do you consider this an error of Ada.Containers
> despite this? I assumed it was solely my fault.

The exception behavior you observed is in harmony with what's described
in the latest AI-302 draft.





^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2005-04-29 12:35 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-25 21:30 [Shootout] Spellcheck.adb albert.bachmann
2005-04-25 21:57 ` Spellcheck.adb Albert Bachmann
2005-04-25 23:27 ` [Shootout] Spellcheck.adb Marius Amado Alves
2005-04-26 18:04 ` Spellcheck.adb Matthew Heaney
2005-04-26 19:36 ` [Shootout] Spellcheck.adb David Sauvage
2005-04-26 20:50   ` Spellcheck.adb Albert Bachmann
2005-04-27 15:11     ` Spellcheck.adb Matthew Heaney
2005-04-27 22:49       ` Spellcheck.adb Albert Bachmann
2005-04-27 23:35         ` Spellcheck.adb Marius Amado Alves
2005-04-27 23:58           ` Spellcheck.adb Albert Bachmann
2005-04-28  0:19             ` Spellcheck.adb Marius Amado Alves
2005-04-29 10:22               ` Spellcheck.adb Albert Bachmann
2005-04-28  0:58             ` Spellcheck.adb Matthew Heaney
2005-04-29 10:19               ` Spellcheck.adb Albert Bachmann
2005-04-28  0:41         ` Spellcheck.adb Matthew Heaney
2005-04-28 21:20           ` Spellcheck.adb Simon Wright
2005-04-29 10:05           ` Spellcheck.adb Albert Bachmann
2005-04-29 12:35             ` Spellcheck.adb Matthew Heaney
2005-04-26 21:21   ` Spellcheck.adb Albert Bachmann
2005-04-26 22:04     ` Spellcheck.adb David Sauvage
2005-04-26 20:36 ` [Shootout] Spellcheck.adb Dmitry A. Kazakov
2005-04-26 21:50   ` David Sauvage
2005-04-27  8:40     ` Dmitry A. Kazakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox