From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,13ab88b30e0f779d
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2002-12-27 17:44:54 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!headwall.stanford.edu!fu-berlin.de!uni-berlin.de!pc-62-30-113-45-cr.blueyonder.co.UK!not-for-mail
From: "Nick Roberts" <nickroberts@blueyonder.co.uk>
Newsgroups: comp.lang.ada
Subject: Re: Efficient Matrix?
Date: Sat, 28 Dec 2002 01:44:50 -0000
Message-ID: <auivmk$7r6vj$1@ID-25716.news.dfncis.de>
References: <3e0b2a66_4@news.bluewin.ch>
 <slrnb0mvgq.j0i.adi@drcomp.erfurt.thur.de>
 <Vg9FFjG215C+Ewh5@grazebrook.demon.co.uk>
 <slrnb0o8pi.ma5.adi@drcomp.erfurt.thur.de>
 <wccadir1jrt.fsf@shell01.TheWorld.com>
NNTP-Posting-Host: pc-62-30-113-45-cr.blueyonder.co.uk (62.30.113.45)
X-Trace: fu-berlin.de 1041039892 8231923 62.30.113.45 (16 [25716])
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Xref: archiver1.google.com comp.lang.ada:32349
Date: 2002-12-28T01:44:50+00:00
List-Id: <comp.lang.ada>

[Jonas Gasser asked about how to efficiently model a large boolean matrix in
Ada.]

It may be worth pointing out that on a computer with typical virtual memory
management, if patterns of actual access to your large boolean array will
tend to be 'sparse' (i.e. your program will tend to concentrate on one or a
few small parts of the matrix at a time during computation), only a small
part of the matrix will be part of the 'working set' (the set of memory
pages which are actually being used at a certain 'moment', e.g. each second,
during the program's execution). In this case, the fact that the matrix
takes up many megabytes (or even gigabytes) of memory may recede in
importance compared to the fact that only a small part (perhaps a few 100
kilobytes) needs to be in RAM at any one time during most of the program's
execution.

Or to put it more simply, if you are running your program on a virtual
memory machine, try the big array declaration and see if it works!  If it
seems intolerably slow, try another more sophisticated strategy, such as a
hash table, and see if this increases speed (it may decrease speed, so be
wary).

One technique that may be worth investigating (again, on a virtual memory
machine) is that of using a small hash table which simply uses the full
array to resolve collisions.


type Matrix is array (Positive range <>, Positive range <>) of Boolean;

type Hash_Index is range 0..999; -- whatever range is appropriate
function Hash (i, j: Positive) return Hash_Index;

type Hashtable_Entry is
   record
      Value: Boolean := False; -- or True if most values will be true
      Collision: Boolean := False;
   end record;

type Matrix_by_Hash (Hash_Index) of Hashtable_Entry;

M1: Matrix(1..10000,1..5000) := (others => (others => False));
-- M1 may need to be dynamically allocated.
-- The initialisation may need to be different or omitted.

H1: Matrix_by_Hash;

function Get_M1 (i, j: in Positive) return Boolean is
   h: constant Hash_Index := Hash(i,j);
   E: constant Hashtable_Entry := H1(h);
begin
   if E.Collision then return M1(i,j); else return E.Value; end if;
end;

procedure Set_M1 (i, j: in Positive; b: in Boolean) is
   h: constant Hash_Index := Hash(i,j);
begin
   M1(i,j) := b;
   if H1(h).Value /= b then H1(h).Collision := True; end if;
end;


It is one of the basic premises of the hashing technique that the data
written into the full matrix will tend, in practice, to be sparse (i.e. that
it will contain mainly falses (or maybe that it will have mainly trues)).
The computation of the hashing index (the Hash function) always needs to be
fast, but particularly so in this kind of situation, I think. I'll leave the
actual choice of the Hash function and Hash_Index range to you.

Also, for the above specific technique, since Set_M1 always accesses the big
M1 array, it is supposed that the number of reads will greatly outnumber the
number of writes. You may well wish to apply pragma Inline to Hash, Get_M1
and Set_M1. You will almost certainly wish to apply pragma Pack to M1 (or
the (sub)type Matrix), and possibly also to H1 (or Matrix_by_Hash) and
Hashtable_Entry.

If this technique works, you ought to 'pot' the matrix types and functions
in a package; possibly a generic one which takes the full matrix array type,
the hash index type, and the hashing function as generic parameters.

Hope this is of some help (or interest).  Good luck!

--
Nick Roberts