From: jonathan
Newsgroups: comp.lang.ada
Subject: Re: Copying rows in a two dimensional array.
Date: Sat, 13 Feb 2010 16:42:13 -0800 (PST)

On Feb 2, 8:52 am, Jean-Pierre Rosen wrote:
> Jerry wrote:
> > I've never understood why Ada does not allow slicing in
> > multidimensional arrays. What are the safety issues involved? And how
> > is it safe to force the programmer into ad hoc methods?
>
> One-dimensional slices are simple and efficient. Multidimensional
> slices are a can of worms.
>
> I guess you are thinking about rectangular slices. But why stop there?
> A slice comprising the main diagonal and the diagonals above and below
> it can be very useful for some calculations. Or a slice which is the
> triangular part of the upper half...
>
> AFAICT, Fortran 90 does provide this - and the syntax is so
> complicated that nobody uses it. And implementation is also a
> nightmare.
>
> When designing a programming language, you have to stop at some point.
> The ratio (cost of implementation) / usefulness is a good measure for
> this. I think the ratio was simply too high for this feature.
>
> --
> ---------------------------------------------------------
>            J-P. Rosen (ro...@adalog.fr)
> Visit Adalog's web site at http://www.adalog.fr

I've never felt the need for two-dimensional (or higher-dimensional)
slicing.

It's partly a performance issue: if you make the data storage matrix
as small as possible (i.e. exactly the same size as the matrix
transformation you are doing), then you sometimes get a disastrous
loss of efficiency. If on the other hand you always make the data
storage matrix larger than necessary (so that your transformation is
always performed on a sub-matrix of the data storage matrix), then you
always have the option of avoiding these efficiency losses. And once
you write the transformation routine so that it operates on
sub-matrices of the data storage matrix, you usually don't have to
slice or copy a sub-matrix out of the data storage matrix in order to
transform it.
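A minimal sketch of that kind of interface (the names are illustrative
and a trivial scaling stands in for a real transformation such as a
Jacobi or Householder step; these are not the routines from the
library linked below):

procedure Submatrix_Demo is

   N       : constant := 2048;         --  size of the matrix to transform
   Padding : constant := 24;           --  extra storage; breaks up 2**p strides
   M       : constant := N + Padding;  --  size of the data storage matrix

   type Real   is digits 15;
   type Matrix is array (1 .. M, 1 .. M) of Real;
   type Matrix_Ptr is access Matrix;   --  heap: M x M Reals is ~34 MB

   A : constant Matrix_Ptr := new Matrix'(others => (others => 1.0));

   --  Transform the block A (Lo .. Hi, Lo .. Hi) in place.  Because
   --  the bounds are parameters, no 2-dim slice or copy is ever needed.
   procedure Scale_Block
     (X : in out Matrix; Lo, Hi : in Integer; C : in Real) is
   begin
      for I in Lo .. Hi loop
         for J in Lo .. Hi loop
            X (I, J) := C * X (I, J);
         end loop;
      end loop;
   end Scale_Block;

begin
   --  Operate on the leading N x N sub-matrix of the M x M storage.
   Scale_Block (A.all, Lo => 1, Hi => N, C => 2.0);
end Submatrix_Demo;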
Here are some Ada and Fortran examples of the problem on various sized
data storage arrays. I used some bits and pieces of routines from

   http://web.am.qub.ac.uk/users/j.parker/miscellany/

First example: we eigen-decompose an N x N = 2048 x 2048 matrix. The
data storage matrix is M x M = (2048+Padding) x (2048+Padding). Here
are the running times in seconds of an iterative Jacobi
eigen-decomposition:

   2048x2048:  322 sec, gnat     (Padding=24)
   2048x2048: 1646 sec, gnat     (Padding=0)
   2048x2048: 1632 sec, gfortran (Padding=0)
   2048x2048: 1492 sec, ifort    (Padding=0)

The observed 500% slowdown in the absence of padded arrays is
unacceptable, even if it were a rare event (occurring only on 2**p
sized data arrays). In fact it's not all that rare ... more comments
on that below. (BTW, ifort is the Intel Fortran compiler with all
optimizations at max; gfortran is the gcc variant.)

I never bothered to write a paddable Fortran routine, but here are
some more timings near power-of-2 array bounds:

   1023x1023: 33.5 sec, gnat     (Padding=9)
   1023x1023: 42.4 sec, gfortran (Padding=0)
   1023x1023: 37.3 sec, ifort    (Padding=0)

   1022x1022: 33.5 sec, gnat     (Padding=10)
   1022x1022: 30.2 sec, gfortran (Padding=0)
   1022x1022: 28.3 sec, ifort    (Padding=0)

   1024x1024: 33.2 sec, gnat     (Padding=8)
   1024x1024: 96.0 sec, gnat     (Padding=0)
   1024x1024:  116 sec, gfortran (Padding=0)
   1024x1024:   43 sec, ifort    (Padding=0)

There is one puzzle here I don't have time to solve. Normally a good
Fortran compiler will automatically pad the array for you ... I recall
that happening in the past, but this time it seems to have slipped the
compilers' minds. The ifort man pages say that -O3 enables "Padding
the size of certain power-of-two arrays to allow more efficient cache
use." I used -O3 and also -O3 -fast ... maybe I did something wrong,
but the important lesson stands: compiler optimization policies change
with time, and of course vary from compiler to compiler. You can't
rely on them or even, amazingly, on man pages. It's much better to
write the program in a way that is insensitive to changing
optimization policies.

A web search for "cache thrashing" will reveal much depressing detail
on the subject. The efficiency problems discussed above appear as our
arrays become large and spill out of the L3 cache (6 MB in the present
case). Just to demonstrate that these problems show up on all sorts of
array sizes, I did some runs in the 2000 to 3000 range, this time
using a Householder decomposition scavenged from the Golub singular
value decomposition. You can still find plenty of 500% slowdowns:

   2102x2102:  3.93 sec, gnat (Padding=0)
   2102x2102:  3.19 sec, gnat (Padding=8)

   2176x2176:  5.03 sec, gnat (Padding=0)
   2176x2176:  3.42 sec, gnat (Padding=8)

   2304x2304:  8.47 sec, gnat (Padding=0)
   2304x2304:  4.52 sec, gnat (Padding=8)

   2560x2560: 24.1  sec, gnat (Padding=0)
   2560x2560:  5.42 sec, gnat (Padding=8)

   3072x3072: 38.9  sec, gnat (Padding=0)
   3072x3072:  7.76 sec, gnat (Padding=8)

   3584x3584: 53.2  sec, gnat (Padding=0)
   3584x3584: 11.5  sec, gnat (Padding=8)
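The mechanism is easy to see directly. Here is a small self-contained
sketch (again illustrative code, not the benchmark routines above)
that times a column-wise sweep of an M x M array. Ada stores arrays in
row-major order, so the inner loop below strides M Reals through
memory; when M is a power of two those addresses fall into a few cache
sets and evict one another, and a small padding removes the
collisions:

with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Stride_Demo is

   type Real is digits 15;
   type Matrix is array (Positive range <>, Positive range <>) of Real;
   type Matrix_Ptr is access Matrix;

   function Column_Sweep_Time (M : Positive) return Duration is
      A     : constant Matrix_Ptr := new Matrix'(1 .. M => (1 .. M => 1.0));
      Sum   : Real := 0.0;
      Start : constant Time := Clock;
   begin
      for J in 1 .. M loop
         for I in 1 .. M loop      --  stride of M Reals per step
            Sum := Sum + A (I, J);
         end loop;
      end loop;
      if Sum < 0.0 then            --  keep Sum live under optimization
         Put_Line ("unreachable");
      end if;
      return Clock - Start;
   end Column_Sweep_Time;

begin
   Put_Line ("M = 2048     :" & Duration'Image (Column_Sweep_Time (2048)));
   Put_Line ("M = 2048 + 24:" & Duration'Image (Column_Sweep_Time (2072)));
end Stride_Demo;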
Finally, an example on 1-dim arrays, using fast Fourier transforms
(FFTs). The standard, and the most common, FFT is computed on a
power-of-2 length data set: 0 .. 2**p - 1. I timed computation of
these FFTs on arrays of length 2**p, and compared this with
computation on arrays of length 2**p + Padding, where Padding = 24.
The computation on the padded arrays was faster. The ratio of running
times (padded / unpadded) is:

   p = 10   ratio = .93
   p = 11   ratio = .88
   p = 12   ratio = .76
   p = 13   ratio = .79
   p = 14   ratio = .76
   p = 15   ratio = .75
   p = 16   ratio = .77
   p = 17   ratio = .84
   p = 18   ratio = .75
   p = 19   ratio = .45
   p = 20   ratio = .63
   p = 21   ratio = .69
   p = 22   ratio = .69
   p = 23   ratio = .67
   p = 24   ratio = .62
   p = 25   ratio = .62

So the problem is more common here, though the losses are smaller. (I
still consider them unacceptable, especially in a routine whose reason
for existence is efficiency.) The problem is still worse when you take
FFTs of two-dimensional arrays.

There is another (and entirely independent) reason I prefer routines
that perform their transformations on arbitrary sub-matrices (or on
arbitrary diagonal blocks) of the data storage matrix. After writing
my first linear algebra routine I was very pleased with myself, but it
didn't really do what I wanted: I realized I needed to transform the
diagonal sub-blocks of a large matrix, and to do it iteratively on
arbitrarily sized diagonal sub-blocks. It was a very simple matter to
modify the code to do this, and the result was so convenient that I've
never considered doing it otherwise.

Jonathan
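P.S. To make that last point concrete, here is a minimal sketch of the
diagonal-block version of the interface (illustrative names again,
with a trivial scaling standing in for the real transformation):

procedure Diagonal_Blocks_Demo is

   N : constant := 1024;       --  rows actually used
   M : constant := N + 8;      --  padded data storage size

   type Real   is digits 15;
   type Matrix is array (1 .. M, 1 .. M) of Real;
   type Matrix_Ptr is access Matrix;

   A : constant Matrix_Ptr := new Matrix'(others => (others => 1.0));

   --  Stand-in for a real transformation of block (Lo..Hi, Lo..Hi).
   procedure Transform_Block (X : in out Matrix; Lo, Hi : in Integer) is
   begin
      for I in Lo .. Hi loop
         for J in Lo .. Hi loop
            X (I, J) := 0.5 * X (I, J);
         end loop;
      end loop;
   end Transform_Block;

   Block_Size : constant := 192;   --  need not divide N evenly
   Lo         : Integer  := 1;

begin
   --  Sweep down the main diagonal, transforming one sub-block at a
   --  time; the 'Min clips the final block if Block_Size does not
   --  divide N.
   while Lo <= N loop
      Transform_Block (A.all, Lo, Integer'Min (Lo + Block_Size - 1, N));
      Lo := Lo + Block_Size;
   end loop;
end Diagonal_Blocks_Demo;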