From: jonathan
Newsgroups: comp.lang.ada
Subject: Re: Copying rows in a two dimensional array.
Date: Sat, 13 Feb 2010 16:42:13 -0800 (PST)

On Feb 2, 8:52 am, Jean-Pierre Rosen wrote:
> Jerry wrote:
> > I've never understood why Ada does not allow slicing in
> > multidimensional arrays. What are the safety issues involved? And how
> > is it safe to force the programmer into ad hoc methods?
>
> One-dimensional slices are simple and efficient. Multidimensional
> slices are a can of worms.
>
> I guess you are thinking about rectangular slices. But why stop there?
> A slice comprising the main diagonal and the diagonals above and below
> it can be very useful for some calculations. Or a slice which is the
> triangular part of the upper half...
>
> AFAICT, Fortran 90 does provide this - and the syntax is so
> complicated that nobody uses it. And implementation is also a
> nightmare.
>
> When designing a programming language, you have to stop at some point.
> The ratio (cost of implementation) / usefulness is a good measure for
> this. I think the ratio was simply too high for this feature.
>
> --
> ---------------------------------------------------------
>            J-P. Rosen (ro...@adalog.fr)
> Visit Adalog's web site at http://www.adalog.fr

I've never felt the need for two-dimensional (or higher-dimensional)
slicing.

It's partly a performance issue: if you make the data storage matrix
as small as possible (i.e. exactly the same size as the matrix
transformation you are doing), then you sometimes get a disastrous
loss of efficiency. If on the other hand you always make the data
storage matrix larger than necessary (so that your transformation is
always performed on a sub-matrix of the data storage matrix), then you
always have the option of avoiding these efficiency losses. And once
you write the transformation routine so that it operates on
sub-matrices of the data storage matrix, you usually don't have to
slice or copy a sub-matrix out of the data storage matrix in order to
transform it.
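A minimal sketch of that kind of interface (the names are illustrative
and a trivial scaling stands in for a real transformation such as a
Jacobi or Householder step; these are not the routines from the
library linked below):

procedure Submatrix_Demo is

   N       : constant := 2048;         --  size of the matrix to transform
   Padding : constant := 24;           --  extra storage; breaks up 2**p strides
   M       : constant := N + Padding;  --  size of the data storage matrix

   type Real   is digits 15;
   type Matrix is array (1 .. M, 1 .. M) of Real;
   type Matrix_Ptr is access Matrix;   --  heap: M x M Reals is ~34 MB

   A : constant Matrix_Ptr := new Matrix'(others => (others => 1.0));

   --  Transform the block A (Lo .. Hi, Lo .. Hi) in place.  Because
   --  the bounds are parameters, no 2-dim slice or copy is ever needed.
   procedure Scale_Block
     (X : in out Matrix; Lo, Hi : in Integer; C : in Real) is
   begin
      for I in Lo .. Hi loop
         for J in Lo .. Hi loop
            X (I, J) := C * X (I, J);
         end loop;
      end loop;
   end Scale_Block;

begin
   --  Operate on the leading N x N sub-matrix of the M x M storage.
   Scale_Block (A.all, Lo => 1, Hi => N, C => 2.0);
end Submatrix_Demo;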
Here are some Ada and Fortran examples of the problem on various sized
data storage arrays. I used some bits and pieces of routines from

   http://web.am.qub.ac.uk/users/j.parker/miscellany/

First example: we eigen-decompose an N x N = 2048 x 2048 matrix. The
data storage matrix is M x M = (2048+Padding) x (2048+Padding). Here
are the running times in seconds of an iterative Jacobi
eigen-decomposition:

   2048x2048:  322 sec, gnat     (Padding=24)
   2048x2048: 1646 sec, gnat     (Padding=0)
   2048x2048: 1632 sec, gfortran (Padding=0)
   2048x2048: 1492 sec, ifort    (Padding=0)

The observed 500% slowdown in the absence of padded arrays is
unacceptable, even if it were a rare event (occurring only on 2**p
sized data arrays). In fact it's not all that rare ... more comments
on that below. (BTW, ifort is the Intel Fortran compiler with all
optimizations at max; gfortran is the gcc variant.)

I never bothered to write a paddable Fortran routine, but here are
some more timings near power-of-2 array bounds:

   1023x1023: 33.5 sec, gnat     (Padding=9)
   1023x1023: 42.4 sec, gfortran (Padding=0)
   1023x1023: 37.3 sec, ifort    (Padding=0)

   1022x1022: 33.5 sec, gnat     (Padding=10)
   1022x1022: 30.2 sec, gfortran (Padding=0)
   1022x1022: 28.3 sec, ifort    (Padding=0)

   1024x1024: 33.2 sec, gnat     (Padding=8)
   1024x1024: 96.0 sec, gnat     (Padding=0)
   1024x1024:  116 sec, gfortran (Padding=0)
   1024x1024:   43 sec, ifort    (Padding=0)

There is one puzzle here I don't have time to solve. Normally a good
Fortran compiler will automatically pad the array for you ... I recall
that happening in the past, but this time it seems to have slipped the
compilers' minds. The ifort man pages say that -O3 enables "Padding
the size of certain power-of-two arrays to allow more efficient cache
use." I used -O3 and also -O3 -fast ... maybe I did something wrong,
but the important lesson stands: compiler optimization policies change
with time, and of course vary from compiler to compiler. You can't
rely on them or even, amazingly, on man pages. It's much better to
write the program in a way that is insensitive to changing
optimization policies.

A web search for "cache thrashing" will reveal much depressing detail
on the subject. The efficiency problems discussed above appear as our
arrays become large and spill out of the L3 cache (6 MB in the present
case). Just to demonstrate that these problems show up on all sorts of
array sizes, I did some runs in the 2000 to 3000 range, this time
using a Householder decomposition scavenged from the Golub singular
value decomposition. You can still find plenty of 500% slowdowns:

   2102x2102:  3.93 sec, gnat (Padding=0)
   2102x2102:  3.19 sec, gnat (Padding=8)

   2176x2176:  5.03 sec, gnat (Padding=0)
   2176x2176:  3.42 sec, gnat (Padding=8)

   2304x2304:  8.47 sec, gnat (Padding=0)
   2304x2304:  4.52 sec, gnat (Padding=8)

   2560x2560: 24.1  sec, gnat (Padding=0)
   2560x2560:  5.42 sec, gnat (Padding=8)

   3072x3072: 38.9  sec, gnat (Padding=0)
   3072x3072:  7.76 sec, gnat (Padding=8)

   3584x3584: 53.2  sec, gnat (Padding=0)
   3584x3584: 11.5  sec, gnat (Padding=8)
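The mechanism is easy to see directly. Here is a small self-contained
sketch (again illustrative code, not the benchmark routines above)
that times a column-wise sweep of an M x M array. Ada stores arrays in
row-major order, so the inner loop below strides M Reals through
memory; when M is a power of two those addresses fall into a few cache
sets and evict one another, and a small padding removes the
collisions:

with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Stride_Demo is

   type Real is digits 15;
   type Matrix is array (Positive range <>, Positive range <>) of Real;
   type Matrix_Ptr is access Matrix;

   function Column_Sweep_Time (M : Positive) return Duration is
      A     : constant Matrix_Ptr := new Matrix'(1 .. M => (1 .. M => 1.0));
      Sum   : Real := 0.0;
      Start : constant Time := Clock;
   begin
      for J in 1 .. M loop
         for I in 1 .. M loop      --  stride of M Reals per step
            Sum := Sum + A (I, J);
         end loop;
      end loop;
      if Sum < 0.0 then            --  keep Sum live under optimization
         Put_Line ("unreachable");
      end if;
      return Clock - Start;
   end Column_Sweep_Time;

begin
   Put_Line ("M = 2048     :" & Duration'Image (Column_Sweep_Time (2048)));
   Put_Line ("M = 2048 + 24:" & Duration'Image (Column_Sweep_Time (2072)));
end Stride_Demo;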
Finally, an example on 1-dim arrays, using fast Fourier transforms
(FFTs). The standard, and the most common, FFT is computed on a
power-of-2 length data set: 0 .. 2**p - 1. I timed computation of
these FFTs on arrays of length 2**p, and compared this with
computation on arrays of length 2**p + Padding, where Padding = 24.
The computation on the padded arrays was faster. The ratio of running
times (padded / unpadded) is:

   p = 10   ratio = .93
   p = 11   ratio = .88
   p = 12   ratio = .76
   p = 13   ratio = .79
   p = 14   ratio = .76
   p = 15   ratio = .75
   p = 16   ratio = .77
   p = 17   ratio = .84
   p = 18   ratio = .75
   p = 19   ratio = .45
   p = 20   ratio = .63
   p = 21   ratio = .69
   p = 22   ratio = .69
   p = 23   ratio = .67
   p = 24   ratio = .62
   p = 25   ratio = .62

So the problem is more common here, though the losses are smaller. (I
still consider them unacceptable, especially in a routine whose reason
for existence is efficiency.) The problem is still worse when you take
FFTs of two-dimensional arrays.

There is another (and entirely independent) reason I prefer routines
that perform their transformations on arbitrary sub-matrices (or on
arbitrary diagonal blocks) of the data storage matrix. After writing
my first linear algebra routine I was very pleased with myself, but it
didn't really do what I wanted: I realized I needed to transform the
diagonal sub-blocks of a large matrix, and to do it iteratively on
arbitrarily sized diagonal sub-blocks. It was a very simple matter to
modify the code to do this, and the result was so convenient that I've
never considered doing it otherwise.

Jonathan
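P.S. To make that last point concrete, here is a minimal sketch of the
diagonal-block version of the interface (illustrative names again,
with a trivial scaling standing in for the real transformation):

procedure Diagonal_Blocks_Demo is

   N : constant := 1024;       --  rows actually used
   M : constant := N + 8;      --  padded data storage size

   type Real   is digits 15;
   type Matrix is array (1 .. M, 1 .. M) of Real;
   type Matrix_Ptr is access Matrix;

   A : constant Matrix_Ptr := new Matrix'(others => (others => 1.0));

   --  Stand-in for a real transformation of block (Lo..Hi, Lo..Hi).
   procedure Transform_Block (X : in out Matrix; Lo, Hi : in Integer) is
   begin
      for I in Lo .. Hi loop
         for J in Lo .. Hi loop
            X (I, J) := 0.5 * X (I, J);
         end loop;
      end loop;
   end Transform_Block;

   Block_Size : constant := 192;   --  need not divide N evenly
   Lo         : Integer  := 1;

begin
   --  Sweep down the main diagonal, transforming one sub-block at a
   --  time; the 'Min clips the final block if Block_Size does not
   --  divide N.
   while Lo <= N loop
      Transform_Block (A.all, Lo, Integer'Min (Lo + Block_Size - 1, N));
      Lo := Lo + Block_Size;
   end loop;
end Diagonal_Blocks_Demo;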