comp.lang.ada
* Vectorization (was Re: Ada in a C++ Interview)
@ 1991-08-09 21:49 Douglas S. Gray
From: Douglas S. Gray @ 1991-08-09 21:49 UTC


In article <5283@lib.tmc.edu> cshotton@oac.hsc.uth.tmc.edu (Chuck Shotton) writes:
>
>Exactly what is involved in "vectorizing" code? Particularly, Ada code.

Since no one else has taken a stab at this, I guess I'll try.  I
worked on an expert system to increase optimization of Fortran
code for the 3090, so I know a bit, but am definitely not an expert.

Vectorized code takes advantage of a given system's vector processors,
which are usually embedded in a pipelined architecture.  These
processors operate on vectors, i.e., one-dimensional arrays.  A machine
with a vector architecture has a number of vector registers which are
operated on using special machine-level vector instructions.  A
vectorizing compiler is able to recognize which source statements can
use the vector instructions while still producing correct results
(defined as whatever a completely scalar version of the same code would
produce).

Take this sample pseudo-code:

procedure Example ( A : in  Int_Array;    -- Int_Array assumed declared
                    B : in  Int_Array;    --   elsewhere as:
                    C : out Int_Array     --   array (1 .. 50) of Integer
                  ) is
begin
*********
   for I in 1 .. 50 loop
      C(I) := B(I) + 1;
   end loop;
*********
end Example;


The code between the asterisks might be translated to the following
assembly:

vloada V0, Bloc, 50    ; Load the vector register V0 with the 50
                       ;   elements starting at Bloc.
vloads V1, 1, 50       ; Load the first 50 elements of vector register
                       ;   V1 with the scalar value of 1.
vadd   V0, V1, V2      ; Using the vector processor, add vector
                       ;   registers V0 and V1, placing the result in V2.
vstore V2, Cloc, 50    ; Store the first 50 elements of V2 starting at
                       ;   Cloc.

  Using scalar instructions, the addition operation for each element
must be completed before the next element can begin.  With vector
operations, there is an overlap.  So each element still takes 3 cycles
for addition, but the next element starts the addition operation on the
second cycle of the previous element--

Cycles:    1    2    3    4    5    6

           B[1]...B[1]
                B[2]...B[2]
                     B[3]...B[3]
                          B[4]...B[4]

  So instead of taking 12 cycles to add 4 elements, it only takes 6.
This does not take into account the considerable overhead of loading the
vector registers, though.  In the above timeline, note that the final
value of B[1] is not available while calculating B[2].  Therefore, if
that value IS required to compute B[2], the code can NOT be vectorized,
because of a data dependency.
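
  As a concrete, made-up illustration of the dependency point (assuming
A and B here are plain variables of a 50-element integer array type like
the one in Example, not the original parameters): the first loop below
can be vectorized, but the second cannot, because each iteration needs
the value the previous iteration just finished computing.

   -- Independent iterations: safe to vectorize.
   for I in 2 .. 50 loop
      A(I) := B(I) + 1;
   end loop;

   -- Recurrence: A(I) needs the finished value of A(I - 1), which the
   -- pipeline has not produced yet, so this loop must stay scalar.
   for I in 2 .. 50 loop
      A(I) := A(I - 1) + B(I);
   end loop;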


   The vectorizing portion of the compiler typically deals with a parsed
form of the source, which it examines for loops where the data
dependencies allow vectorization.  Then, it must determine whether the
vectorization of the loop makes up for the associated overhead.  Good
compilers will rearrange statements to increase vectorizability.
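
   As a rough sketch of one such rearrangement (my own illustration,
with made-up variables B, C, D of the same 50-element integer array
sort): a compiler can split one loop into two, so the statement with no
dependency gets vectorized and only the recurrence stays scalar.

   -- Before: the second statement is a recurrence, so the loop as a
   -- whole would be rejected by the vectorizer.
   for I in 2 .. 50 loop
      C(I) := B(I) * 2;
      D(I) := D(I - 1) + C(I);
   end loop;

   -- After "loop distribution": the first loop vectorizes, the second
   -- stays scalar, and the results are exactly the same.
   for I in 2 .. 50 loop
      C(I) := B(I) * 2;
   end loop;
   for I in 2 .. 50 loop
      D(I) := D(I - 1) + C(I);
   end loop;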

   The relative simplicity of data dependency analysis in FORTRAN (and
even there it is not simple) is one reason FORTRAN is the language of
choice in this arena.  Another is the simplicity of the data structures.
When writing FORTRAN you are more likely to be operating on elements
stored contiguously in memory than when writing Ada, simply because
FORTRAN's only data structure is the array.  It is obviously still
possible to vectorize Ada code (as someone pointed out, any language
with a LOOP control structure can be vectorized); it is just harder
than in FORTRAN.
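
   To make the data-structure point concrete (a sketch of my own, not
from any particular compiler): the first declaration below is natural
Ada style but interleaves the components in memory, while the second,
FORTRAN-style layout keeps each component contiguous and therefore easy
to stream into a vector register.

   -- Array of records: the X components are strided through memory.
   type Point is record
      X, Y, Z : Float;
   end record;
   type Point_List is array (1 .. 50) of Point;

   -- Parallel arrays: each component is one contiguous vector, the way
   -- a FORTRAN programmer would usually store the same data.
   type Coord_List is array (1 .. 50) of Float;
   Xs, Ys, Zs : Coord_List;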

I hope this has clarified rather than confused.  If this isn't good
enough, I can suggest some references after looking in my notes from
those years.
-- 
****************************************************************
*   Douglas S Gray               SA-ALC/SC Ada Project Officer *
*   DSN:  945-7155               dgray@sadis02.sa.aflc.af.mil  *
*   Comm:  (512)925-7155         dugal@cs.stanford.edu         *
****************************************************************


* Re: Vectorization (was Re: Ada in a C++ Interview)
@ 1991-08-10  5:40 Robert I. Eachus
From: Robert I. Eachus @ 1991-08-10  5:40 UTC


     Douglas Gray did a nice job of describing what vectorizing is all
about, although it can--and does--get considerably more complex on the
supercomputers.  However, I thought that I would say a few words about
vectorizing Ada.

    In general, Ada was intentionally designed so that simple
vectorizations can be done efficiently.  And since, at the time Ada
was developed, there were many machines around with "nice" character
string operation accelerators, Ada was also designed to take advantage
of these units.  (For example, predefined string comparisons.)
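
    A trivial example of what is meant (my wording, not a quote from
the standard): the predefined "<" on type STRING below is defined by
the language itself, so a compiler is free to map it straight onto a
hardware string-compare instruction instead of an element-by-element
loop.

    declare
       NAME  : constant STRING := "ADA";
       OTHER : constant STRING := "ADB";
    begin
       if NAME < OTHER then    -- predefined lexicographic "<" on STRING;
          null;                -- a natural target for a string accelerator
       end if;
    end;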

    However, there is a feature of Ada which seems to be a bogeyman
for some vectorizing compilers.  The language rules require that
when evaluating something like:

    for I in A'RANGE loop A(I) := B(I) + C(I); end loop;

    either the operation completes without an exception or A retains
its previous value.  This can be done by the pseudo-code:

    declare
      T: A_type(A'RANGE);
      pragma DELAY_CHECKS(T);   -- not a real pragma: "hold the checks on T"
    begin
      for I in A'RANGE loop T(I) := B(I) + C(I); end loop;
      if T'OVERFLOW then raise CONSTRAINT_ERROR; end if;  -- invented attribute
      A := T;
    end;
    
    You sometimes can't get away from the "extra" copy (which in those
cases is not really extra), but some compilers seem unwilling to
generate temporary arrays in the optimizer, so many vectorizations you
would expect never get done unless you massage the code.
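
    One way to "massage" it by hand (a sketch along the lines of the
pseudo-code above, using the standard pragma SUPPRESS; whether this
actually triggers vectorization is up to the particular compiler):

    declare
      T : A_type(A'RANGE);              -- explicit temporary, as above
      pragma SUPPRESS(OVERFLOW_CHECK);  -- gives up the exception guarantee,
                                        -- so only do this when you know
                                        -- overflow cannot happen
    begin
      for I in A'RANGE loop T(I) := B(I) + C(I); end loop;
      A := T;                           -- one whole-array copy at the end
    end;

    Suppressing the check changes the semantics the language rules
guarantee, which is exactly why a compiler can't do this for you
silently.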
    
    In many real applications, however, this turns out to be a
non-issue.  To get the fastest performance out of a machine you have
to understand its memory hierarchy, pipelining strategy, etc., and the
right thing to do is often to call an "intrinsic" library/run-time
routine.  The easiest way to make Ada compilers for vector machines
more useful to users is to provide these library packages/interfaces;
users then don't have to care whether the best way to implement
function "*"(L,R: Matrix) return Matrix is the same on a Cray 2 and an
IBM RS/6000.  But that doesn't win you bragging rights in the
benchmarking wars....
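
    A sketch of what such an interface might look like (the names are
my own invention, not a standard or vendor package; the body would call
whatever hand-tuned vector routines the machine provides):

    generic
       type ELEMENT is digits <>;
       type MATRIX is array (INTEGER range <>, INTEGER range <>) of ELEMENT;
    package VECTOR_MATH is
       function "*" (L, R : MATRIX) return MATRIX;
       function "+" (L, R : MATRIX) return MATRIX;
    end VECTOR_MATH;

    The same specification can then sit on top of a Cray's vector
library or the RS/6000's tuned scalar code, and the user's source never
changes.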

--

					Robert I. Eachus

with STANDARD_DISCLAIMER;
use  STANDARD_DISCLAIMER;
function MESSAGE (TEXT: in CLEVER_IDEAS) return BETTER_IDEAS is...

