From mboxrd@z Thu Jan 1 00:00:00 1970
Date: 9 Aug 91 21:49:42 GMT
From: stanford.edu!neon.Stanford.EDU!dugal@uunet.uu.net (Douglas S. Gray)
Subject: Vectorization (was Re: Ada in a C++ Interview)
Message-ID: <1991Aug9.214942.2446@neon.Stanford.EDU>

In article <5283@lib.tmc.edu> cshotton@oac.hsc.uth.tmc.edu (Chuck Shotton) writes:
>
>Exactly what is involved in "vectorizing" code? Particularly, Ada code.

Since no one else has taken a stab at this, I guess I'll try. I worked on
an expert system to increase optimization of Fortran code for the 3090, so
I know a bit, but am definitely not an expert.

Vectorized code takes advantage of a given system's vector processors,
which are usually embedded in a pipelined architecture. These processors
operate on vectors, i.e. one-dimensional arrays. A machine with a vector
architecture has a number of vector registers which are operated on using
special machine-level vector instructions. A vectorizing compiler is able
to recognize which source statements can use the vector instructions while
ensuring correct results (defined to be whatever a completely scalar
version of the same code would produce).

Take this sample pseudo-code:

   procedure Example ( A : in  ARRAY[1..50] of integer;
                       B : in  ARRAY[1..50] of integer;
                       C : out ARRAY[1..50] of integer ) is
   *********
      for I in 1..50 do
         C[I] := B[I] + 1
   *********
   end Example;

The code between the asterisks might be translated to the following
assembly:

   vloada V0, Bloc, 50  ; Load vector register V0 with the 50 elements
                        ; starting at Bloc.
   vloads V1, 1, 50     ; Load the first 50 elements of vector register
                        ; V1 with the scalar value 1.
   vadd   V0, V1, V2    ; Using the vector processor, add vector
                        ; registers V0 and V1, placing the result in V2.
   vstore V2, Cloc, 50  ; Store the first 50 elements of V2 starting
                        ; at Cloc.

Using scalar instructions, the addition operation for each element must be
completed before the next element can begin. With vector operations, there
is an overlap. Each element still takes 3 cycles for addition, but the
next element starts its addition on the second cycle of the previous
element--

   Cycles:  1    2    3    4    5    6
            B[1]......B[1]
                 B[2]......B[2]
                      B[3]......B[3]
                           B[4]......B[4]

So instead of taking 12 cycles to add 4 elements, it only takes 6. This
does not take into account the considerable overhead of loading the vector
registers, though.

In the above timeline, note that the final value of B[1] is not available
while calculating B[2]. Therefore, if that value IS required to compute
B[2], the code can NOT be vectorized, because of a data dependency.

The vectorizing portion of the compiler typically deals with a parsed form
of the source, which it examines for loops where the data dependencies
allow vectorization. Then it must determine whether vectorizing the loop
makes up for the associated overhead. Good compilers will rearrange
statements to increase vectorizability.

The relative simplicity of performing the data dependency analysis in
FORTRAN (which is still not simple) is one reason FORTRAN is the language
of choice in this arena. Another is the simplicity of the data structures.
You are more likely to be performing operations on elements stored
contiguously in memory when writing FORTRAN than when writing Ada, simply
because FORTRAN's only data structure is the array. It is still possible
to vectorize Ada code (as someone pointed out, any language with a loop
control structure can be vectorized); it is just harder than in FORTRAN.

I hope this has clarified rather than confused. If this isn't good enough,
I can suggest some references after looking in my notes from those years.
--
****************************************************************
* Douglas S Gray        SA-ALC/SC Ada Project Officer          *
* DSN:  945-7155        dgray@sadis02.sa.aflc.af.mil           *
* Comm: (512)925-7155   dugal@cs.stanford.edu                  *
****************************************************************