* Re: Multithreaded scientific programming
       [not found] ` <4523e925$1@nntp.unige.ch>
@ 2006-10-04 21:47   ` mrohan
  2006-10-05  0:14     ` jimmaureenrogers
  0 siblings, 1 reply; 12+ messages in thread
From: mrohan @ 2006-10-04 21:47 UTC (permalink / raw)

DP wrote:
> Rob McDonald wrote:
> ...
> > To go along with this, what old algorithms have been sitting ignored in
> > the back of textbooks because they need a ton of processors to be
> > efficient? Maybe it is time to dust them off...
>
> Some attempts to address multi-threaded parallelism that might be suited
> to future 80-core processors:
>
> Nested data parallelism: http://www.cs.cmu.edu/~scandal/cacm/node4.html
>
> Cilk: http://supertech.lcs.mit.edu/cilk/
>
> Dan

Hi,

Since the choice of language is open, I would recommend investigating
Ada 2005. Ada has had tasks (multithreading) since its creation in 1983
and polymorphic object-oriented features since the 1995 revision.
Numerical codes tend to be large, and Ada's features for programming in
the large should certainly help.

Also, due to its strong typing system and overall emphasis on
correctness, once something implemented in Ada works, it's generally
the "correct" solution. I've frequently encountered people making
updates to numerical code (in C/Fortran) and summarizing their work
with "I'm not sure what I did, but it works now and I'm not going to
touch it again".

I've included comp.lang.ada on this thread as I'm sure there are people
there who could give you more advice.

Take care,
Michael.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-04 21:47 ` Multithreaded scientific programming mrohan
@ 2006-10-05  0:14   ` jimmaureenrogers
  2006-10-05  4:18     ` tmoran
  2006-10-05 16:32     ` David B. Chorlian
  0 siblings, 2 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-05  0:14 UTC (permalink / raw)

mrohan@ACM.ORG wrote:
> I've included comp.lang.ada on this thread as I'm sure there are people
> there who could give you more advice.

Here is a very modest example of parallel matrix multiplication using
Ada. A proper Ada example would separate the Matrix type definition,
along with the matrix multiplication implementation, into a separate
package. This example is compressed into one procedure for easier
viewing on Usenet.

The "*" function for type Matrix is defined so that one task is created
to compute each Row * Column value. Ada stores arrays in row-major
order by default.

Each task is a concurrently executing entity. The number of tasks
created in this example equals the number of elements in the result
matrix.

-----------------------------------------------------------------------
-- Simple Parallel Matrix Multiplication
-- Using an ordinary matrix product
-----------------------------------------------------------------------
with Ada.Text_Io;         use Ada.Text_Io;
with Ada.Integer_Text_Io; use Ada.Integer_Text_Io;

procedure Parallel_Mult is

   type Matrix is array (Positive range <>, Positive range <>) of Integer;

   Incompatible_Dimensions : exception;

   function "*" (Left, Right : Matrix) return Matrix is

      task type Product is
         entry Start (Left_Index, Right_Index : in Positive);
         entry Report (Value : out Integer);
      end Product;

      task body Product is
         L_Index, R_Index : Positive;
         Sum              : Integer := 0;
      begin
         -- The terminate alternative lets the tasks shut down cleanly
         -- if the dimension check below fails before Start is called;
         -- without it the raised exception could never propagate.
         select
            accept Start (Left_Index, Right_Index : in Positive) do
               L_Index := Left_Index;
               R_Index := Right_Index;
            end Start;
         or
            terminate;
         end select;
         -- Inner product of Left's row L_Index and Right's column
         -- R_Index. Assumes Left's column range and Right's row range
         -- line up, as they do for the 1-based matrices below.
         for I in Left'Range (2) loop
            Sum := Sum + (Right (I, R_Index) * Left (L_Index, I));
         end loop;
         accept Report (Value : out Integer) do
            Value := Sum;
         end Report;
      end Product;

      Product_Array : array (Left'Range (1), Right'Range (2)) of Product;
      Result        : Matrix (Left'Range (1), Right'Range (2));

   begin -- "*"
      if Right'Length (1) /= Left'Length (2) then
         raise Incompatible_Dimensions;
      end if;
      -- Start one task per element of the result matrix.
      for I in Left'Range (1) loop
         for J in Right'Range (2) loop
            Product_Array (I, J).Start (I, J);
         end loop;
      end loop;
      -- Collect the sums; each Report call waits for its task to finish.
      for I in Left'Range (1) loop
         for J in Right'Range (2) loop
            Product_Array (I, J).Report (Result (I, J));
         end loop;
      end loop;
      return Result;
   end "*";

   procedure Put (Item : Matrix) is
   begin
      for I in Item'Range (1) loop
         for J in Item'Range (2) loop
            Put (Item => Item (I, J), Width => 5);
         end loop;
         New_Line;
      end loop;
   end Put;

   A : Matrix (1 .. 2, 1 .. 3) := ((1, 0, 2), (-1, 3, 1));
   B : Matrix (1 .. 3, 1 .. 2) := ((3, 1), (2, 1), (1, 0));

begin
   Put_Line ("Matrix A:");
   Put (A);
   Put_Line ("Matrix B:");
   Put (B);
   Put_Line ("Product:");
   Put (A * B);
end Parallel_Mult;

Jim Rogers

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-05  0:14 ` jimmaureenrogers
@ 2006-10-05  4:18   ` tmoran
  2006-10-05 16:32   ` David B. Chorlian
  1 sibling, 0 replies; 12+ messages in thread
From: tmoran @ 2006-10-05  4:18 UTC (permalink / raw)

After getting a dual-core (Windows) machine I modified my Ada library
2-D FFT to take a CPU_Count parameter and do the job using that many
separate tasks. Two Ada tasks make a very pleasing speedup. Whether 80
would help, and what size cache you might want, I'll be curious to
find out.

^ permalink raw reply [flat|nested] 12+ messages in thread
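A minimal sketch of that kind of CPU_Count split (the names and the
Transform_Row stand-in below are invented for illustration; tmoran's
actual library code is not shown in the thread): each worker task takes
an interleaved slice of the rows, and the enclosing procedure cannot
return until every worker has terminated.

with Ada.Text_IO; use Ada.Text_IO;

procedure Row_Split_Demo is

   type Real_Matrix is
     array (Positive range <>, Positive range <>) of Float;

   Data : Real_Matrix (1 .. 512, 1 .. 512) :=
     (others => (others => 1.0));

   -- Stand-in for one row pass of a 2-D FFT; the real work goes here.
   procedure Transform_Row (M : in out Real_Matrix; Row : Positive) is
   begin
      for Col in M'Range (2) loop
         M (Row, Col) := M (Row, Col) * 2.0;
      end loop;
   end Transform_Row;

   CPU_Count : constant Positive := 2;

   task type Worker is
      entry Start (Id : in Positive);
   end Worker;

   task body Worker is
      My_Id : Positive;
   begin
      accept Start (Id : in Positive) do
         My_Id := Id;
      end Start;
      -- Each worker handles every CPU_Count'th row, starting at its
      -- own Id. The rows are disjoint, so no locking is needed.
      declare
         Row : Natural := My_Id;
      begin
         while Row <= Data'Last (1) loop
            Transform_Row (Data, Row);
            Row := Row + CPU_Count;
         end loop;
      end;
   end Worker;

   Workers : array (1 .. CPU_Count) of Worker;

begin
   for I in Workers'Range loop
      Workers (I).Start (I);
   end loop;
   -- Row_Split_Demo does not return until all workers terminate.
   Put_Line ("Work split across" & Positive'Image (CPU_Count) & " tasks.");
end Row_Split_Demo;

Interleaving rows keeps the load balanced even when some rows cost more
than others; a blocked split (contiguous row ranges per task) would be
friendlier to the cache.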
* Re: Multithreaded scientific programming
  2006-10-05  0:14 ` jimmaureenrogers
  2006-10-05  4:18   ` tmoran
@ 2006-10-05 16:32   ` David B. Chorlian
  2006-10-05 17:58     ` jimmaureenrogers
                       ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: David B. Chorlian @ 2006-10-05 16:32 UTC (permalink / raw)

In <1160007272.875375.190830@b28g2000cwb.googlegroups.com>
"jimmaureenrogers@worldnet.att.net" <jimmaureenrogers@worldnet.att.net>
writes:

>mrohan@ACM.ORG wrote:
>> I've included comp.lang.ada on this thread as I'm sure there are people
>> there who could give you more advice.

>Here is a very modest example of parallel matrix multiplication using
>Ada.
>A proper Ada example would separate the Matrix type definition along
>with the matrix multiplication implementation into a separate package.

>This example is compressed into one procedure for easier viewing
>on Usenet.

>The "*" function for type Matrix is defined so that one task is
>created to compute each Row * Column value. Ada stores arrays
>in row-major order by default.

>Each task is a concurrently executing entity. The number of
>tasks created in this example equals the number of elements in
>the result matrix.
[clip]
>Jim Rogers

Please, let's have something worthwhile rather than the above joke.
Talk about cache, TLB, and other memory management topics, which
are the serious problems in large-scale scientific computing.

--
David B. Chorlian
Neurodynamics Lab  SUNY/HSCB
chorlian@cns.hscbklyn.edu
davidc@panix.com

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-05 16:32 ` David B. Chorlian
@ 2006-10-05 17:58   ` jimmaureenrogers
  2006-10-05 18:35   ` Martin Krischik
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-05 17:58 UTC (permalink / raw)

David B. Chorlian wrote:
> Please, let's have something worthwhile rather than the above joke.
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Ok. I did not think a code example would be offensive.

It sounds like you might appreciate a discussion concerning the
concurrency features of the Ada language, or would you also consider
that offensive?

Jim Rogers

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-05 16:32 ` David B. Chorlian
  2006-10-05 17:58   ` jimmaureenrogers
@ 2006-10-05 18:35   ` Martin Krischik
  2006-10-06 18:22     ` Björn Persson
  2006-10-05 18:58   ` Francis Burton
  2006-10-06  1:28   ` jimmaureenrogers
  3 siblings, 1 reply; 12+ messages in thread
From: Martin Krischik @ 2006-10-05 18:35 UTC (permalink / raw)

David B. Chorlian wrote:
> In <1160007272.875375.190830@b28g2000cwb.googlegroups.com>
> "jimmaureenrogers@worldnet.att.net" <jimmaureenrogers@worldnet.att.net>
> writes:
>
>>mrohan@ACM.ORG wrote:
>>> I've included comp.lang.ada on this thread as I'm sure there are people
>>> there who could give you more advice.
>
>>Here is a very modest example of parallel matrix multiplication using
>>Ada.
>>A proper Ada example would separate the Matrix type definition along
>>with the matrix multiplication implementation into a separate package.
>
>>This example is compressed into one procedure for easier viewing
>>on Usenet.
>
>>The "*" function for type Matrix is defined so that one task is
>>created to compute each Row * Column value. Ada stores arrays
>>in row-major order by default.
>
>>Each task is a concurrently executing entity. The number of
>>tasks created in this example equals the number of elements in
>>the result matrix.
> [clip]
>>Jim Rogers
>
> Please, let's have something worthwhile rather than the above joke.
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Well, for that I would have to know what a TLB is...

This is a cross-post now. That can be very fruitful or a complete
waste of time as different ideas meet each other.

Now, memory management is a word I understand, and Ada is quite good
at it: flexible use of the stack, user-defined storage pools, arrays
with size calculated and maintained at runtime. Anything in particular
you want to know about?

Martin

--
mailto://krischik@users.sourceforge.net

Ada programming at: http://ada.krischik.com

^ permalink raw reply [flat|nested] 12+ messages in thread
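A minimal sketch of the last of those features - an array sized by a
runtime value, living on the stack (the surrounding demo procedure is
invented for illustration):

with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Command_Line; use Ada.Command_Line;

procedure Stack_Array_Demo is
   -- The bound is a runtime value: no heap allocation, no explicit
   -- deallocation, and the bounds travel with the array.
   N      : constant Natural := Argument_Count + 10;
   Buffer : array (1 .. N) of Float := (others => 0.0);
begin
   Put_Line ("Holding" & Natural'Image (Buffer'Length)
             & " elements on the stack.");
end Stack_Array_Demo;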
* Re: Multithreaded scientific programming
  2006-10-05 18:35 ` Martin Krischik
@ 2006-10-06 18:22   ` Björn Persson
  2006-10-06 20:57     ` jimmaureenrogers
  2006-10-09  6:36     ` Martin Krischik
  0 siblings, 2 replies; 12+ messages in thread
From: Björn Persson @ 2006-10-06 18:22 UTC (permalink / raw)

Martin Krischik wrote:
> David B. Chorlian wrote:
>> Talk about cache, TLB, and other memory management topics which
>> are the serious problems in large-scale scientific computing.
>
> Well, for that I would have to know what a TLB is...

Translation lookaside buffer - a small cache memory for the page table,
which the memory management unit uses to speed up the translation from
virtual addresses to hardware addresses. I guess David wants to write
programs such that they only rarely need to access memory pages whose
translations aren't in the TLB.

--
Björn Persson                              PGP key A88682FD
omb jor ers @sv ge. r o.b n.p son eri nu

^ permalink raw reply [flat|nested] 12+ messages in thread
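In Ada terms, and assuming nothing beyond the row-major default
mentioned earlier in the thread, the difference shows up as loop
order. A sketch (the matrix size and the summing "work" are
illustrative only):

with Ada.Text_IO; use Ada.Text_IO;

procedure Traversal_Demo is
   type Big is array (1 .. 2048, 1 .. 2048) of Float;
   type Big_Access is access Big;
   M   : constant Big_Access := new Big'(others => (others => 1.0));
   Sum : Float := 0.0;
begin
   -- Friendly: the inner loop follows Ada's row-major storage, so
   -- consecutive iterations touch adjacent addresses and stay on the
   -- same page for a long time.
   for I in M'Range (1) loop
      for J in M'Range (2) loop
         Sum := Sum + M (I, J);
      end loop;
   end loop;

   -- Unfriendly: swapping the loops strides a whole row (2048
   -- elements) per step, touching a different page on almost every
   -- iteration and churning the TLB.
   for J in M'Range (2) loop
      for I in M'Range (1) loop
         Sum := Sum + M (I, J);
      end loop;
   end loop;

   Put_Line (Float'Image (Sum));
end Traversal_Demo;

Both loops compute the same sum; only the memory access pattern, and
therefore the TLB and cache behaviour, differs.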
* Re: Multithreaded scientific programming
  2006-10-06 18:22 ` Björn Persson
@ 2006-10-06 20:57   ` jimmaureenrogers
  2006-10-09  6:36   ` Martin Krischik
  1 sibling, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-06 20:57 UTC (permalink / raw)

Björn Persson wrote:
> Martin Krischik wrote:
> > David B. Chorlian wrote:
> >> Talk about cache, TLB, and other memory management topics which
> >> are the serious problems in large-scale scientific computing.
> >
> > Well, for that I would have to know what a TLB is...
>
> Translation lookaside buffer - a small cache memory for the page table,
> which the memory management unit uses to speed up the translation from
> virtual addresses to hardware addresses. I guess David wants to write
> programs such that they only rarely need to access memory pages whose
> translations aren't in the TLB.

This sounds highly system dependent. Do all processors have the same
size TLB? I know some current Intel dual-core processors have a 16
megabyte cache, while the roughly equivalent AMD processors have a 2
megabyte cache. It would seem that cache size and contents could be
very important for optimizations at the level where one worries about
TLB contents.

On the other hand, the AMD dual-core processors run at up to 3000 MT/s
while the roughly equivalent Intel processors run at 1600 MT/s. Both
AMD and Intel employ the same instruction sets. Compilers are not
likely to recognize differences between processors from the two
vendors.

Is it common to write scientific programming in such an unportable
manner?

Jim Rogers

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-06 18:22 ` Björn Persson
  2006-10-06 20:57   ` jimmaureenrogers
@ 2006-10-09  6:36   ` Martin Krischik
  1 sibling, 0 replies; 12+ messages in thread
From: Martin Krischik @ 2006-10-09  6:36 UTC (permalink / raw)

Björn Persson wrote:
> Martin Krischik wrote:
>> David B. Chorlian wrote:
>>> Talk about cache, TLB, and other memory management topics which
>>> are the serious problems in large-scale scientific computing.
>>
>> Well, for that I would have to know what a TLB is...
>
> Translation lookaside buffer - a small cache memory for the page table,
> which the memory management unit uses to speed up the translation from
> virtual addresses to hardware addresses. I guess David wants to write
> programs such that they only rarely need to access memory pages whose
> translations aren't in the TLB.

Well, in that case I would say that's exactly what the -mtune compiler
option is for.

So David, to answer your question about the TLB: this has nothing to
do with the programming language front end (Ada, Fortran, C, etc.) -
it's handled by the compiler back end (Pentium, Athlon, PowerPC,
etc.). Perhaps you have only worked with compilers which feature only
one front end and one back end - but this is not always the case. In
fact, most Ada compilers feature multiple back ends.

Martin

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-05 16:32 ` David B. Chorlian
  2006-10-05 17:58   ` jimmaureenrogers
  2006-10-05 18:35   ` Martin Krischik
@ 2006-10-05 18:58   ` Francis Burton
  2006-10-05 20:20     ` Pascal Obry
  2006-10-06  1:28   ` jimmaureenrogers
  3 siblings, 1 reply; 12+ messages in thread
From: Francis Burton @ 2006-10-05 18:58 UTC (permalink / raw)

In article <eg3c27$7kl$1@reader1.panix.com>,
David B. Chorlian <davidc@panix.com> wrote:
>Please, let's have something worthwhile rather than the above joke.

You don't say why it is a joke and not worthwhile. Is this an opinion,
or fact?

>Talk about cache, TLB, and other memory management topics which
>are the serious problems in large-scale scientific computing.

Why don't you start the ball rolling then - your insights on these
topics might be very interesting.

Francis

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multithreaded scientific programming
  2006-10-05 18:58 ` Francis Burton
@ 2006-10-05 20:20   ` Pascal Obry
  0 siblings, 0 replies; 12+ messages in thread
From: Pascal Obry @ 2006-10-05 20:20 UTC (permalink / raw)
To: Francis Burton

Francis Burton wrote:
> In article <eg3c27$7kl$1@reader1.panix.com>,
> David B. Chorlian <davidc@panix.com> wrote:
>> Please, let's have something worthwhile rather than the above joke.
>
> You don't say why it is a joke and not worthwhile. Is this
> an opinion, or fact?

Probably not a joke, as this was presented by Jim as a "modest
example". I also think that this is not a proper solution, as creating
9 tasks for each 3x3 matrix product is a waste of resources. But isn't
this the path followed by OpenMP?

First of all: I'm no expert in this domain. I tend to think that in a
real-world scientific application there is a lot of
vector/matrix/whatever computation to be done. Also, not the whole
application is computation intensive (data loading or initialization,
for example).

I like to think of it as a work flow. Some tasks are preparing data to
be computed and storing them in a queue. Some others are taking the
jobs to be done from those queues. In this case the tasks can be on
the same computer or distributed (here the Ada Distributed Systems
Annex can be of great use). The challenge is to balance the different
tasks to avoid starvation (of reader tasks) or waiting when queues are
full.

At least in such an architecture the number of tasks is mostly static.
We do not waste too much time creating and finalizing tasks/threads.
Anyway, here Ada concurrency and distributed support can be used
instead of the OpenMP and MPI couple as used in many Fortran and C/C++
applications.

Pascal.

--
--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|         http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595

^ permalink raw reply [flat|nested] 12+ messages in thread
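A stripped-down sketch of that static worker-pool design (the Job
type, queue size, pool size, and stop-marker convention are all
invented for illustration): a protected object serves as the bounded
queue, and a fixed set of tasks pull jobs from it.

procedure Pool_Demo is

   type Job is record
      Id : Natural := 0;
   end record;

   protected Queue is
      entry Put  (J : in Job);    -- blocks while the queue is full
      entry Take (J : out Job);   -- blocks while the queue is empty
   private
      Slots      : array (1 .. 8) of Job;
      Count      : Natural  := 0;
      Head, Tail : Positive := 1;
   end Queue;

   protected body Queue is

      entry Put (J : in Job) when Count < Slots'Length is
      begin
         Slots (Tail) := J;
         Tail  := Tail mod Slots'Length + 1;
         Count := Count + 1;
      end Put;

      entry Take (J : out Job) when Count > 0 is
      begin
         J := Slots (Head);
         Head  := Head mod Slots'Length + 1;
         Count := Count - 1;
      end Take;

   end Queue;

   task type Worker;

   task body Worker is
      J : Job;
   begin
      loop
         Queue.Take (J);
         exit when J.Id = 0;   -- Id 0 is the "stop" marker
         -- the real computation on J would happen here
      end loop;
   end Worker;

   Pool : array (1 .. 4) of Worker;

begin
   for N in 1 .. 100 loop
      Queue.Put (Job'(Id => N));    -- producer side of the work flow
   end loop;
   for W in Pool'Range loop
      Queue.Put (Job'(Id => 0));    -- one stop marker per worker
   end loop;
end Pool_Demo;

With the pool size fixed, the cost of task creation is paid once;
throughput is then governed by the queue depth and the producer/
consumer balance Pascal describes.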
* Re: Multithreaded scientific programming
  2006-10-05 16:32 ` David B. Chorlian
                    ` (2 preceding siblings ...)
  2006-10-05 18:58 ` Francis Burton
@ 2006-10-06  1:28 ` jimmaureenrogers
  3 siblings, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-06  1:28 UTC (permalink / raw)

David B. Chorlian wrote:
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Memory management is one of the serious problems of large-scale
computing in general. It is not limited to scientific computing.

Memory management for large-scale computing can be divided into
several sub-topics. I will cover some of those topics. I trust others
will comment on the same areas and add comments on areas I do not
discuss.

Ada provides implicit memory management for large-scale computing in
the areas of data encapsulation, direct communication between tasks,
communication through implicitly protected shared buffers, and
communication through unprotected shared variables.

Every Ada program has at least one task. The "main" or entry-point
procedure for the program runs within an "environment" task. All other
tasks depend on a master task. Furthermore, if a task depends on a
given master, it is defined to depend on the task that executes the
master, and (recursively) on any master of that task.

The variables declared statically within a task are normally allocated
on the stack for that task. Those variables are visible to subprograms
and tasks defined within the scope of the task defining the variable.
Tasks do not need to be defined within the scope of other tasks. Tasks
may be defined by the elaboration of an object declaration in a
package depended upon by the environment task.

Tasks may communicate directly with each other using the rendezvous
model. The called task declares an entry with zero or more parameters.
The calling task calls that entry. All entry calls are implicitly
queued. The entry call is completed only when a calling task has made
a call and the called task reaches the point in the code where it
accepts the entry. This mechanism creates synchronous communication
between tasks.

Multiple tasks may call a given entry in a common called task. The
implicit queuing mechanism assures that the calls will be handled
according to the queuing policy in place for the program. A calling
task may remove itself from an entry queue any time before its call is
accepted by using timed or conditional entry calls.

Many problems are not best served by fully synchronous communication
between tasks, so Ada also provides mechanisms for asynchronous
communication. The Ada protected object provides asynchronous
communication between tasks with implicit protection from
inappropriate concurrent access to the shared data.

There are three kinds of access to a protected object: unconditional
read/write, conditional read/write, and shared read-only. Protected
procedures provide unconditional read/write access; a protected
procedure is given exclusive access to the protected object during its
execution. Protected entries provide conditional read/write access.
Like a procedure, an entry is given exclusive access to the protected
object during its execution; unlike a protected procedure, a protected
entry only executes when its barrier condition evaluates to True.
Protected functions provide read-only access to protected objects.
Any number of protected function calls may execute simultaneously on a
protected object. All the access restrictions to a protected object
are provided implicitly; the programmer cannot "lock" or "unlock" the
protected object explicitly.

Unprotected shared variables may also be implemented in Ada. Ada
provides the ability to give unprotected variables the attributes
"atomic" and "volatile". Such variables do not have any implicit or
explicit locking beyond the locking provided in a multi-processor
system for atomic variables.

Ada concurrency is defined at a relatively high level of abstraction,
much higher than that provided by Pthreads or Windows threads. Ada
concurrent programs can be ported across operating systems without
changing the source code to accommodate threading conventions: simply
recompile the unchanged source code on the desired target platform.

Jim Rogers

^ permalink raw reply [flat|nested] 12+ messages in thread
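A small sketch of the three access kinds described above, using an
invented Latest_Sample object: the procedure writes under exclusive
access, the entry blocks on its barrier until a value exists, and the
function permits any number of simultaneous readers.

procedure Protected_Demo is

   protected Latest_Sample is
      procedure Set (Value : in Float);           -- unconditional read/write
      entry Wait_For_First (Value : out Float);   -- conditional read/write
      function Get return Float;                  -- shared read-only
   private
      Sample    : Float   := 0.0;
      Has_Value : Boolean := False;
   end Latest_Sample;

   protected body Latest_Sample is

      procedure Set (Value : in Float) is
      begin
         Sample    := Value;
         Has_Value := True;
      end Set;

      -- The barrier: callers queue here until Set has run at least once.
      entry Wait_For_First (Value : out Float) when Has_Value is
      begin
         Value := Sample;
      end Wait_For_First;

      function Get return Float is
      begin
         return Sample;   -- any number of tasks may read concurrently
      end Get;

   end Latest_Sample;

   V : Float;

begin
   Latest_Sample.Set (42.0);
   Latest_Sample.Wait_For_First (V);   -- would have blocked before Set
   pragma Assert (Latest_Sample.Get = V);
end Protected_Demo;

All of the locking and queuing is supplied by the construct itself,
which is the point of the preceding paragraphs: there is nothing for
the programmer to lock, unlock, or forget.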
end of thread, other threads:[~2006-10-09  6:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1159978124.458442.80520@m7g2000cwm.googlegroups.com>
     [not found] ` <4523e925$1@nntp.unige.ch>
2006-10-04 21:47   ` Multithreaded scientific programming mrohan
2006-10-05  0:14     ` jimmaureenrogers
2006-10-05  4:18       ` tmoran
2006-10-05 16:32       ` David B. Chorlian
2006-10-05 17:58         ` jimmaureenrogers
2006-10-05 18:35         ` Martin Krischik
2006-10-06 18:22           ` Björn Persson
2006-10-06 20:57             ` jimmaureenrogers
2006-10-09  6:36             ` Martin Krischik
2006-10-05 18:58         ` Francis Burton
2006-10-05 20:20           ` Pascal Obry
2006-10-06  1:28         ` jimmaureenrogers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox