comp.lang.ada
* Re: Multithreaded scientific programming
       [not found] ` <4523e925$1@nntp.unige.ch>
@ 2006-10-04 21:47   ` mrohan
  2006-10-05  0:14     ` jimmaureenrogers
  0 siblings, 1 reply; 12+ messages in thread
From: mrohan @ 2006-10-04 21:47 UTC (permalink / raw)


DP wrote:
> Rob McDonald wrote:
> ...
> > To go along with this, what old algorithms have been sitting ignored in
> > the back of textbooks because they need a ton of processors to be
> > efficient?  Maybe it is time to dust them off...
> >
>
> Some attempts to address multi-threaded parallelism that might be suited
> to future 80-core processors:
>
> Nested data parallelism : http://www.cs.cmu.edu/~scandal/cacm/node4.html
>
> Cilk : http://supertech.lcs.mit.edu/cilk/
>
>
> Dan

Hi,

Since the choice of language is open, I would recommend investigating
Ada 2005.  Ada has had tasks (multithreading) since its creation
in 1983 and polymorphic object-oriented features since the 1995
revision.

Numerical codes tend to be large, and Ada's features for programming in
the large should certainly help.  Also, due to its strong typing
system and overall emphasis on correctness, once something
implemented in Ada works, it's generally the "correct" solution.
I've frequently encountered people making updates to numerical code (in
C/Fortran) and summarizing their work with "I'm not sure what I did,
but it works now and I'm not going to touch it again".

I've included comp.lang.ada on this thread as I'm sure there are people
there who could give you more advice.

Take care,
Michael.





* Re: Multithreaded scientific programming
  2006-10-04 21:47   ` Multithreaded scientific programming mrohan
@ 2006-10-05  0:14     ` jimmaureenrogers
  2006-10-05  4:18       ` tmoran
  2006-10-05 16:32       ` David B. Chorlian
  0 siblings, 2 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-05  0:14 UTC (permalink / raw)


mrohan@ACM.ORG wrote:
> I've included comp.lang.ada on this thread as I'm sure there are people
> there who could give you more advice.

Here is a very modest example of parallel matrix multiplication using
Ada.
A proper Ada example would separate the Matrix type definition along
with the matrix multiplication implementation into a separate package.

This example is compressed into one procedure for easier viewing
on usenet.

The "*" function for type Matrix is defined so that one task is
created to compute each Row * Column value. Ada compilers typically
store arrays in row-major order.

Each task is a concurrently executing entity. The number of
tasks created in this example equals the number of elements in
the result matrix.
-----------------------------------------------------------------------
-- Simple Parallel Matrix Multiplication
-- Using an ordinary matrix product
-----------------------------------------------------------------------
with Ada.Text_Io;
use Ada.Text_Io;
with Ada.Integer_Text_Io;
use Ada.Integer_Text_Io;

procedure Parallel_Mult is
   type Matrix is array (Positive range <>, Positive range <>)
     of Integer;
   Incompatible_Dimensions : exception;

   function "*" (Left, Right : Matrix )
     return Matrix is
      task type Product is
         entry Start (Left_Index, Right_Index : in     Positive );
         entry Report (Value :    out Integer );
      end Product;

      task body Product is
         L_Index,
         R_Index : Positive;
         Sum     : Integer  := 0;
      begin
         accept Start (
               Left_Index,
               Right_Index : in     Positive ) do
            L_Index := Left_Index;
            R_Index := Right_Index;
         end Start;

         --  Note: this assumes Right'First(1) = Left'First(2),
         --  which holds for the example matrices below.
         for I in Left'range(2) loop
            Sum := Sum + Left(L_Index, I) * Right(I, R_Index);
         end loop;

         accept Report (
               Value :    out Integer ) do
            Value := Sum;
         end Report;
      end Product;

   begin    -- "*"
      --  Check dimensions before declaring the task array: if the
      --  Product tasks were created first and the check then raised,
      --  the tasks would sit forever at their Start accepts and the
      --  function could never return.
      if Right'Length(1) /= Left'Length(2) then
         raise Incompatible_Dimensions;
      end if;

      declare
         Product_Array : array (Left'range(1),
                                Right'range(2)) of Product;

         Result : Matrix (Left'range(1), Right'range(2));
      begin
         for I in Left'range(1) loop
            for J in Right'range(2) loop
               Product_Array(I, J).Start(I, J);
            end loop;
         end loop;
         for I in Left'range(1) loop
            for J in Right'range(2) loop
               Product_Array(I, J).Report(Result(I, J));
            end loop;
         end loop;
         return Result;
      end;
   end "*";

   procedure Put (Item : Matrix ) is
   begin
      for I in Item'range(1) loop
         for J in Item'range(2) loop
            Put(Item => Item(I,J), Width => 5);
         end loop;
         New_Line;
      end loop;
   end Put;

   A : Matrix (1 .. 2, 1 .. 3) := ((1, 0, 2), (-1, 3, 1));
   B : Matrix (1 .. 3, 1 .. 2) := ((3, 1), (2, 1), (1, 0));
begin
   Put_Line("Matrix A:");
   Put(A);
   Put_Line("Matrix B:");
   Put(B);
   Put_Line("Product:");
   Put(A * B);
end Parallel_Mult;


Jim Rogers





* Re: Multithreaded scientific programming
  2006-10-05  0:14     ` jimmaureenrogers
@ 2006-10-05  4:18       ` tmoran
  2006-10-05 16:32       ` David B. Chorlian
  1 sibling, 0 replies; 12+ messages in thread
From: tmoran @ 2006-10-05  4:18 UTC (permalink / raw)


After getting a dual-core (Windows) machine I modified my Ada library
2-D FFT to take a CPU_Count parameter and do the job using that
many separate tasks.  Two Ada tasks make for a very pleasing speedup.
Whether 80 would help, and what size cache you might want, I'll
be curious to find out.
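
To make that concrete, here is a rough sketch of the pattern (my own
illustration, not the actual library code; Transform, Rows and the
per-row work are placeholders): divide the rows among CPU_Count worker
tasks and let the subprogram return only when all of them have
terminated.

procedure Transform (CPU_Count : in Positive) is
   Rows : constant Positive := 1024;   --  placeholder problem size

   task type Worker is
      entry Start (First_Row, Last_Row : in Natural);
   end Worker;

   task body Worker is
      First, Last : Natural := 0;
   begin
      accept Start (First_Row, Last_Row : in Natural) do
         First := First_Row;
         Last  := Last_Row;
      end Start;
      for Row in First .. Last loop
         null;   --  transform one row here (e.g. one 1-D FFT pass)
      end loop;
   end Worker;

   Chunk   : constant Positive := (Rows + CPU_Count - 1) / CPU_Count;
   Workers : array (1 .. CPU_Count) of Worker;
begin
   for I in Workers'Range loop
      Workers (I).Start (First_Row => (I - 1) * Chunk + 1,
                         Last_Row  => Natural'Min (I * Chunk, Rows));
   end loop;
   --  The subprogram completes only after every Worker terminates.
end Transform;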




* Re: Multithreaded scientific programming
  2006-10-05  0:14     ` jimmaureenrogers
  2006-10-05  4:18       ` tmoran
@ 2006-10-05 16:32       ` David B. Chorlian
  2006-10-05 17:58         ` jimmaureenrogers
                           ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: David B. Chorlian @ 2006-10-05 16:32 UTC (permalink / raw)


In <1160007272.875375.190830@b28g2000cwb.googlegroups.com> "jimmaureenrogers@worldnet.att.net" <jimmaureenrogers@worldnet.att.net> writes:

>mrohan@ACM.ORG wrote:
>> I've included comp.lang.ada on this thread as I'm sure there are people
>> there who could give you more advice.

>Here is a very modest example of parallel matrix multiplication using
>Ada.
>A proper Ada example would separate the Matrix type definition along
>with the matrix multiplication implementation into a separate package.

>This example is compressed into one procedure for easier viewing
>on usenet.

>The "*" function for type Matrix is defined so that one task is
>created to compute each Row * Column value. Ada compilers typically
>store arrays in row-major order.

>Each task is a concurrently executing entity. The number of
>tasks created in this example equals the number of elements in
>the result matrix.
[clip]
>Jim Rogers

Please, let's have something worthwhile rather than the above joke.
Talk about cache, TLB, and other memory management topics which
are the serious problems in large-scale scientific computing.

-- 
David B. Chorlian
Neurodynamics Lab  SUNY/HSCB
chorlian@cns.hscbklyn.edu
davidc@panix.com




* Re: Multithreaded scientific programming
  2006-10-05 16:32       ` David B. Chorlian
@ 2006-10-05 17:58         ` jimmaureenrogers
  2006-10-05 18:35         ` Martin Krischik
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-05 17:58 UTC (permalink / raw)


David B. Chorlian wrote:
> Please, let's have something worthwhile rather than the above joke.
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Ok.
I did not think a code example would be offensive.

It sounds like you might appreciate a discussion concerning the
concurrency features of the Ada language, or would you also
consider that offensive?

Jim Rogers





* Re: Multithreaded scientific programming
  2006-10-05 16:32       ` David B. Chorlian
  2006-10-05 17:58         ` jimmaureenrogers
@ 2006-10-05 18:35         ` Martin Krischik
  2006-10-06 18:22           ` Björn Persson
  2006-10-05 18:58         ` Francis Burton
  2006-10-06  1:28         ` jimmaureenrogers
  3 siblings, 1 reply; 12+ messages in thread
From: Martin Krischik @ 2006-10-05 18:35 UTC (permalink / raw)


David B. Chorlian wrote:

> In <1160007272.875375.190830@b28g2000cwb.googlegroups.com>
> "jimmaureenrogers@worldnet.att.net" <jimmaureenrogers@worldnet.att.net>
> writes:
> 
>>mrohan@ACM.ORG wrote:
>>> I've included comp.lang.ada on this thread as I'm sure there are people
>>> there who could give you more advice.
> 
>>Here is a very modest example of parallel matrix multiplication using
>>Ada.
>>A proper Ada example would separate the Matrix type definition along
>>with the matrix multiplication implementation into a separate package.
> 
>>This example is compressed into one procedure for easier viewing
>>on usenet.
> 
>>The "*" function for type Matrix is defined so that one task is
>>created to compute each Row * Column value. Ada compilers typically
>>store arrays in row-major order.
> 
>>Each task is a concurrently executing entity. The number of
>>tasks created in this example equals the number of elements in
>>the result matrix.
> [clip]
>>Jim Rogers
> 
> Please, let's have something worthwhile rather than the above joke.
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Well, for that I would have to know what a TLB is... This is a cross-post
now. That can be very fruitful or a complete waste of time, as different
ideas meet each other.

Now, memory management is a term I understand, and Ada is quite good at it:
flexible use of the stack, user-defined storage pools, arrays with sizes
calculated and maintained at run time. Anything in particular you want to
know about?
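
To illustrate just one of those - the runtime-sized arrays - a small
sketch (mine, purely illustrative): the vector below lives on the
stack, its bounds fixed only when the enclosing subprogram is called,
so no heap allocation or explicit deallocation is involved.

with Ada.Text_IO;

procedure Stack_Array_Demo is
   procedure Work (N : in Positive) is
      type Vector is array (Positive range <>) of Float;
      --  Stack-allocated; the size is fixed when Work is called.
      V : Vector (1 .. N) := (others => 0.0);
   begin
      Ada.Text_IO.Put_Line ("Vector of length" & Integer'Image (V'Length));
   end Work;
begin
   Work (1_000);
end Stack_Array_Demo;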

Martin
-- 
mailto://krischik@users.sourceforge.net
Ada programming at: http://ada.krischik.com




* Re: Multithreaded scientific programming
  2006-10-05 16:32       ` David B. Chorlian
  2006-10-05 17:58         ` jimmaureenrogers
  2006-10-05 18:35         ` Martin Krischik
@ 2006-10-05 18:58         ` Francis Burton
  2006-10-05 20:20           ` Pascal Obry
  2006-10-06  1:28         ` jimmaureenrogers
  3 siblings, 1 reply; 12+ messages in thread
From: Francis Burton @ 2006-10-05 18:58 UTC (permalink / raw)


In article <eg3c27$7kl$1@reader1.panix.com>,
David B. Chorlian  <davidc@panix.com> wrote:
>Please, let's have something worthwhile rather than the above joke.

You don't say why it is a joke and not worthwhile. Is this
an opinion, or fact?

>Talk about cache, TLB, and other memory management topics which
>are the serious problems in large-scale scientific computing.

Why don't you start the ball rolling then - your insights
on these topics might be very interesting.

Francis




* Re: Multithreaded scientific programming
  2006-10-05 18:58         ` Francis Burton
@ 2006-10-05 20:20           ` Pascal Obry
  0 siblings, 0 replies; 12+ messages in thread
From: Pascal Obry @ 2006-10-05 20:20 UTC (permalink / raw)
  To: Francis Burton

Francis Burton wrote:
> In article <eg3c27$7kl$1@reader1.panix.com>,
> David B. Chorlian  <davidc@panix.com> wrote:
>> Please, let's have something worthwhile rather than the above joke.
> 
> You don't say why it is a joke and not worthwhile. Is this
> an opinion, or fact?

Probably not a joke, as it was presented by Jim as a "modest example".
I also think that this is not a proper solution, as creating 9 tasks for
each 3x3 matrix product is a waste of resources. But isn't this the path
followed by OpenMP?

First of all: I'm no expert in this domain.

I tend to think that in a real-world scientific application there is a
lot of vector/matrix/whatever computation to be done. Also, not all of
the application is computation-intensive (data loading or
initialization, for example).

I like to think of it as a work flow. Some tasks prepare data to be
computed and store it in a queue; others take the jobs to be done from
those queues. The tasks may be on the same computer or distributed
(here the Ada Distributed Annex can be of great use). The trick is to
balance the different tasks to avoid starvation (of the reader tasks)
or waiting when queues are full.

At least in such an architecture the number of tasks is mostly static:
we do not waste much time creating and finalizing tasks/threads.

Anyway, here Ada's concurrency and distribution support can be used instead
of the OpenMP/MPI couple used in many Fortran and C/C++ applications.
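
A minimal sketch of that work flow (my illustration, not Pascal's
code; Job, the queue size and the pool size are placeholders): a
fixed pool of worker tasks pulls jobs from one shared protected
queue, so no tasks are created or finalized per computation.

procedure Worker_Pool_Demo is
   type Job is new Integer;          --  stand-in for real work items
   Stop : constant Job := -1;        --  "poison pill" that ends a worker

   type Job_Buffer is array (1 .. 32) of Job;

   protected Queue is
      entry Put (J : in Job);        --  blocks while the queue is full
      entry Get (J : out Job);       --  blocks while the queue is empty
   private
      Data     : Job_Buffer;
      Length   : Natural  := 0;
      Next_In  : Positive := 1;
      Next_Out : Positive := 1;
   end Queue;

   protected body Queue is
      entry Put (J : in Job) when Length < Data'Length is
      begin
         Data (Next_In) := J;
         Next_In := Next_In mod Data'Length + 1;
         Length  := Length + 1;
      end Put;

      entry Get (J : out Job) when Length > 0 is
      begin
         J := Data (Next_Out);
         Next_Out := Next_Out mod Data'Length + 1;
         Length   := Length - 1;
      end Get;
   end Queue;

   task type Worker;
   task body Worker is
      J : Job;
   begin
      loop
         Queue.Get (J);              --  wait for the next job
         exit when J = Stop;
         null;                       --  compute on J here
      end loop;
   end Worker;

   Pool : array (1 .. 4) of Worker;  --  static pool, sized once
begin
   for I in 1 .. 100 loop
      Queue.Put (Job (I));           --  producer side
   end loop;
   for I in Pool'Range loop
      Queue.Put (Stop);              --  one pill per worker
   end loop;
end Worker_Pool_Demo;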

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595




* Re: Multithreaded scientific programming
  2006-10-05 16:32       ` David B. Chorlian
                           ` (2 preceding siblings ...)
  2006-10-05 18:58         ` Francis Burton
@ 2006-10-06  1:28         ` jimmaureenrogers
  3 siblings, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-06  1:28 UTC (permalink / raw)


David B. Chorlian wrote:
> Talk about cache, TLB, and other memory management topics which
> are the serious problems in large-scale scientific computing.

Memory management is one of the serious problems of large-scale
computing in general. It is not limited to scientific computing.

Memory management for large scale computing can be divided
into several sub-topics. I will cover some of those topics. I trust
others will comment on the same areas and add comments on
areas I do not discuss.

Ada provides implicit memory management for large scale
computing in the areas of data encapsulation, direct
communication between tasks, communication through
implicitly protected shared buffers, and communication
through unprotected shared variables.

Every Ada program has at least one task. The "main" or
entry point procedure for the program runs within an
"environment" task. All other tasks depend on a master
task. Furthermore, if a task depends on a given master,
it is defined to depend on the task that executes the master,
and (recursively) on any master of that task.

The variables declared statically within a task are normally
allocated on the stack for that task. Those variables are
visible to subprograms and tasks defined within the scope
of the task defining the variable. Tasks do not need to be
defined within the scope of other tasks. Tasks may be
defined by the elaboration of an object declaration in a
package depended upon by the environment task.

Tasks may directly communicate with each other using
the Rendezvous model. The called task declares an
entry with zero or more parameters. The calling task
calls that entry. All entry calls are implicitly queued.
The entry call is completed only when a calling task
has made a call and the called task reaches the point
in the code where it accepts the entry. This
communication mechanism creates synchronous
communication between tasks. Multiple tasks may
call a given entry in a common called task. The
implicit queuing mechanism assures that the calls
will be handled according to the queuing policy in
place for the program. A calling task may remove
itself from an entry queue before its call is
accepted, by using a conditional or timed entry
call.
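
As a small sketch of such a conditional entry call (my illustration,
not part of the original post) - the caller abandons the rendezvous
immediately if the server task is not already waiting at its accept,
so which message appears depends on timing:

with Ada.Text_IO;

procedure Rendezvous_Demo is
   task Server is
      entry Ping (Msg : in String);
   end Server;

   task body Server is
   begin
      select
         accept Ping (Msg : in String) do
            Ada.Text_IO.Put_Line ("Server got: " & Msg);
         end Ping;
      or
         terminate;   --  lets the task end if the call never comes
      end select;
   end Server;
begin
   select
      Server.Ping ("hello");   --  taken only if Server is ready now
   else
      Ada.Text_IO.Put_Line ("Server busy; doing something else");
   end select;
end Rendezvous_Demo;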

Many problems are not best served by fully
synchronous communication between tasks.
Ada provides mechanisms for asynchronous
communication between tasks. The Ada protected
object provides asynchronous communication
between tasks with implicit protection from
inappropriate mutual access to the shared data.
There are three kinds of access to a protected
object: unconditional read/write, conditional
read/write, and shared read-only. Protected
procedures provide unconditional read/write
access. Protected procedures are given
exclusive access to the protected object
during the execution of the procedure.
Protected entries provide conditional read/
write access to the protected object. Like
procedures, an entry is given exclusive
access to the protected object during its
execution. Unlike protected procedures,
protected entries execute only when their
barrier condition evaluates to True.
Protected functions provide read-only
access to protected objects. Any number
of protected function calls may execute
simultaneously on a protected object.

All the access restrictions to a protected
object are provided implicitly. The programmer
cannot "lock" or "unlock" the protected object
explicitly.
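
A minimal sketch mapping those three kinds of access onto one
protected object (my illustration, with hypothetical names): a
procedure for unconditional read/write, an entry whose barrier
makes it conditional, and a function for concurrent read-only
access.

procedure Gate_Demo is
   protected Gate is
      procedure Open;                   --  unconditional read/write
      entry Wait_Until_Open;            --  conditional read/write
      function Is_Open return Boolean;  --  shared read-only
   private
      Open_Flag : Boolean := False;
   end Gate;

   protected body Gate is
      procedure Open is
      begin
         Open_Flag := True;             --  exclusive access while running
      end Open;

      entry Wait_Until_Open when Open_Flag is
      begin
         null;                          --  runs only once the barrier is True
      end Wait_Until_Open;

      function Is_Open return Boolean is
      begin
         return Open_Flag;              --  many readers may run at once
      end Is_Open;
   end Gate;

   task Waiter;
   task body Waiter is
   begin
      Gate.Wait_Until_Open;             --  queued until the gate opens
   end Waiter;
begin
   delay 0.1;
   Gate.Open;                           --  releases Waiter
end Gate_Demo;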

Unprotected shared variables may also be
used in Ada. Ada lets the programmer mark
such variables with the pragmas Atomic and
Volatile. These variables have no implicit
or explicit locking beyond what the hardware
provides for atomic objects on a
multi-processor system.
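
For example, a minimal sketch (mine, not from the post): a Boolean
flag shared between two tasks with no protected object at all;
pragma Atomic makes every read and write of the flag indivisible.

with Ada.Text_IO;

procedure Atomic_Demo is
   Done : Boolean := False;
   pragma Atomic (Done);     --  reads and writes are indivisible

   task Worker;
   task body Worker is
   begin
      delay 0.1;             --  stand-in for real work
      Done := True;          --  atomic store, seen by the main task
   end Worker;
begin
   while not Done loop       --  busy-wait on the shared flag
      delay 0.01;
   end loop;
   Ada.Text_IO.Put_Line ("Worker finished.");
end Atomic_Demo;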

Ada concurrency is defined at a relatively high
level of abstraction. The abstraction level is much
higher than that provided for Pthreads or Windows
threads. Ada concurrent programs can be ported
across operating systems without changing
the source code to accommodate threading
conventions: simply recompile the unchanged
source code on the desired target platform.

Jim Rogers





* Re: Multithreaded scientific programming
  2006-10-05 18:35         ` Martin Krischik
@ 2006-10-06 18:22           ` Björn Persson
  2006-10-06 20:57             ` jimmaureenrogers
  2006-10-09  6:36             ` Martin Krischik
  0 siblings, 2 replies; 12+ messages in thread
From: Björn Persson @ 2006-10-06 18:22 UTC (permalink / raw)


Martin Krischik wrote:
> David B. Chorlian wrote:
>> Talk about cache, TLB, and other memory management topics which
>> are the serious problems in large-scale scientific computing.
> 
> Well, for that I would have to know what a TLB is...

Translation lookaside buffer – a small cache memory for the page table, 
which the memory management unit uses to speed up the translation from 
virtual addresses to hardware addresses. I guess David wants to write 
programs such that they only relatively rarely need to access memory 
pages that aren't in the TLB.
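
A small illustration of the kind of program structure that helps
(mine, not Björn's): with row-major array storage, keeping the
second index in the inner loop touches consecutive memory, so far
fewer distinct pages - and TLB entries - are needed between reuses.

with Ada.Text_IO;

procedure Locality_Demo is
   type Grid is array (1 .. 256, 1 .. 256) of Float;
   A   : Grid  := (others => (others => 1.0));
   Sum : Float := 0.0;
begin
   --  Cache/TLB-friendly with row-major storage: the inner loop
   --  walks consecutive elements of one row.
   for I in A'Range (1) loop
      for J in A'Range (2) loop
         Sum := Sum + A (I, J);
      end loop;
   end loop;
   --  Swapping the two loops would stride through memory one whole
   --  row at a time and touch far more pages between reuses.
   Ada.Text_IO.Put_Line (Float'Image (Sum));
end Locality_Demo;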

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: Multithreaded scientific programming
  2006-10-06 18:22           ` Björn Persson
@ 2006-10-06 20:57             ` jimmaureenrogers
  2006-10-09  6:36             ` Martin Krischik
  1 sibling, 0 replies; 12+ messages in thread
From: jimmaureenrogers @ 2006-10-06 20:57 UTC (permalink / raw)



Björn Persson wrote:
> Martin Krischik wrote:
> > David B. Chorlian wrote:
> >> Talk about cache, TLB, and other memory management topics which
> >> are the serious problems in large-scale scientific computing.
> >
> > Well, for that I would have to know what a TLB is...
>
> Translation lookaside buffer - a small cache memory for the page table,
> which the memory management unit uses to speed up the translation from
> virtual addresses to hardware addresses. I guess David wants to write
> programs such that they only relatively rarely need to access memory
> pages that aren't in the TLB.
>

This sounds highly system-dependent. Do all processors have the
same TLB size?

I know some current Intel dual-core processors have a 16-megabyte
cache, while the roughly equivalent AMD processors have a
2-megabyte cache. It would seem that cache size and contents
could be very important for optimizations at the level where one
worries about TLB contents. On the other hand, the AMD dual-core
processors run at up to 3000 MT/s while the roughly equivalent
Intel processors run at 1600 MT/s.

Both AMD and Intel implement the same instruction set.
Compilers are not likely to recognize differences between
processors from the two vendors.

Is it common to write scientific programs in such an
unportable manner?

Jim Rogers





* Re: Multithreaded scientific programming
  2006-10-06 18:22           ` Björn Persson
  2006-10-06 20:57             ` jimmaureenrogers
@ 2006-10-09  6:36             ` Martin Krischik
  1 sibling, 0 replies; 12+ messages in thread
From: Martin Krischik @ 2006-10-09  6:36 UTC (permalink / raw)


Björn Persson wrote:
> Martin Krischik wrote:
>> David B. Chorlian wrote:
>>> Talk about cache, TLB, and other memory management topics which
>>> are the serious problems in large-scale scientific computing.
>>
>> Well, for that I would have to know what a TLB is...
> 
> Translation lookaside buffer – a small cache memory for the page table, 
> which the memory management unit uses to speed up the translation from 
> virtual addresses to hardware addresses. I guess David wants to write 
> programs such that they only relatively rarely need to access memory 
> pages that aren't in the TLB.

Well, in that case I would say that's exactly what the -mtune compiler 
option is for.

So David, to answer your question about the TLB: this has nothing
to do with the programming-language front end (Ada, Fortran, C,
etc.) - it's handled by the compiler back end (Pentium, Athlon,
PowerPC, etc.).

Perhaps you have only worked with compilers which feature only one
front end and one back end - but this is not always the case. In
fact, most Ada compilers feature multiple back ends.

Martin




end of thread

Thread overview: 12+ messages
     [not found] <1159978124.458442.80520@m7g2000cwm.googlegroups.com>
     [not found] ` <4523e925$1@nntp.unige.ch>
2006-10-04 21:47   ` Multithreaded scientific programming mrohan
2006-10-05  0:14     ` jimmaureenrogers
2006-10-05  4:18       ` tmoran
2006-10-05 16:32       ` David B. Chorlian
2006-10-05 17:58         ` jimmaureenrogers
2006-10-05 18:35         ` Martin Krischik
2006-10-06 18:22           ` Björn Persson
2006-10-06 20:57             ` jimmaureenrogers
2006-10-09  6:36             ` Martin Krischik
2006-10-05 18:58         ` Francis Burton
2006-10-05 20:20           ` Pascal Obry
2006-10-06  1:28         ` jimmaureenrogers
