[gmx-developers] Alternate Parallelization scheme

Nathan Moore nmoore at physics.umn.edu
Sat Jun 18 17:23:38 CEST 2005


I'm porting GROMACS to IBM's Blue Gene system.  At present, the dppc
benchmark runs fastest at about 64 processors.  A closer look at the
execution shows that of the ~1100 seconds of walltime the run takes, ~500
seconds are spent in MPI communication.  I'd like to see if this can be
reduced.

Accordingly, I'd like to replace the ring structure currently used with an
MPI collective routine.  The most important functions to parallelize seem
to be move_x and move_f (perhaps also sum_f?).

(1) In what function are the x and f arrays declared?  I assume both are
cartesian triplets - is this defined in a struct, or are the arrays flat?

(2) Which particles does each node control?  I assume that the grompp
options -sort and -shuffle destroy the easy mapping (rank 0 gets the first
10 atoms, rank 1 gets the next 10 atoms, and so on).

regards,

Nathan Moore



