[gmx-users] Re: MPI tips

Mon Jan 30 23:36:15 CET 2006

Erik Lindahl wrote:

> As for running things with MPI, you don't even need the -np flag  
> anymore - it's autodetected. Just use -np X as an argument to grompp  
> in order to  split your topology for X processors.

Thanks everybody, MPI is now running.  

Unfortunately, while MPI is working, it is not working well.
These tests were all based on the gmxdemo "demo" script converted to
bash, which is (temporarily) available here:

  ftp://saf.bio.caltech.edu/pub/pickup/demo_mpi.sh

The results look OK (the animation is more or less the same), but as
the number of compute nodes goes from 1 to 20 the run times get
steadily worse.  For instance, in my modified demo the second mdrun
is controlled by these lines (same run parameters as in the original
demo file):

MYDATE=`which accudate`
...

grompp -np $CNODES -shuffle  -f pr -c ${MOL}_b4pr \
  -r ${MOL}_b4pr -p ${MOL} -o ${MOL}_pr >>output.log 2>&1

...

STARTTIME=`$MYDATE -t0`
(mpirun -np $CNODES -wd $USEWORKDIR $PATHMDRUN \
   -nice 4 -s ${MOL}_pr -o ${MOL}_pr \
    -c ${MOL}_b4md -v) >>output.log 2>&1

echo "mdrun finished, time elapsed: " `$MYDATE -ds $STARTTIME`
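
For anyone without accudate, roughly the same measurement can be made
with plain date.  A minimal sketch, assuming whole-second resolution
is good enough; time_cmd is a hypothetical helper and not part of the
demo script:

```shell
# Hypothetical helper: run a command and report wall-clock seconds on stderr.
time_cmd() {
    tc_start=$(date +%s)          # seconds since the epoch before the run
    "$@"                          # run the command exactly as given
    tc_status=$?                  # remember the exit status of the command
    tc_end=$(date +%s)
    echo "time elapsed: $((tc_end - tc_start))s" >&2
    return $tc_status
}

# Example: time a one-second sleep.
time_cmd sleep 1
```

Wrapping the mpirun line in time_cmd would stand in for the
STARTTIME/echo pair, at the cost of losing sub-second precision.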

Run times measured for various node numbers were:

Nodes  Time    CPU usage per compute node
1       5.4s   >98%
2       5.7s   50-70%
4      28.1s   8-12%
20     44.4s   14-16%
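
To put the slowdown in numbers, these times can be converted to
speedup (single-node time divided by N-node time) and parallel
efficiency (speedup divided by N).  A minimal sketch using awk; the
function name speedup_table is made up for illustration and the input
is just the node counts and times from the table:

```shell
# Hypothetical helper: read "nodes seconds" pairs on stdin and print
# tab-separated nodes, time, speedup, efficiency.
speedup_table() {
    awk '
        NR == 1 { base = $2 }    # first row is the single-node reference time
        {
            speedup = base / $2
            printf "%d\t%s\t%.2f\t%.2f\n", $1, $2, speedup, speedup / $1
        }'
}

speedup_table <<'EOF'
1 5.4
2 5.7
4 28.1
20 44.4
EOF
```

For the 20-node row this gives a speedup of about 0.12, i.e. roughly
eight times slower than a single node, where ideal scaling would give
a speedup of 20.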

In other words, the MPI version gets SLOWER as nodes are added.  The
final mdrun in the demo script took 46.4s on one node and 452s on 20.
There is virtually no CPU usage on the master node.  The compute
nodes run slower and slower, and it isn't at all clear to me why.

The Beowulf cluster uses 100baseT Ethernet with an NFS-mounted home
directory.  The MPI implementation used was LAM/MPI.

Any ideas why the parallel version is running so slowly?  I already
tried changing grompp -> grompp_mpi and running without -shuffle,
and neither helped.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech