[gmx-users] mdrun with mpi runs slow

Thu Nov 27 03:13:01 CET 2003

Dennis,

>I'm running gromacs on a cluster with 20 nodes, but mdrun only gets
>less than 5% of the CPUs (each node has two CPUs). The line in 'top'
>looks like this:

That is simply due to the scaling of GROMACS.  Depending on the system, the 
optimum number of processors can be 1, 2, 4 or maybe larger.  Have a look at 
the scaling data on the website.  For my own systems, with around 50,000 
atoms and an electrostatic cut-off, it is not worth going past 2 or 4 
processors, depending on the cluster being used.  Beyond that the scaling 
factor falls away rapidly.

>Usually there are other applications on that cluster, they get all the
>available cpu time. But even when there are no other running apps
>it doesn't run fast. Some colleagues of mine think that the processes are
>waiting for something different than cpu-time (e.g. I/O).
>Has anyone an idea?

It is waiting for communication from the other mdrun processes on the other 
nodes.  The type of network connection you have will have a large influence 
on that.

>Here're my commands (there are only 15 nodes available):
>grompp -f grompp.mdp -p topol.top -c conf.gro -o testlauf.tpr -np 15
>lamboot ~/lamboot.startup
>mpirun n0-14 mdrun -nice 0 -s testlauf.tpr -o testlauf.trr -v -c
>testlauf.gro

You should repeat this for 1, 2, 4 etc. processors so that you can work out 
the scaling for your particular cluster and simulation system.  Then use 
the number that gives you the best trade-off between speed and efficiency.
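
As a rough sketch of how you might work that out once you have the timings: 
the wall-clock numbers below are made up for illustration, and the 70% 
efficiency threshold is an arbitrary assumption on my part, not a GROMACS 
recommendation.  The function names are hypothetical too.

```python
def scaling_table(timings):
    """Return a list of (nprocs, speedup, efficiency) from wall-clock times.

    `timings` maps processor count -> wall-clock seconds and must
    include a 1-processor run, which serves as the baseline.
    """
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]              # how much faster than 1 processor
        rows.append((n, speedup, speedup / n)) # efficiency = speedup per processor
    return rows


def pick_processors(timings, min_efficiency):
    """Largest processor count whose efficiency is still >= min_efficiency."""
    good = [n for n, _, eff in scaling_table(timings) if eff >= min_efficiency]
    return max(good)


# Hypothetical wall-clock times in seconds for the same short test run.
# Replace these with your own measurements.
timings = {1: 1000.0, 2: 540.0, 4: 310.0, 8: 240.0}

for n, speedup, eff in scaling_table(timings):
    print(f"{n:3d} procs  speedup {speedup:5.2f}  efficiency {eff:5.1%}")

print("suggested processor count:", pick_processors(timings, 0.7))
```

Efficiency here is just the speedup divided by the number of processors, so 
a value well below 1 means most of the extra CPUs are sitting idle waiting 
on the network.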

Catch ya,

Dr. Dallas Warren
Research Fellow
Department of Pharmaceutical Biology and Pharmacology
Victorian College of Pharmacy, Monash University
381 Royal Parade, Parkville VIC 3010
dallas.warren at vcp.monash.edu.au
+61 3 9903 9083
--------------------------------------------------------------------------
When the only tool you own is a hammer, every problem begins to resemble a nail.