[gmx-users] Re: MPI tips

David Mobley dmobley at gmail.com
Tue Jan 31 01:53:57 CET 2006


Again, I really don't know much about MPI... But maybe you should try
running something that takes longer than a few seconds. There is always
overhead involved in starting jobs, and I expect that overhead gets worse
as you increase the number of processors (since splitting up the job takes
some work). Everything I've heard suggests that GROMACS scales reasonably
well, and certainly doesn't get slower as you add processors. But I
wouldn't expect that to hold for extremely short jobs.
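
For example, something along these lines (an untested sketch patterned
on your demo script; mdrun_mpi, pr.mdp, and the node counts below are
just placeholders for whatever you actually use) should give timings
that aren't dominated by startup cost:

  # Rough sketch: first raise nsteps in pr.mdp so each run lasts
  # minutes rather than seconds, then time mdrun at every node count.
  # mdrun_mpi stands in for whatever $PATHMDRUN points to, and $MOL
  # is the molecule name from the demo script.
  for CNODES in 1 2 4 8 20 ; do
      grompp -np $CNODES -shuffle -f pr -c ${MOL}_b4pr \
             -r ${MOL}_b4pr -p ${MOL} -o ${MOL}_pr
      START=`date +%s`
      mpirun -np $CNODES mdrun_mpi -s ${MOL}_pr -o ${MOL}_pr -c ${MOL}_b4md
      END=`date +%s`
      echo "$CNODES nodes: $((END - START)) seconds"
  done

If the scaling still looks this bad on runs that take several minutes,
then the problem probably isn't just startup overhead.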

David


On 1/30/06, David Mathog <mathog at caltech.edu> wrote:
>
> Erik Lindahl wrote:
>
> > As for running things with MPI, you don't even need the -np flag
> > anymore - it's autodetected. Just use -np X as an argument to grompp
> > in order to split your topology for X processors.
>
> Thanks everybody, MPI is now running.
>
> Unfortunately, while MPI is working, it is not working well.
> These tests were all based on the gmxdemo "demo" script converted to
> bash which is (temporarily) available here:
>
>   ftp://saf.bio.caltech.edu/pub/pickup/demo_mpi.sh
>
> The results are OK (more or less the same when I look at the
> animation), but as the number of compute nodes goes from 1 to 20
> the run times get worse and worse.  For instance, in my modified
> demo the second mdrun is controlled by these lines (same
> run parameters as from the original demo file):
>
> MYDATE=`which accudate`
> ...
>
>
> grompp -np $CNODES -shuffle  -f pr -c ${MOL}_b4pr \
>   -r ${MOL}_b4pr -p ${MOL} -o ${MOL}_pr >>output.log 2>&1
>
> ...
>
> STARTTIME=`$MYDATE -t0`
> (mpirun -np $CNODES -wd $USEWORKDIR $PATHMDRUN \
>    -nice 4 -s ${MOL}_pr -o ${MOL}_pr \
>     -c ${MOL}_b4md -v) >>output.log 2>&1
>
> echo "mdrun finished, time elapsed: " `$MYDATE -ds $STARTTIME`
>
> Run times measured for various node numbers were:
>
> Nodes    Time   CPU usage on compute node
>     1    5.4s   >98%
>     2    5.7s   50-70%
>     4   28.1s   8-12%
>    20   44.4s   14-16%
>
> In other words, the MPI implementation gets SLOWER as nodes are added.
> The final mdrun from the demo script took 46.4s on one node and 452s
> on 20.  There is virtually no CPU usage on the master node.  The
> compute nodes run slower and slower, and it isn't at all clear to me
> why.
>
> The Beowulf cluster is 100baseT with an NFS-mounted home directory,
> and LAM/MPI was used.
>
> Any ideas why the parallel version is running so slowly?  I already
> tried changing grompp to grompp_mpi and running without -shuffle,
> and neither helped.
>
> Thanks,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech