[gmx-users] problem with GROMACS parallelization
Mark Abraham
Mark.Abraham at anu.edu.au
Thu Jan 17 23:41:09 CET 2008
Anna Marabotti wrote:
> Dear all,
> we installed GROMACS v. 3.3.2 on a cluster of 20 dual-processor nodes running CentOS 4 x86_64, following the instructions on the
> GROMACS web site. We compiled it in single precision as a parallel version with the --enable-mpi option (LAM MPI was already present on
> the cluster). After the installation I ran some calculations and found that the parallel scalability of the software is very bad:
> with 3 or more processors the performance is the same as (or even worse than) with a single processor.
> As benchmarks I used two systems: the spider toxin peptide (GROMACS tutorial) in a cubic box of 0.7 nm filled with 2948 water
> molecules (model: spc216) (total: ~3200 atoms), and a dimeric protein of 718 aa in a cubic box (box edge 0.7 nm) filled with 18425
> water molecules (model: spc216) and 6 Na+ ions (total: ~62700 atoms). In both cases I used the GROMOS96 force field (G43a1). I used the
> following commands:
There's a parallel benchmark system available from the GROMACS web page.
I'd recommend using it, so that we can be confident there are no
problems with the simulation system itself.
> grompp_mpi -np "nofprocessors" -f pr.mdp -c systemmini.gro -o systempr.tpr -p system.top
> mpirun -np "nofprocessors" mdrun_mpi -s systempr.tpr -o systempr.trr -c systempr.gro -e systempr.edr -g systempr.log
See man grompp about the use of -shuffle, which might help.
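For example, something along these lines (the -np value here is just
illustrative, and the file names are simply the ones from your own
commands):

  grompp_mpi -np 4 -shuffle -sort -f pr.mdp -c systemmini.gro -o systempr.tpr -p system.top
  mpirun -np 4 mdrun_mpi -s systempr.tpr -o systempr.trr -c systempr.gro -e systempr.edr -g systempr.log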
> I paste here the performance results of the position-restrained MD runs:
>
> spider toxin peptide:
>
> np   time          Mnbf/s   GFlops   ns/day   hours/ns
>  1   1m 51s 237    10.797    2.425    7.806     3.074
>  2   0m 59s 614    19.561    4.402   14.664     1.694
>  3   2m 11s 622     9.261    2.082    6.698     3.583
>  4   1m 58s 722    10.283    2.315    7.448     3.222
>  5   1m 40s 580    11.813    2.659    8.554     2.806
>  6   1m 50s 830    10.859    2.442    7.855     3.056
>  7   1m 49s 232    10.878    2.442    7.855     3.056
>  8   2m  2s 292     6.190    1.392    4.477     5.361
>  9   2m  5s 778     9.533    2.150    6.912     3.472
> 10   2m 22s 540     8.349    1.879    6.042     3.972
These runs are probably too short to give a reliable picture of scaling,
because they could still be dominated by setup time. I would simulate
for at least about 10 minutes of wall time to get a good idea (see the
example below). However, merely increasing the run length won't fix the
problems introduced by the absence of -shuffle and/or -sort.
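As a rough sketch of how to lengthen the run (the numbers are only
illustrative, and dt = 0.002 ps is an assumption about your pr.mdp),
raise nsteps until a single-processor run takes roughly 10 minutes or
more:

  dt      = 0.002     ; timestep in ps
  nsteps  = 100000    ; 100000 * 0.002 ps = 200 ps of MD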
At the bottom of the .log file you will also be able to get an idea of
how (un)balanced the work is across the processes.
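For instance, something along these lines will print that accounting
section (the log file name is just the one given to -g in your mdrun
command):

  tail -n 60 systempr.log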
> dimeric protein:
>
> np   time           Mnbf/s   GFlops   ns/day   hours/ns
>  1   13m 51s 391    14.041    2.179    1.043    23.016
>  2    7m 45s 321    25.011    3.883    1.858    12.917
>  3   15m 26s 621    12.586    1.954    0.935    25.667
>  4   15m 15s   3    12.749    1.978    0.946    25.361
>  5   12m 35s 274    15.481    2.401    1.149    20.889
>  6   13m 23s 750    14.517    2.255    1.079    22.250
>  7   12m 25s 659    15.669    2.435    1.164    20.611
>  8   13m  0s 434    14.977    2.325    1.112    21.583
>  9   12m 12s 601    15.949    2.475    1.184    20.278
> 10   13m  1s 724    14.962    2.322    1.111    21.611
>
> I saw in the GROMACS mailing list that this could be due to a communication problem between nodes, but it seems to me that nobody
> has reported such bad results before. Does anybody have suggestions - apart from waiting for the GROMACS 4.0 release ;-) - about
> further checks to do on the system, or a different compilation/installation to try?
See above.
Mark