[gmx-users] problem with GROMACS parallelization

Mark Abraham Mark.Abraham at anu.edu.au
Thu Jan 17 23:41:09 CET 2008


Anna Marabotti wrote:
> Dear all,
> we installed GROMACS v. 3.3.2 on a cluster of 20 dual-processor nodes running CentOS 4 x86_64, following the instructions on the
> GROMACS web site. We compiled it in single precision as a parallel version with the --enable-mpi option (LAM MPI was already present on
> the cluster). After the installation I ran some calculations and found that the parallel scalability of the software is very bad:
> in fact, with 3 or more processors the performance is the same as (or even worse than) with a single processor.
> As benchmarks, I used two systems: the spider toxin peptide (GROMACS tutorial) in a cubic box (0.7 nm from solute to box edge) filled
> with 2948 water molecules (model: spc216; ~3200 atoms in total), and a dimeric protein of 718 aa in a cubic box (again 0.7 nm from
> solute to box edge) filled with 18425 water molecules (model: spc216) and 6 Na+ ions (~62700 atoms in total). In both cases I used the
> GROMOS96 force field (G43a1). I used the following commands:

There's a parallel benchmark system available from the GROMACS web page. 
I'd recommend using it, so that we can be confident there are no 
problems with the simulation system itself.
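
Running it needs nothing beyond what you're already doing - something 
like this (a sketch only; I'm assuming the archive is called 
gmxbench-3.0.tar.gz and unpacks into per-system directories such as 
d.dppc containing conf.gro, grompp.mdp and topol.top - check the real 
names after downloading):

  tar xzf gmxbench-3.0.tar.gz    # archive name assumed - use whatever you download
  cd d.dppc                      # DPPC membrane benchmark (directory name assumed)
  grompp_mpi -np 4 -f grompp.mdp -c conf.gro -p topol.top -o bench.tpr
  mpirun -np 4 mdrun_mpi -s bench.tpr -g bench4.log

Repeat with -np 1, 2, 4, ... and compare the timings reported at the 
end of each log file.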

> grompp_mpi -np "nofprocessors" -f pr.mdp -c systemmini.gro -o systempr.tpr -p system.top
> mpirun -np "nofprocessors" mdrun_mpi -s systempr.tpr -o systempr.trr -c systempr.gro -e systempr.edr -g systempr.log

See man grompp about the use of -shuffle, which might help.
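
For your systems that would look something like this (a sketch; -np 
must match the number of MPI processes you give mpirun, and it's worth 
checking the exact option names with grompp -h for your version):

  grompp_mpi -np 4 -shuffle -sort -f pr.mdp -c systemmini.gro \
             -p system.top -o systempr.tpr -deshuf deshuf.ndx
  mpirun -np 4 mdrun_mpi -s systempr.tpr -o systempr.trr -c systempr.gro \
         -e systempr.edr -g systempr.log

As I understand it, -shuffle redistributes the (solvent) molecules over 
the processors so the work is spread more evenly, and -sort orders them 
by coordinate to improve locality. The deshuf.ndx index file written by 
grompp lets you restore the original atom order in the output 
afterwards (e.g. with trjconv -n deshuf.ndx).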

> I paste here the results of the performances of position-restrained MD:
> 
> spider toxin peptide:
> 
> np	time (m s ms)	Mnbf/s	GFlops	ns/day	hours/ns
> 1	1m 51s 237	10.797	2.425	7.806	3.074
> 2	0m 59s 614	19.561	4.402	14.664	1.694
> 3	2m 11s 622	9.261	2.082	6.698	3.583
> 4	1m 58s 722	10.283	2.315	7.448	3.222
> 5	1m 40s 580	11.813	2.659	8.554	2.806
> 6	1m 50s 830	10.859	2.442	7.855	3.056
> 7	1m 49s 232	10.878	2.442	7.855	3.056
> 8	2m 2s 292	6.190	1.392	4.477	5.361
> 9	2m 5s 778	9.533	2.150	6.912	3.472
> 10	2m 22s 540	8.349	1.879	6.042	3.972 

These runs are probably too short to give a good idea of scaling, 
because they could still be dominated by setup time. I would simulate 
for at least about 10 minutes of wall time. However, merely lengthening 
the runs won't fix the problem introduced by the absence of -shuffle 
and/or -sort.
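
For the peptide at ~8 ns/day on one processor, something like 
25000-50000 MD steps would get you into that range, assuming the 2 fs 
timestep from the tutorial - just raise nsteps in the .mdp before 
running grompp:

  ; illustrative values only - adapt to your own .mdp
  dt      = 0.002    ; 2 fs timestep (tutorial default, assumed)
  nsteps  = 50000    ; 100 ps, i.e. roughly 15-20 minutes at ~8 ns/day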

The bottom of each .log file also gives you an idea of how (un)balanced 
the work is across the processes.
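
For example (the exact headings of the accounting tables vary between 
versions, so just look at the last screenful or so):

  tail -n 80 systempr.log      # timing/accounting tables written at the end of the run
  grep -i load systempr.log    # any per-node load-balance lines, if your version prints them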

> dimeric protein:
> 
> np	time (m s ms)	Mnbf/s	GFlops	ns/day	hours/ns
> 1	13m 51s 391	14.041	2.179	1.043	23.016
> 2	7m 45s 321	25.011	3.883	1.858	12.917
> 3	15m 26s 621	12.586	1.954	0.935	25.667
> 4	15m 15s 3	12.749	1.978	0.946	25.361
> 5	12m 35s 274	15.481	2.401	1.149	20.889
> 6	13m 23s 750	14.517	2.255	1.079	22.250
> 7	12m 25s 659	15.669	2.435	1.164	20.611
> 8	13m 0s 434	14.977	2.325	1.112	21.583
> 9	12m 12s 601	15.949	2.475	1.184	20.278
> 10	13m 1s 724	14.962	2.322	1.111	21.611
> 
> I saw on the GROMACS mailing list that it could be due to a communication problem between nodes, but it seems to me that nobody
> has reported such bad results before. Does anybody have suggestions - apart from waiting for the GROMACS 4.0 release ;-) - about
> further checks to run on the system, or a different compilation/installation to try?

See above.

Mark


