[gmx-users] load imbalance in multiple GPU simulations
yunshi11 .
yunshi09 at gmail.com
Sun Dec 8 19:11:07 CET 2013
Hi all,
My conventional MD run (equilibration) of a protein in TIP3P water reported
"Average load imbalance: 59.4 %" when running on 3 GPUs + 12 CPU cores.
How can I tweak the parameters to improve performance?
The end of the log file reads:
......
M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only
Computing:                             M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check         78483.330336      706349.973     0.1
NxN QSTab Elec. + VdW [F]       11321254.234368   464171423.609    95.1
NxN QSTab Elec. + VdW [V&F]       114522.922048     6756852.401     1.4
1,4 nonbonded interactions          1645.932918      148133.963     0.0
Calc Weights                       25454.159073      916349.727     0.2
Spread Q Bspline                  543022.060224     1086044.120     0.2
Gather F Bspline                  543022.060224     3258132.361     0.7
3D-FFT                           1138719.444112     9109755.553     1.9
Solve PME                            353.129616       22600.295     0.0
Reset In Box                         424.227500        1272.682     0.0
CG-CoM                               424.397191        1273.192     0.0
Bonds                                330.706614       19511.690     0.0
Angles                              1144.322886      192246.245     0.0
Propers                             1718.934378      393635.973     0.1
Impropers                            134.502690       27976.560     0.0
Pos. Restr.                          321.706434       16085.322     0.0
Virial                               424.734826        7645.227     0.0
Stop-CM                               85.184882         851.849     0.0
P-Coupling                          8484.719691       50908.318     0.0
Calc-Ekin                            848.794382       22917.448     0.0
Lincs                                313.720420       18823.225     0.0
Lincs-Mat                           1564.146576        6256.586     0.0
Constraint-V                        8651.865815       69214.927     0.0
Constraint-Vir                       417.065668       10009.576     0.0
Settle                              2674.808325      863963.089     0.2
-----------------------------------------------------------------------------
Total                                               487878233.910   100.0
-----------------------------------------------------------------------------
D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 63413.7
av. #atoms communicated per step for LINCS: 2 x 3922.5
Average load imbalance: 59.4 %
Part of the total run time spent waiting due to load imbalance: 5.0 %
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing:           Nodes  Th.    Count  Wall t (s)    G-Cycles       %
-----------------------------------------------------------------------------
Domain decomp.           3    4     2500      42.792    1300.947     4.4
DD comm. load            3    4       31       0.000       0.014     0.0
Neighbor search          3    4     2501      33.076    1005.542     3.4
Launch GPU ops.          3    4   100002       6.537     198.739     0.7
Comm. coord.             3    4    47500      20.349     618.652     2.1
Force                    3    4    50001      75.093    2282.944     7.8
Wait + Comm. F           3    4    50001      24.850     755.482     2.6
PME mesh                 3    4    50001     597.925   18177.760    62.0
Wait GPU nonlocal        3    4    50001       9.862     299.813     1.0
Wait GPU local           3    4    50001       0.262       7.968     0.0
NB X/F buffer ops.       3    4   195002      33.578    1020.833     3.5
Write traj.              3    4       12       0.506      15.385     0.1
Update                   3    4    50001      23.243     706.611     2.4
Constraints              3    4    50001      70.972    2157.657     7.4
Comm. energies           3    4     2501       0.386      11.724     0.0
Rest                     3                    24.466     743.803     2.5
-----------------------------------------------------------------------------
Total                    3                   963.899   29303.873   100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
PME redist. X/F          3    4   100002     121.844    3704.214    12.6
PME spread/gather        3    4   100002     300.759    9143.486    31.2
PME 3D-FFT               3    4   100002     111.366    3385.682    11.6
PME 3D-FFT Comm.         3    4   100002      55.347    1682.636     5.7
PME solve                3    4    50001       8.199     249.246     0.9
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:    11533.900      963.899     1196.6
                 (ns/day)    (hour/ns)
Performance:        8.964        2.677
Finished mdrun on node 0 Sun Dec 8 11:04:48 2013
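As a quick sanity check on how the timing numbers above relate to each other, here is a minimal Python sketch. The values are copied by hand from the log, and the variable names are only illustrative (they are not GROMACS identifiers). It recomputes the PME mesh share of wall time, the core-time to wall-time ratio, and the hour/ns figure from ns/day.

```python
# Values copied by hand from the cycle/time accounting table in the log.
wall_total_s = 963.899     # "Total" wall time (s)
core_total_s = 11533.900   # "Core t (s)"
pme_mesh_s = 597.925       # "PME mesh" wall time (s)
ns_per_day = 8.964         # "Performance" line

# Wall times of the PME sub-rows listed below the main table.
pme_parts_s = {
    "PME redist. X/F": 121.844,
    "PME spread/gather": 300.759,
    "PME 3D-FFT": 111.366,
    "PME 3D-FFT Comm.": 55.347,
    "PME solve": 8.199,
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


pme_share = percent(pme_mesh_s, wall_total_s)
core_usage = percent(core_total_s, wall_total_s)
hours_per_ns = 24.0 / ns_per_day
pme_parts_sum = sum(pme_parts_s.values())

print(f"PME mesh share of wall time: {pme_share:.1f} %")    # 62.0 %
print(f"Core time / wall time:       {core_usage:.1f} %")   # 1196.6 %
print(f"hour/ns from ns/day:         {hours_per_ns:.3f}")   # 2.677
print(f"Sum of PME sub-rows:         {pme_parts_sum:.3f} s")
for name, seconds in pme_parts_s.items():
    print(f"  {name:<18} {percent(seconds, wall_total_s):5.1f} % of wall time")
```

The recomputed percentages match the "%" column of the log, and the PME sub-rows add up to roughly the "PME mesh" wall time.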
I also set rlist = rvdw = rcoulomb = 1.0.
Is there any documentation that explains what the entries in these tables,
e.g. "VdW [V&F]", mean?
Thanks,
Yun