[gmx-users] load imbalance in multiple GPU simulations
yunshi11 .
yunshi09 at gmail.com
Mon Dec 9 00:32:26 CET 2013
Hi Szilard,
On Sun, Dec 8, 2013 at 2:48 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> Hi,
>
> That's unfortunate, but not unexpected. You are getting a 3x1x1
> decomposition where the "middle" cell has most of the protein, hence
> most of the bonded forces to calculate, while the ones on the side
> have little (or none).
>
From which values can I tell this?
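
(As an aside, for checking this on one's own run: the decomposition grid and
the imbalance figures are written to the mdrun log. A rough way to pull them
out, assuming the default log name md.log; the exact wording of the grid line
can differ between GROMACS versions:

    grep -i "domain decomposition grid" md.log
    grep -i "load imbalance" md.log
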
> Currently, the only thing you can do is to try using more domains,
> perhaps with manual decomposition (such that the initial domains will
> contain as much protein as possible). This may not help much, though.
> In extreme cases (e.g. small system), even using only two of the three
> GPUs could improve performance.
> Cheers,
> --
> Szilárd
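
For concreteness, here is a rough sketch of the kind of command lines the two
suggestions above translate to, assuming a GROMACS 4.6-style thread-MPI mdrun
on a single node with 12 cores and 3 GPUs; the rank/thread counts, the -dd
vector and the -deffnm name are illustrative only, not a recommendation:

    # (a) more, smaller domains along x: 6 PP ranks with 2 OpenMP threads
    #     each, two ranks sharing each of the three GPUs
    mdrun -deffnm md -ntmpi 6 -ntomp 2 -dd 6 1 1 -gpu_id 001122

    # (b) use only two of the three GPUs: 2 PP ranks with 6 threads each
    mdrun -deffnm md -ntmpi 2 -ntomp 6 -gpu_id 01

As noted above, this may not help much; comparing the "Average load imbalance"
and wall-clock figures from short test runs is the quickest way to tell.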
>
>
> On Sun, Dec 8, 2013 at 8:10 PM, yunshi11 . <yunshi09 at gmail.com> wrote:
> > Hi all,
> >
> > My conventional MD run (equilibration) of a protein in TIP3P water had the
> > "Average load imbalance: 59.4 %" when running with 3 GPUs + 12 CPU cores.
> > So I wonder how to tweak parameters to optimize the performance.
> >
> > End of the log file reads:
> >
> > ......
> > M E G A - F L O P S A C C O U N T I N G
> >
> > NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> > RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> > W3=SPC/TIP3p W4=TIP4p (single or pairs)
> > V&F=Potential and force V=Potential only F=Force only
> >
> >  Computing:                              M-Number         M-Flops  % Flops
> > -----------------------------------------------------------------------------
> >  Pair Search distance check           78483.330336      706349.973     0.1
> >  NxN QSTab Elec. + VdW [F]         11321254.234368   464171423.609    95.1
> >  NxN QSTab Elec. + VdW [V&F]         114522.922048     6756852.401     1.4
> >  1,4 nonbonded interactions            1645.932918      148133.963     0.0
> >  Calc Weights                         25454.159073      916349.727     0.2
> >  Spread Q Bspline                    543022.060224     1086044.120     0.2
> >  Gather F Bspline                    543022.060224     3258132.361     0.7
> >  3D-FFT                             1138719.444112     9109755.553     1.9
> >  Solve PME                              353.129616       22600.295     0.0
> >  Reset In Box                           424.227500        1272.682     0.0
> >  CG-CoM                                 424.397191        1273.192     0.0
> >  Bonds                                  330.706614       19511.690     0.0
> >  Angles                                1144.322886      192246.245     0.0
> >  Propers                               1718.934378      393635.973     0.1
> >  Impropers                              134.502690       27976.560     0.0
> >  Pos. Restr.                            321.706434       16085.322     0.0
> >  Virial                                 424.734826        7645.227     0.0
> >  Stop-CM                                 85.184882         851.849     0.0
> >  P-Coupling                            8484.719691       50908.318     0.0
> >  Calc-Ekin                              848.794382       22917.448     0.0
> >  Lincs                                  313.720420       18823.225     0.0
> >  Lincs-Mat                             1564.146576        6256.586     0.0
> >  Constraint-V                          8651.865815       69214.927     0.0
> >  Constraint-Vir                         417.065668       10009.576     0.0
> >  Settle                                2674.808325      863963.089     0.2
> > -----------------------------------------------------------------------------
> >  Total                                              487878233.910   100.0
> > -----------------------------------------------------------------------------
> >
> >
> > D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
> >
> > av. #atoms communicated per step for force: 2 x 63413.7
> > av. #atoms communicated per step for LINCS: 2 x 3922.5
> >
> > Average load imbalance: 59.4 %
> > Part of the total run time spent waiting due to load imbalance: 5.0 %
> >
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> >  Computing:          Nodes   Th.     Count  Wall t (s)     G-Cycles       %
> > -----------------------------------------------------------------------------
> >  Domain decomp.          3    4       2500      42.792     1300.947     4.4
> >  DD comm. load           3    4         31       0.000        0.014     0.0
> >  Neighbor search         3    4       2501      33.076     1005.542     3.4
> >  Launch GPU ops.         3    4     100002       6.537      198.739     0.7
> >  Comm. coord.            3    4      47500      20.349      618.652     2.1
> >  Force                   3    4      50001      75.093     2282.944     7.8
> >  Wait + Comm. F          3    4      50001      24.850      755.482     2.6
> >  PME mesh                3    4      50001     597.925    18177.760    62.0
> >  Wait GPU nonlocal       3    4      50001       9.862      299.813     1.0
> >  Wait GPU local          3    4      50001       0.262        7.968     0.0
> >  NB X/F buffer ops.      3    4     195002      33.578     1020.833     3.5
> >  Write traj.             3    4         12       0.506       15.385     0.1
> >  Update                  3    4      50001      23.243      706.611     2.4
> >  Constraints             3    4      50001      70.972     2157.657     7.4
> >  Comm. energies          3    4       2501       0.386       11.724     0.0
> >  Rest                    3                      24.466      743.803     2.5
> > -----------------------------------------------------------------------------
> >  Total                   3                     963.899    29303.873   100.0
> > -----------------------------------------------------------------------------
> > -----------------------------------------------------------------------------
> >  PME redist. X/F         3    4     100002     121.844     3704.214    12.6
> >  PME spread/gather       3    4     100002     300.759     9143.486    31.2
> >  PME 3D-FFT              3    4     100002     111.366     3385.682    11.6
> >  PME 3D-FFT Comm.        3    4     100002      55.347     1682.636     5.7
> >  PME solve               3    4      50001       8.199      249.246     0.9
> > -----------------------------------------------------------------------------
> >
> >                 Core t (s)   Wall t (s)        (%)
> >        Time:     11533.900      963.899     1196.6
> >                   (ns/day)    (hour/ns)
> > Performance:         8.964        2.677
> > Finished mdrun on node 0 Sun Dec 8 11:04:48 2013
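
As a rough sanity check on what that imbalance actually costs, taking the
figures straight from this log: the "waiting" number is the fraction of total
run time lost, so about

    0.050 * 963.899 s  ~  48 s

of the ~964 s wall time is spent by the lightly loaded ranks waiting for the
most loaded domain, even though the average instantaneous imbalance is 59.4 %.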
> >
> >
> >
> > And I set rlist = rvdw = rcoulomb = 1.0.
> >
> > Is there any documentation that details what those values, e.g. VdW [V&F],
> > mean?
> >
> > Thanks,
> > Yun