[gmx-users] load imbalance in multiple GPU simulations
Szilárd Páll
pall.szilard at gmail.com
Mon Dec 9 01:24:04 CET 2013
There is no single value that tells you exactly that, but there are clues.
You can check in the log file the ratio of the smallest to the average
(starting) cell size (the value is also printed on the terminal with -v);
that tells you how much the domain decomposition shrank the middle cell.
You can also compare runs: with -dlb no you get high load imbalance but an
equal non-bonded load on every GPU, whereas with -dlb auto (or yes) the
second GPU ends up with a much smaller load (check with nvidia-smi).
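
For example, something along these lines (just a rough sketch; the .tpr
name, output names and thread counts are placeholders for your setup):

# Static, equal-volume domains: high reported load imbalance, but
# roughly equal non-bonded (GPU) load on all three GPUs.
mdrun -s topol.tpr -deffnm dlb_off -ntmpi 3 -ntomp 4 -dlb no -v

# Dynamic load balancing shrinks the protein-rich middle cell: lower
# reported imbalance, but less non-bonded work for that cell's GPU.
mdrun -s topol.tpr -deffnm dlb_on -ntmpi 3 -ntomp 4 -dlb auto -v

# In a second terminal, sample per-GPU utilization once per second:
nvidia-smi -l 1

# The "vol min/aver" values printed with -v (and written to the log)
# show how far the smallest cell shrank relative to the average:
grep "vol min/aver" dlb_on.log
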
Cheers,
--
Szilárd
PS: You can, in principle, dump the PDB files corresponding to the
individual domains, but I don't know exactly how to do it (and that's
rather low-level stuff anyway).
On Mon, Dec 9, 2013 at 1:02 AM, yunshi11 . <yunshi09 at gmail.com> wrote:
> Hi Szilard,
>
>
>
>
> On Sun, Dec 8, 2013 at 2:48 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
>> Hi,
>>
>> That's unfortunate, but not unexpected. You are getting a 3x1x1
>> decomposition where the "middle" cell has most of the protein, hence
>> most of the bonded forces to calculate, while the ones on the side
>> have little (or none).
>>
> From which values can I tell this?
>
>
>> Currently, the only thing you can do is to try using more domains,
>> perhaps with manual decomposition (such that the initial domains
>> contain as much protein as possible). This may not help much, though.
>> In extreme cases (e.g. a small system), even using only two of the
>> three GPUs could improve performance.
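>>
>> A rough sketch of what I mean (the .tpr name, rank/thread counts and
>> GPU ids below are only illustrative; adjust them to your machine):
>>
>> # Six domains along x, with two PP ranks sharing each of the three
>> # GPUs, and no separate PME ranks so the DD grid matches the ranks:
>> mdrun -s topol.tpr -ntmpi 6 -ntomp 2 -npme 0 -dd 6 1 1 -gpu_id 001122
>>
>> # Or, for a small system, restrict the run to two of the three GPUs:
>> mdrun -s topol.tpr -ntmpi 2 -ntomp 6 -gpu_id 01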
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Sun, Dec 8, 2013 at 8:10 PM, yunshi11 . <yunshi09 at gmail.com> wrote:
>> > Hi all,
>> >
>> > My conventional MD run (equilibration) of a protein in TIP3P water
>> > reported "Average load imbalance: 59.4 %" when running with 3 GPUs
>> > + 12 CPU cores. So I wonder how to tweak parameters to optimize the
>> > performance.
>> >
>> > The end of the log file reads:
>> >
>> > ......
>> > M E G A - F L O P S A C C O U N T I N G
>> >
>> > NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
>> > RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
>> > W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> > V&F=Potential and force V=Potential only F=Force only
>> >
>> > Computing:                               M-Number         M-Flops  % Flops
>> > -----------------------------------------------------------------------------
>> > Pair Search distance check            78483.330336      706349.973     0.1
>> > NxN QSTab Elec. + VdW [F]          11321254.234368   464171423.609    95.1
>> > NxN QSTab Elec. + VdW [V&F]          114522.922048     6756852.401     1.4
>> > 1,4 nonbonded interactions             1645.932918      148133.963     0.0
>> > Calc Weights                          25454.159073      916349.727     0.2
>> > Spread Q Bspline                     543022.060224     1086044.120     0.2
>> > Gather F Bspline                     543022.060224     3258132.361     0.7
>> > 3D-FFT                              1138719.444112     9109755.553     1.9
>> > Solve PME                               353.129616       22600.295     0.0
>> > Reset In Box                            424.227500        1272.682     0.0
>> > CG-CoM                                  424.397191        1273.192     0.0
>> > Bonds                                   330.706614       19511.690     0.0
>> > Angles                                 1144.322886      192246.245     0.0
>> > Propers                                1718.934378      393635.973     0.1
>> > Impropers                               134.502690       27976.560     0.0
>> > Pos. Restr.                             321.706434       16085.322     0.0
>> > Virial                                  424.734826        7645.227     0.0
>> > Stop-CM                                  85.184882         851.849     0.0
>> > P-Coupling                             8484.719691       50908.318     0.0
>> > Calc-Ekin                               848.794382       22917.448     0.0
>> > Lincs                                   313.720420       18823.225     0.0
>> > Lincs-Mat                              1564.146576        6256.586     0.0
>> > Constraint-V                           8651.865815       69214.927     0.0
>> > Constraint-Vir                          417.065668       10009.576     0.0
>> > Settle                                 2674.808325      863963.089     0.2
>> > -----------------------------------------------------------------------------
>> > Total                                               487878233.910   100.0
>> > -----------------------------------------------------------------------------
>> >
>> >
>> > D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>> >
>> > av. #atoms communicated per step for force: 2 x 63413.7
>> > av. #atoms communicated per step for LINCS: 2 x 3922.5
>> >
>> > Average load imbalance: 59.4 %
>> > Part of the total run time spent waiting due to load imbalance: 5.0 %
>> >
>> >
>> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
>> >
>> > Computing:          Nodes   Th.     Count  Wall t (s)    G-Cycles       %
>> > -----------------------------------------------------------------------------
>> > Domain decomp.          3    4       2500      42.792    1300.947     4.4
>> > DD comm. load           3    4         31       0.000       0.014     0.0
>> > Neighbor search         3    4       2501      33.076    1005.542     3.4
>> > Launch GPU ops.         3    4     100002       6.537     198.739     0.7
>> > Comm. coord.            3    4      47500      20.349     618.652     2.1
>> > Force                   3    4      50001      75.093    2282.944     7.8
>> > Wait + Comm. F          3    4      50001      24.850     755.482     2.6
>> > PME mesh                3    4      50001     597.925   18177.760    62.0
>> > Wait GPU nonlocal       3    4      50001       9.862     299.813     1.0
>> > Wait GPU local          3    4      50001       0.262       7.968     0.0
>> > NB X/F buffer ops.      3    4     195002      33.578    1020.833     3.5
>> > Write traj.             3    4         12       0.506      15.385     0.1
>> > Update                  3    4      50001      23.243     706.611     2.4
>> > Constraints             3    4      50001      70.972    2157.657     7.4
>> > Comm. energies          3    4       2501       0.386      11.724     0.0
>> > Rest                    3                      24.466     743.803     2.5
>> > -----------------------------------------------------------------------------
>> > Total                   3                     963.899   29303.873   100.0
>> > -----------------------------------------------------------------------------
>> > -----------------------------------------------------------------------------
>> > PME redist. X/F         3    4     100002     121.844    3704.214    12.6
>> > PME spread/gather       3    4     100002     300.759    9143.486    31.2
>> > PME 3D-FFT              3    4     100002     111.366    3385.682    11.6
>> > PME 3D-FFT Comm.        3    4     100002      55.347    1682.636     5.7
>> > PME solve               3    4      50001       8.199     249.246     0.9
>> > -----------------------------------------------------------------------------
>> >
>> >                Core t (s)   Wall t (s)      (%)
>> >        Time:    11533.900      963.899   1196.6
>> >                  (ns/day)    (hour/ns)
>> > Performance:        8.964        2.677
>> > Finished mdrun on node 0 Sun Dec 8 11:04:48 2013
>> >
>> >
>> >
>> > And I set rlist = rvdw = rcoulomb = 1.0.
>> >
>> > Is there any documentation that details what those values, e.g.
>> > VdW [V&F], mean?
>> >
>> > Thanks,
>> > Yun