[gmx-users] load imbalance in multiple GPU simulations

Szilárd Páll pall.szilard at gmail.com
Mon Dec 9 01:24:04 CET 2013


There is no single value that tells you exactly that, but there are
clues. You can check in the log file the ratio of the smallest to the
average (starting) cell size (the value is also printed on the terminal
with -v), which tells you how much the DD (with dynamic load balancing)
shrank the middle cell. You can also compare two runs: with -dlb no you
will get high load imbalance but an equal GPU (non-bonded) load on every
card, while with -dlb auto (or yes) you will see a much smaller load on
the second GPU (check with nvidia-smi).
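
For example, something along these lines (just a sketch; "equil" stands
in for your own file names, and the exact log wording may differ between
GROMACS versions):

  # with DLB active, mdrun reports the min/average cell volume ratio,
  # e.g. in lines like "DD  step 2500  vol min/aver 0.4 ..."
  grep "vol min/aver" equil.log | tail

  # compare the two DLB settings and watch the per-GPU utilization
  mdrun -deffnm equil -dlb no
  mdrun -deffnm equil -dlb auto
  nvidia-smi -l 1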

Cheers,
--
Szilárd

PS: It is possible to dump PDB files corresponding to the individual
domains, but I don't remember exactly how to do it (and that's rather
low-level stuff anyway).

On Mon, Dec 9, 2013 at 1:02 AM, yunshi11 . <yunshi09 at gmail.com> wrote:
> Hi Szilard,
>
>
>
>
> On Sun, Dec 8, 2013 at 2:48 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
>> Hi,
>>
>> That's unfortunate, but not unexpected. You are getting a 3x1x1
>> decomposition where the "middle" cell has most of the protein, hence
>> most of the bonded forces to calculate, while the ones on the side
>> have little (or none).
>>
> From which values can I tell this?
>
>
>> Currently, the only thing you can do is to try using more domains,
>> perhaps with a manual decomposition (such that the initial domains
>> share the protein as evenly as possible). This may not help much,
>> though. In extreme cases (e.g. a small system), even using only two of
>> the three GPUs could improve performance.
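>>
>> For example, something along these lines (just a sketch, not a recipe:
>> it assumes an MPI-enabled mdrun_mpi, a run named "equil", and a grid
>> that you would have to adapt to your own box):
>>
>>   # 6 thin slabs along x instead of 3, with two PP ranks sharing each
>>   # of the three GPUs via the -gpu_id mapping
>>   mpirun -np 6 mdrun_mpi -deffnm equil -dd 6 1 1 -gpu_id 001122
>>
>> so that the protein is spread over more of the initial domains.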
>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Sun, Dec 8, 2013 at 8:10 PM, yunshi11 . <yunshi09 at gmail.com> wrote:
>> > Hi all,
>> >
>> > My conventional MD run (equilibration) of a protein in TIP3 water had the
>> > "Average load imbalance: 59.4 %" when running with 3 GPUs + 12 CPU cores.
>> > So I wonder how to tweak parameters to optimize the performance.
>> >
>> > End of the log file reads:
>> >
>> > ......
>> >         M E G A - F L O P S   A C C O U N T I N G
>> >
>> >  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>> >  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>> >  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>> >  V&F=Potential and force  V=Potential only  F=Force only
>> >
>> >  Computing:                               M-Number         M-Flops  % Flops
>> > -----------------------------------------------------------------------------
>> >  Pair Search distance check           78483.330336      706349.973     0.1
>> >  NxN QSTab Elec. + VdW [F]         11321254.234368   464171423.609    95.1
>> >  NxN QSTab Elec. + VdW [V&F]         114522.922048     6756852.401     1.4
>> >  1,4 nonbonded interactions            1645.932918      148133.963     0.0
>> >  Calc Weights                         25454.159073      916349.727     0.2
>> >  Spread Q Bspline                    543022.060224     1086044.120     0.2
>> >  Gather F Bspline                    543022.060224     3258132.361     0.7
>> >  3D-FFT                             1138719.444112     9109755.553     1.9
>> >  Solve PME                              353.129616       22600.295     0.0
>> >  Reset In Box                           424.227500        1272.682     0.0
>> >  CG-CoM                                 424.397191        1273.192     0.0
>> >  Bonds                                  330.706614       19511.690     0.0
>> >  Angles                                1144.322886      192246.245     0.0
>> >  Propers                               1718.934378      393635.973     0.1
>> >  Impropers                              134.502690       27976.560     0.0
>> >  Pos. Restr.                            321.706434       16085.322     0.0
>> >  Virial                                 424.734826        7645.227     0.0
>> >  Stop-CM                                 85.184882         851.849     0.0
>> >  P-Coupling                            8484.719691       50908.318     0.0
>> >  Calc-Ekin                              848.794382       22917.448     0.0
>> >  Lincs                                  313.720420       18823.225     0.0
>> >  Lincs-Mat                             1564.146576        6256.586     0.0
>> >  Constraint-V                          8651.865815       69214.927     0.0
>> >  Constraint-Vir                         417.065668       10009.576     0.0
>> >  Settle                                2674.808325      863963.089     0.2
>> > -----------------------------------------------------------------------------
>> >  Total                                               487878233.910   100.0
>> > -----------------------------------------------------------------------------
>> >
>> >
>> >     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>> >
>> >  av. #atoms communicated per step for force:  2 x 63413.7
>> >  av. #atoms communicated per step for LINCS:  2 x 3922.5
>> >
>> >  Average load imbalance: 59.4 %
>> >  Part of the total run time spent waiting due to load imbalance: 5.0 %
>> >
>> >
>> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >
>> >  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
>> > -----------------------------------------------------------------------------
>> >  Domain decomp.         3    4       2500      42.792     1300.947     4.4
>> >  DD comm. load          3    4         31       0.000        0.014     0.0
>> >  Neighbor search        3    4       2501      33.076     1005.542     3.4
>> >  Launch GPU ops.        3    4     100002       6.537      198.739     0.7
>> >  Comm. coord.           3    4      47500      20.349      618.652     2.1
>> >  Force                  3    4      50001      75.093     2282.944     7.8
>> >  Wait + Comm. F         3    4      50001      24.850      755.482     2.6
>> >  PME mesh               3    4      50001     597.925    18177.760    62.0
>> >  Wait GPU nonlocal      3    4      50001       9.862      299.813     1.0
>> >  Wait GPU local         3    4      50001       0.262        7.968     0.0
>> >  NB X/F buffer ops.     3    4     195002      33.578     1020.833     3.5
>> >  Write traj.            3    4         12       0.506       15.385     0.1
>> >  Update                 3    4      50001      23.243      706.611     2.4
>> >  Constraints            3    4      50001      70.972     2157.657     7.4
>> >  Comm. energies         3    4       2501       0.386       11.724     0.0
>> >  Rest                   3                      24.466      743.803     2.5
>> > -----------------------------------------------------------------------------
>> >  Total                  3                     963.899    29303.873   100.0
>> > -----------------------------------------------------------------------------
>> > -----------------------------------------------------------------------------
>> >  PME redist. X/F        3    4     100002     121.844     3704.214    12.6
>> >  PME spread/gather      3    4     100002     300.759     9143.486    31.2
>> >  PME 3D-FFT             3    4     100002     111.366     3385.682    11.6
>> >  PME 3D-FFT Comm.       3    4     100002      55.347     1682.636     5.7
>> >  PME solve              3    4      50001       8.199      249.246     0.9
>> > -----------------------------------------------------------------------------
>> >
>> >                Core t (s)   Wall t (s)        (%)
>> >        Time:    11533.900      963.899     1196.6
>> >                  (ns/day)    (hour/ns)
>> > Performance:        8.964        2.677
>> > Finished mdrun on node 0 Sun Dec  8 11:04:48 2013
>> >
>> >
>> >
>> > And I set rlist = rvdw = rcoulomb = 1.0.
>> >
>> > Is there any documentation that details what those values, e.g.
>> > VdW [V&F], mean?
>> >
>> > Thanks,
>> > Yun

