[gmx-users] GPU waits for CPU, any remedies?

Szilárd Páll pall.szilard at gmail.com
Tue Sep 16 17:16:51 CEST 2014


The PP-PME load balancing done at the beginning of the run should
attempt to shift work from the CPU to the GPU. The performance
improvement this can bring is limited, but normally it should still do
its job and decrease the PME load.
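
For reference, roughly speaking the balancer scales the Coulomb cut-off and
the PME grid spacing by the same factor, which keeps the electrostatics
accuracy constant while moving grid work from the CPU to the short-range
kernel on the GPU. A rough sketch in mdp terms (the numbers below are only
illustrative, not taken from your run; note that your mdp leaves
fourierspacing at its 0.12 nm default):

  ; before tuning
  rcoulomb        = 0.9
  fourierspacing  = 0.12
  ; after scaling both by ~1.2 the accuracy is unchanged, but more of the
  ; electrostatics is handled by the GPU pair kernel and less by the PME grid
  rcoulomb        = 1.08
  fourierspacing  = 0.144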

However, the PP-PME load balancing output, which could provide a clue
as to why you end up with a CPU-GPU load imbalance, is missing from your
post! Please post the full log file, not just the parts that seem useful.
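
If you want the timing summary to be easy to compare across settings, a run
along these lines works well (just a sketch, assuming the 5.0 "gmx" wrapper
and a tpr called topol.tpr; adjust the names to your setup):

  gmx mdrun -s topol.tpr -nb gpu -noconfout -resethway -v

-resethway resets the cycle counters half-way through the run, so the final
timing table is not skewed by the load-balancing phase at the start, and
-noconfout skips writing the final configuration.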

Cheers,
--
Szilárd


On Tue, Sep 16, 2014 at 3:19 PM, Michael Brunsteiner <mbx0009 at yahoo.com> wrote:
>
>
> hi,
>
> testing a new computer we just got, I found that for the system I use,
> performance is sub-optimal, as the GPU appears to be about 50% faster than
> the CPU (see below for details).
> The dynamic load balancing that is performed automatically at the beginning
> of each simulation does not seem to improve things much, giving, for example:
>
> Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556
>
> I guess this is because only 15% of the CPU time is spent on the
> PME mesh, and the rest on something else (is that the bonded forces??)
>
> If I make the initial rcoulomb in the mdp file larger,
> the load balance improves to a value closer to 1, e.g.:
>
> Force evaluation time GPU/CPU: 2.720 ms/2.502 ms = 1.087
>
> but the overall performance actually gets worse ...
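>
> (the hand edit above amounts, in mdp terms, to roughly the following, with
> the value only illustrative:
>
>   rcoulomb                 = 1.1   ; instead of 0.9, everything else unchanged
> )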
>
> any suggestions ?? (mdp file included at the bottom of this mail)
>
> thanks,
> michael
>
>
>
> the timing:
>
>
>  Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                      Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
>  Neighbor search        1   12        251       0.574         23.403   2.1
>  Launch GPU ops.        1   12      10001       0.627         25.569   2.3
>  Force                  1   12      10001      17.392        709.604  64.5
>  PME mesh               1   12      10001       4.172        170.234  15.5
>  Wait GPU local         1   12      10001       0.206          8.401   0.8
>  NB X/F buffer ops.     1   12      19751       0.239          9.736   0.9
>  Write traj.            1   12         11       0.381         15.554   1.4
>  Update                 1   12      10001       0.303         12.365   1.1
>  Constraints            1   12      10001       1.458         59.489   5.4
>  Rest                                           1.621         66.139   6.0
> -----------------------------------------------------------------------------
>  Total                                         26.973       1100.493 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME spread/gather      1   12      20002       3.319        135.423  12.3
>  PME 3D-FFT             1   12      20002       0.616         25.138   2.3
>  PME solve Elec         1   12      10001       0.198          8.066   0.7
> -----------------------------------------------------------------------------
>
>  GPU timings
> -----------------------------------------------------------------------------
>  Computing:                         Count  Wall t (s)      ms/step       %
> -----------------------------------------------------------------------------
>  Pair list H2D                        251       0.036        0.144     0.3
>  X / q H2D                          10001       0.317        0.032     2.6
>  Nonbonded F kernel                  9500      10.492        1.104    87.6
>  Nonbonded F+ene k.                   250       0.404        1.617     3.4
>  Nonbonded F+ene+prune k.             251       0.476        1.898     4.0
>  F D2H                              10001       0.258        0.026     2.2
> -----------------------------------------------------------------------------
>  Total                                         11.984        1.198   100.0
> -----------------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556
> For optimal performance this ratio should be close to 1!
>
>
>
> md.mdp:
> integrator               = md
> dt                       = 0.002
> nsteps                   = 10000
> comm-grps                = System
> ;
> nstxout                  = 1000
> nstvout                  = 0
> nstfout                  = 0
> nstlog                   = 1000
> nstenergy                = 1000
> ;
> nstlist                  = 20
> ns_type                  = grid
> pbc                      = xyz
> rlist                    = 1.1
> cutoff-scheme            = Verlet
> ;
> coulombtype              = PME
> rcoulomb                 = 0.9
> vdw_type                 = cut-off
> rvdw                     = 0.9
> DispCorr                 = EnerPres
> ;
> tcoupl                   = Berendsen
> tc-grps                  = System
> tau_t                    = 0.2
> ref_t                    = 298.0
> ;
> gen-vel                  = yes
> gen-temp                 = 240.0
> gen-seed                 = -1
> continuation             = no
> ;
> Pcoupl                   = berendsen
> Pcoupltype               = isotropic
> tau_p                    = 0.5
> compressibility          = 1.0e-5
> ref_p                    = 1.0
> ;
> constraints              = hbonds
