[gmx-users] GPU waits for CPU, any remedies?

Michael Brunsteiner mbx0009 at yahoo.com
Tue Sep 16 15:32:27 CEST 2014


Testing a new computer we just got, I found that performance for the system I
use is sub-optimal: the GPU finishes its share of the force evaluation in
roughly half the time the CPU needs (see below for details), so the GPU ends
up waiting for the CPU.
The dynamic load balancing that is performed automatically at the beginning
of each simulation does not seem to improve things much, giving, for example:

Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556

I guess this is because only about 15% of the CPU time is spent on the
PME mesh, and the rest on something else (are these the bonded forces?).
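
As a rough illustration of what the ratio line says, here is a minimal Python sketch that reads the two times out of the log line and turns them into a ratio and an approximate GPU idle fraction. The helper name `parse_force_balance` is made up for this sketch, and the idle estimate assumes the GPU and the CPU start their force work at the same moment in each step.

```python
import re

# Line copied from the mdrun log quoted in this mail.
log_line = "Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556"

def parse_force_balance(line):
    """Return (gpu_ms, cpu_ms) from a 'Force evaluation time GPU/CPU' line.

    Hypothetical helper for this sketch, not part of GROMACS.
    """
    match = re.search(r"GPU/CPU:\s*([\d.]+)\s*ms/([\d.]+)\s*ms", line)
    if match is None:
        raise ValueError("not a force evaluation time line")
    return float(match.group(1)), float(match.group(2))

gpu_ms, cpu_ms = parse_force_balance(log_line)
ratio = gpu_ms / cpu_ms

# Assumption: both sides start together, so the GPU is idle for the
# remainder of the CPU force time once its own kernels have finished.
gpu_idle_fraction = 1.0 - ratio

print(f"GPU/CPU ratio: {ratio:.3f}")
print(f"GPU idle for roughly {gpu_idle_fraction:.0%} of the CPU force time")
```

Under that assumption the GPU would be idle for a bit less than half of each force phase, which is the imbalance described above.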

If I make the initial rcoulomb in the mdp file larger,
the load balance improves to a value closer to 1, e.g.:

Force evaluation time GPU/CPU: 2.720 ms/2.502 ms = 1.087

but the overall performance gets, in fact, worse ... 
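
One possible reading of this, sketched in Python below: if the GPU and CPU force work overlap completely, the slower of the two sets the length of the force phase. This is a simplifying assumption that ignores everything else in the step (constraints, update, neighbor search); the function name `force_phase_ms` is made up for the sketch, and the times are the ones from the two ratio lines above.

```python
# Per-step force evaluation times in ms, copied from the two ratio lines.
runs = {
    "rcoulomb = 0.9": {"gpu": 1.198, "cpu": 2.156},
    "larger rcoulomb": {"gpu": 2.720, "cpu": 2.502},
}

def force_phase_ms(times):
    # Assumption: GPU and CPU run fully in parallel, so the slower side
    # determines how long the force phase of a step takes.
    return max(times["gpu"], times["cpu"])

baseline = force_phase_ms(runs["rcoulomb = 0.9"])
balanced = force_phase_ms(runs["larger rcoulomb"])
slowdown = balanced / baseline

print(f"force phase, rcoulomb = 0.9:  {baseline:.3f} ms")
print(f"force phase, larger rcoulomb: {balanced:.3f} ms")
print(f"relative cost: {slowdown:.2f}x")
```

In this simple model the better-balanced run is slower because both sides now take longer than the CPU did before: the ratio moved toward 1 by adding GPU work, not by removing CPU work.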

Any suggestions? (The mdp file is included at the bottom of this mail.)


the timing:

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
 Neighbor search        1   12        251       0.574         23.403   2.1
 Launch GPU ops.        1   12      10001       0.627         25.569   2.3
 Force                  1   12      10001      17.392        709.604  64.5
 PME mesh               1   12      10001       4.172        170.234  15.5
 Wait GPU local         1   12      10001       0.206          8.401   0.8
 NB X/F buffer ops.     1   12      19751       0.239          9.736   0.9
 Write traj.            1   12         11       0.381         15.554   1.4
 Update                 1   12      10001       0.303         12.365   1.1
 Constraints            1   12      10001       1.458         59.489   5.4
 Rest                                           1.621         66.139   6.0
 Total                                         26.973       1100.493 100.0
 Breakdown of PME mesh computation
 PME spread/gather      1   12      20002       3.319        135.423  12.3
 PME 3D-FFT             1   12      20002       0.616         25.138   2.3
 PME solve Elec         1   12      10001       0.198          8.066   0.7

 GPU timings
 Computing:                         Count  Wall t (s)      ms/step       %
 Pair list H2D                        251       0.036        0.144     0.3
 X / q H2D                          10001       0.317        0.032     2.6
 Nonbonded F kernel                  9500      10.492        1.104    87.6
 Nonbonded F+ene k.                   250       0.404        1.617     3.4
 Nonbonded F+ene+prune k.             251       0.476        1.898     4.0
 F D2H                              10001       0.258        0.026     2.2
 Total                                         11.984        1.198   100.0

Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556
For optimal performance this ratio should be close to 1!
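
As a cross-check on the tables above, the following Python sketch recomputes the per-step times from the wall-time columns. It assumes that the CPU figure in the ratio line is the sum of the "Force" and "PME mesh" rows divided by the number of steps; the numbers are consistent with that, but it is an inference from this log rather than something the output states.

```python
# Wall times in seconds and step count, copied from the timing tables.
steps = 10001
force_s = 17.392      # "Force" row (CPU)
pme_mesh_s = 4.172    # "PME mesh" row (CPU)
gpu_total_s = 11.984  # "Total" row of the GPU timings

# Assumption: the CPU force time per step is Force + PME mesh.
cpu_ms = (force_s + pme_mesh_s) / steps * 1000.0
gpu_ms = gpu_total_s / steps * 1000.0

# Share of the CPU force phase that goes to the PME mesh.
pme_share = pme_mesh_s / (force_s + pme_mesh_s)

print(f"CPU: {cpu_ms:.3f} ms/step, GPU: {gpu_ms:.3f} ms/step")
print(f"GPU/CPU ratio: {gpu_ms / cpu_ms:.3f}")
print(f"PME mesh share of CPU force time: {pme_share:.1%}")
```

On these numbers the PME mesh accounts for roughly a fifth of the CPU force time, and the "Force" row for the rest. Since only the PME mesh part shrinks when rcoulomb grows, that limits how much CPU time the load balancing can remove.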

md.mdp:

integrator               = md
dt                       = 0.002
nsteps                   = 10000
comm-grps                = System
nstxout                  = 1000
nstvout                  = 0
nstfout                  = 0
nstlog                   = 1000
nstenergy                = 1000
nstlist                  = 20
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.1
cutoff-scheme            = Verlet
coulombtype              = PME
rcoulomb                 = 0.9
vdw_type                 = cut-off 
rvdw                     = 0.9
DispCorr                 = EnerPres
tcoupl                   = Berendsen
tc-grps                  = System
tau_t                    = 0.2
ref_t                    = 298.0
gen-vel                  = yes
gen-temp                 = 240.0
gen-seed                 = -1
continuation             = no
Pcoupl                   = berendsen 
Pcoupltype               = isotropic
tau_p                    = 0.5
compressibility          = 1.0e-5
ref_p                    = 1.0
constraints              = hbonds
