[gmx-users] GPU waits for CPU, any remedies?
Michael Brunsteiner
mbx0009 at yahoo.com
Tue Sep 16 15:32:27 CEST 2014
hi,
testing a new computer we just got, i found that for the system i use performance
is sub-optimal: the GPU finishes its part of each step in roughly half the time
the CPU needs (see below for details)
the dynamic load balancing that is performed automatically at the beginning
of each simulation does not seem to improve things much, giving, for example:
Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556
i guess this is because only about 15% of the CPU time is spent on
PME mesh, and most of the rest on "Force" (are these the bonded forces?)
if i make the initial rcoulomb in the mdp file larger,
the load balance improves to a value closer to 1, e.g.:
Force evaluation time GPU/CPU: 2.720 ms/2.502 ms = 1.087
but the overall performance in fact gets worse ...
any suggestions? (mdp file included at the bottom of this mail)
thanks,
michael
the timing:
Computing:            Num     Num      Call   Wall time   Giga-Cycles
                    Ranks Threads     Count         (s)     total sum      %
-----------------------------------------------------------------------------
Neighbor search         1      12       251       0.574        23.403    2.1
Launch GPU ops.         1      12     10001       0.627        25.569    2.3
Force                   1      12     10001      17.392       709.604   64.5
PME mesh                1      12     10001       4.172       170.234   15.5
Wait GPU local          1      12     10001       0.206         8.401    0.8
NB X/F buffer ops.      1      12     19751       0.239         9.736    0.9
Write traj.             1      12        11       0.381        15.554    1.4
Update                  1      12     10001       0.303        12.365    1.1
Constraints             1      12     10001       1.458        59.489    5.4
Rest                                              1.621        66.139    6.0
-----------------------------------------------------------------------------
Total                                            26.973      1100.493  100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME spread/gather       1      12     20002       3.319       135.423   12.3
PME 3D-FFT              1      12     20002       0.616        25.138    2.3
PME solve Elec          1      12     10001       0.198         8.066    0.7
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
Computing:                     Count  Wall t (s)   ms/step       %
-----------------------------------------------------------------------------
Pair list H2D                    251       0.036     0.144     0.3
X / q H2D                      10001       0.317     0.032     2.6
Nonbonded F kernel              9500      10.492     1.104    87.6
Nonbonded F+ene k.               250       0.404     1.617     3.4
Nonbonded F+ene+prune k.         251       0.476     1.898     4.0
F D2H                          10001       0.258     0.026     2.2
-----------------------------------------------------------------------------
Total                                     11.984     1.198   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 1.198 ms/2.156 ms = 0.556
For optimal performance this ratio should be close to 1!
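
As a cross-check, the per-step times in the ratio line above can be reproduced from the wall-time columns of the tables. The sketch below assumes that the CPU figure is the sum of the "Force" and "PME mesh" rows divided by the step count. That assumption is inferred only from the numbers in this log agreeing, not from the GROMACS documentation, and the variable names are made up for illustration.

```python
# Values copied from the timing tables in this post.
steps = 10001          # call count of the per-step rows
cpu_force_s = 17.392   # "Force" wall time (s)
cpu_pme_s = 4.172      # "PME mesh" wall time (s)
gpu_total_s = 11.984   # "Total" wall time of the GPU timings (s)
total_wall_s = 26.973  # "Total" wall time of the CPU table (s)

# Assumption: the CPU force time is Force + PME mesh, averaged per step.
cpu_ms = (cpu_force_s + cpu_pme_s) / steps * 1000.0
gpu_ms = gpu_total_s / steps * 1000.0
ratio = gpu_ms / cpu_ms

print(f"Force evaluation time GPU/CPU: {gpu_ms:.3f} ms/{cpu_ms:.3f} ms = {ratio:.3f}")

# Share of the total wall time spent in each of the two CPU rows.
force_pct = 100.0 * cpu_force_s / total_wall_s
pme_pct = 100.0 * cpu_pme_s / total_wall_s
print(f"Force: {force_pct:.1f}%  PME mesh: {pme_pct:.1f}%")
```

Under that assumption, the "Force" row alone accounts for about four fifths of the CPU-side force time, so shifting work between the nonbonded kernel and PME mesh can only move a small part of the imbalance.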
md.mdp:
integrator = md
dt = 0.002
nsteps = 10000
comm-grps = System
;
nstxout = 1000
nstvout = 0
nstfout = 0
nstlog = 1000
nstenergy = 1000
;
nstlist = 20
ns_type = grid
pbc = xyz
rlist = 1.1
cutoff-scheme = Verlet
;
coulombtype = PME
rcoulomb = 0.9
vdw_type = cut-off
rvdw = 0.9
DispCorr = EnerPres
;
tcoupl = Berendsen
tc-grps = System
tau_t = 0.2
ref_t = 298.0
;
gen-vel = yes
gen-temp = 240.0
gen-seed = -1
continuation = no
;
Pcoupl = berendsen
Pcoupltype = isotropic
tau_p = 0.5
compressibility = 1.0e-5
ref_p = 1.0
;
constraints = hbonds
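
When comparing runs that differ only in the initial rcoulomb, it can help to pull the cut-off settings out of the mdp file with a small script. This is a minimal sketch, assuming the plain "key = value" layout with ";" comments used above. parse_mdp is a hypothetical helper, not part of GROMACS; it lower-cases option names and maps "_" to "-" on the assumption that both spellings refer to the same option, as the mixed usage in this file suggests.

```python
def parse_mdp(text):
    """Parse 'key = value' lines; ';' starts a comment (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: case and '-'/'_' do not matter in option names.
        params[key.strip().lower().replace("_", "-")] = value.strip()
    return params


# Excerpt of the mdp file from this post.
mdp_text = """
nstlist = 20
rlist = 1.1
cutoff-scheme = Verlet
;
coulombtype = PME
rcoulomb = 0.9
vdw_type = cut-off
rvdw = 0.9
"""

params = parse_mdp(mdp_text)
for name in ("cutoff-scheme", "rlist", "rcoulomb", "rvdw", "vdw-type"):
    print(f"{name:15s} = {params[name]}")
```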