[gmx-users] GPU / CPU load imbalance

Wed Jun 26 00:33:55 CEST 2013

Hi gmx-users,

    I used an 8-core AMD CPU with a GTX 680 GPU (1536 CUDA cores) to
run the umbrella sampling example provided by Justin.
I am happy that GPU acceleration significantly reduces the computation time
for this example (from 34 hours to 7 hours).
However, I found a NOTE on the screen like this:

++++++++++++++++++++++++++++++++++++++++++
 The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid
 ++++++++++++++++++++++++++++++++++++++++++

Given this load imbalance of more than 20%, I wonder if someone can suggest
how to avoid the performance loss, either by improving the hardware (GPU/CPU)
or by modifying the mdp file (see below).

In terms of hardware, does this NOTE suggest that I should use a
higher-capacity GPU, such as a GTX 780 (2304 CUDA cores), to balance the load
or speed things up?
If so, would adding a second GTX 680 card to the same box help? Or would that
cause a GPU/CPU load imbalance again, with the two GPUs kept waiting for the
8-core CPU?

Second,

++++++++++++++++++++++++++++++++++++++++++
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1
++++++++++++++++++++++++++++++++++++++++++

I have no idea how these times, 4.006 ms for the GPU and 2.578 ms for the
CPU, are evaluated.
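
The arithmetic in that log line can at least be reproduced by hand. Here is a minimal Python sketch; the variable and function names are my own and not anything from GROMACS, and it only redoes the division without explaining how mdrun measures the two times:

```python
# Timings copied from the mdrun log line quoted above (milliseconds per step).
gpu_ms = 4.006
cpu_ms = 2.578


def load_ratio(gpu_time, cpu_time):
    """Return the GPU/CPU force-evaluation time ratio (hypothetical helper)."""
    return gpu_time / cpu_time


ratio = load_ratio(gpu_ms, cpu_ms)

# How much longer the GPU part takes relative to the CPU part, in percent.
excess_percent = (ratio - 1.0) * 100.0

print(f"GPU/CPU ratio: {ratio:.3f}")
print(f"GPU force evaluation takes about {excess_percent:.0f}% longer than the CPU part")
```

The ratio rounds to the 1.554 reported in the log, so the GPU part takes roughly 55% longer than the CPU part, which is consistent with the ">20% more load" NOTE.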

It would be very helpful to know how to modify the attached mdp file for a
better load balance between GPU and CPU.

I would appreciate any advice and hints for improving this mdp file.

Thanks,

Dwey

########### courtesy of Justin #########

title       = Umbrella pulling simulation
define      = -DPOSRES_B
; Run parameters
integrator  = md
dt          = 0.002
tinit       = 0
nsteps      = 5000000   ; 10 ns
nstcomm     = 10
; Output parameters
nstxout     = 50000     ; every 100 ps
nstvout     = 50000
nstfout     = 5000
nstxtcout   = 5000      ; every 10 ps
nstenergy   = 5000
; Bond parameters
constraint_algorithm    = lincs
constraints             = all-bonds
continuation            = yes
; Single-range cutoff scheme
nstlist     = 5
ns_type     = grid
rlist       = 1.4
rcoulomb    = 1.4
rvdw        = 1.4
; PME electrostatics parameters
coulombtype     = PME
fourierspacing  = 0.12
fourier_nx      = 0
fourier_ny      = 0
fourier_nz      = 0
pme_order       = 4
ewald_rtol      = 1e-5
optimize_fft    = yes
; Nose-Hoover temperature coupling is on in two groups
Tcoupl      = Nose-Hoover
tc_grps     = Protein   Non-Protein
tau_t       = 0.5       0.5
ref_t       = 310       310
; Pressure coupling is on
Pcoupl          = Parrinello-Rahman
pcoupltype      = isotropic
tau_p           = 1.0
compressibility = 4.5e-5
ref_p           = 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel     = no
; Periodic boundary conditions are on in all directions
pbc     = xyz
; Long-range dispersion correction
DispCorr    = EnerPres
; Verlet cutoff scheme (needed for native GPU acceleration)
cutoff-scheme   = Verlet
; Pull code
pull            = umbrella
pull_geometry   = distance
pull_dim        = N N Y
pull_start      = yes
pull_ngroups    = 1
pull_group0     = Chain_B
pull_group1     = Chain_A
pull_init1      = 0
pull_rate1      = 0.0
pull_k1         = 1000      ; kJ mol^-1 nm^-2
pull_nstxout    = 1000      ; every 2 ps
pull_nstfout    = 1000      ; every 2 ps
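
As a sanity check on the interval comments in the mdp file above, the step counts can be converted to simulated time with a few lines of Python. This is only a sketch: the dictionary and helper names are my own, and the values are copied by hand from the file rather than parsed from it.

```python
dt_ps = 0.002  # dt from the mdp file, in picoseconds

# Step counts copied by hand from the mdp file above.
step_counts = {
    "nsteps": 5000000,
    "nstxout": 50000,
    "nstxtcout": 5000,
    "pull_nstxout": 1000,
}


def steps_to_ps(steps, dt=dt_ps):
    """Convert a number of MD steps to simulated time in ps (hypothetical helper)."""
    return steps * dt


times_ps = {name: steps_to_ps(n) for name, n in step_counts.items()}

for name, t in times_ps.items():
    print(f"{name:>12}: {t:g} ps")
```

The results agree with the comments in the file: 10000 ps (10 ns) for the whole run, 100 ps between full-precision coordinate frames, 10 ps between xtc frames, and 2 ps between pull output records.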