[gmx-users] possible configuration for gromacs gpu node
Harry Mark Greenblatt
harry.greenblatt at weizmann.ac.il
Tue May 6 09:15:36 CEST 2014
BS"D
Dear All,
I was asked to provide some examples of what we are doing, to help assess whether my proposal for a GPU compute node is reasonable
(2 x 3.5 GHz E5-2643V2 hexa-core CPUs with 2 x GeForce GTX 770, running two jobs, each with six cores and one GPU). I did some tests on a workstation a while ago with GROMACS 4.6.2, so I am including those results now. Please let me know if this is enough information.
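For concreteness, here is a sketch of how I imagine launching the two concurrent jobs on such a node (GROMACS 4.6-style mdrun options; the pin offsets, GPU ids and -deffnm names are illustrative guesses on my part, not a tested recipe):

# Job 1: one thread-MPI rank, 6 OpenMP threads pinned to cores 0-5, first GPU
mdrun -ntmpi 1 -ntomp 6 -gpu_id 0 -pin on -pinoffset 0 -deffnm job1 &
# Job 2: 6 OpenMP threads pinned to cores 6-11, second GPU
mdrun -ntmpi 1 -ntomp 6 -gpu_id 1 -pin on -pinoffset 6 -deffnm job2 &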
It seems from these tests that the CPU (an E5-1650 at 3.2 GHz, paired with a Quadro K4000) outstripped the GPU; that GPU has half the CUDA cores of the card we are proposing. The system is a protein bound to double-stranded B-DNA (the DNA is restrained). The log suggests using a shorter cut-off, but I was already using 1.0 nm here, which is shorter than what I used with the older (group) cut-off scheme.
Here is the .mdp file:
define = -DPOSRES
integrator = md
dt = 0.002 ; ps ! 2 fs
nsteps = 500000 ; total 1,000 ps (1ns)
nstcomm = 10
nstxout = 500 ; collect data every 1 ps
nstxtcout = 500
xtc_grps = Protein DNA Ion
nstenergy = 100
nstvout = 0
nstfout = 0
nstlist = 10
ns_type = grid
rlist = 1.0
coulombtype = PME
;rcoulomb = 1.0
rcoulomb = 1.0
vdwtype = cut-off
cutoff-scheme = Verlet
rvdw = 1.0
pme_order = 4
ewald_rtol = 1e-5
optimize_fft = yes
DispCorr = no
; OPTIONS FOR BONDS
constraints = all-bonds
continuation = yes ; continuation from NPT PR
constraint_algorithm = lincs ; holonomic constraints
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Temperature coupling is on (v-rescale)
Tcoupl = v-rescale
tau_t = 0.1 0.1
tc-grps = protein non-protein
ref_t = 300 300
; Pressure coupling is off for this run
;Pcoupl = parrinello-rahman
Pcoupl = no
Pcoupltype = isotropic
tau_p = 1.0
compressibility = 4.5e-5
ref_p = 1.0
; Velocity generation is off (continuation run)
gen_vel = no
gen_temp = 300.0
gen_seed = -1
;
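(For completeness, the runs were prepared and started in the usual way, roughly as below; the file names here are placeholders, not the ones I actually used.)

grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr
mdrun -deffnm md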
And at the end of the run:
 Computing:                          M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check      55582.974192      500246.768     0.1
 NxN QSTab Elec. + VdW [F]    15309048.189184   627670975.757    88.9
 NxN QSTab Elec. + VdW [V&F]    154666.831424     9125343.054     1.3
 1,4 nonbonded interactions       3121.006242      280890.562     0.0
 Calc Weights                    48916.597833     1760997.522     0.2
 Spread Q Bspline              1043554.087104     2087108.174     0.3
 Gather F Bspline              1043554.087104     6261324.523     0.9
 3D-FFT                        6907906.423072    55263251.385     7.8
 Solve PME                        2591.791424      165874.651     0.0
 Shift-X                           407.670111        2446.021     0.0
 Angles                           2274.504549      382116.764     0.1
 Propers                          3495.506991      800471.101     0.1
 Impropers                         245.500491       51064.102     0.0
 Pos. Restr.                       325.000650       16250.033     0.0
 Virial                            163.312656        2939.628     0.0
 Stop-CM                           163.120222        1631.202     0.0
 Calc-Ekin                        3261.165222       88051.461     0.0
 Lincs                            1262.502525       75750.151     0.0
 Lincs-Mat                       27294.054588      109176.218     0.0
 Constraint-V                    17579.035158      140632.281     0.0
 Constraint-Vir                    163.197633        3916.743     0.0
 Settle                           5018.010036     1620817.242     0.2
-----------------------------------------------------------------------------
 Total                                          706411275.342   100.0
-----------------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes   Th.      Count  Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
 Neighbor search         1    6      12501      36.427      699.418     1.1
 Launch GPU ops.         1    6     500001      35.610      683.734     1.1
 Force                   1    6     500001     123.471     2370.727     3.8
 PME mesh                1    6     500001    1261.777    24227.040    38.9
 Wait GPU local          1    6     500001    1488.623    28582.658    45.9
 NB X/F buffer ops.      1    6     987501      34.047      653.734     1.0
 Write traj.             1    6       1004       4.602       88.359     0.1
 Update                  1    6     500001      41.532      797.453     1.3
 Constraints             1    6     500001     197.492     3791.991     6.1
 Rest                    1                      20.561      394.787     0.6
-----------------------------------------------------------------------------
 Total                   1                    3244.142    62289.902   100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
 PME spread/gather       1    6    1000002     510.512     9802.198    15.7
 PME 3D-FFT              1    6    1000002     683.758    13128.652    21.1
 PME solve               1    6     500001      65.117     1250.298     2.0
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
 Computing:                   Count  Wall t (s)     ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                12501       3.637       0.291     0.1
 X / q H2D                   500001      49.444       0.099     1.7
 Nonbonded F kernel          485000    2685.409       5.537    93.0
 Nonbonded F+ene k.            2500      18.131       7.252     0.6
 Nonbonded F+prune k.         10000      73.623       7.362     2.5
 Nonbonded F+ene+prune k.      2501      22.572       9.025     0.8
 F D2H                       500001      35.269       0.071     1.2
-----------------------------------------------------------------------------
 Total                                2888.085       5.776   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 5.776 ms/2.770 ms = 2.085
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid.
               Core t (s)   Wall t (s)        (%)
       Time:    19439.980     3244.142      599.2
                         54:04
                 (ns/day)    (hour/ns)
Performance:       26.633        0.901
Finished mdrun on node 0 Wed Jul 10 17:12:21 2013
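If I were to follow that note and shift some of the non-bonded load from the GPU back to the CPU's PME part, my understanding is that with the Verlet scheme one shortens the cut-offs and compensates with a finer PME grid, along these lines (the 0.9 nm / 0.11 nm values are only a guess on my part, untested):

rcoulomb        = 0.9
rvdw            = 0.9   ; with Verlet + PME, rvdw is kept equal to rcoulomb
fourierspacing  = 0.11  ; finer grid to preserve PME accuracy at the shorter cut-off

I believe mdrun also attempts this kind of PP-PME rebalancing automatically at startup unless it is run with -notunepme.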
Thanks very much,
Harry
-------------------------------------------------------------------------
Harry M. Greenblatt
Associate Staff Scientist
Dept of Structural Biology           Harry.Greenblatt at weizmann.ac.il
Weizmann Institute of Science Phone: 972-8-934-3625
234 Herzl St. Facsimile: 972-8-934-4159
Rehovot, 76100
Israel