[gmx-users] possible configuration for gromacs gpu node

Harry Mark Greenblatt harry.greenblatt at weizmann.ac.il
Tue May 6 09:15:36 CEST 2014


BS"D

Dear All,

  I was asked to provide some examples of what we are doing, so that the list can assess whether my proposal for a GPU compute node is reasonable
(2 x 3.5GHz E5-2643V2 hexacore, with 2 x GeForce GTX 770; run two jobs, each with six cores and 1 GPU).  I ran some tests on a workstation some time ago with Gromacs 4.6.2, and am including the results here.  Please let me know if this is enough information.

 It seems from these tests that the CPU (E5-1650, 3.2GHz) outstripped the GPU (a Quadro K4000).  This GPU has half the CUDA cores of what we are proposing.  The system is a protein bound to double-stranded B-DNA (the DNA is restrained).  mdrun suggests using a shorter cut-off, but I was using 1.0 nm here, which is already shorter than what I used with the older group cut-off scheme.

Here is the .mdp file


define              = -DPOSRES
integrator          = md
dt                  = 0.002 ; ps ! 2 fs
nsteps              = 500000 ; total 1,000 ps (1ns)
nstcomm             = 10
nstxout             = 500     ; collect data every 1 ps
nstxtcout           = 500
xtc_grps            = Protein DNA Ion
nstenergy           = 100
nstvout             = 0
nstfout             = 0
nstlist             = 10
ns_type             = grid
rlist               = 1.0
coulombtype         = PME
;rcoulomb            = 1.0
rcoulomb            = 1.0
vdwtype             = cut-off
cutoff-scheme       = Verlet
rvdw                = 1.0
pme_order           = 4
ewald_rtol          = 1e-5
optimize_fft        = yes
DispCorr            = no
; OPTIONS FOR BONDS
constraints         = all-bonds
continuation        = yes      ; continuation from NPT PR
constraint_algorithm  = lincs  ; holonomic constraints
lincs_iter            = 1      ; accuracy of LINCS
lincs_order           = 4      ;  also related to accuracy

; V-rescale temperature coupling is on
Tcoupl                = v-rescale
tau_t                 = 0.1     0.1
tc-grps               = protein     non-protein
ref_t                 = 300         300
; Pressure coupling is off
;Pcoupl              = parrinello-rahman
Pcoupl              = no
Pcoupltype          = isotropic
tau_p               = 1.0
compressibility     = 4.5e-5
ref_p               = 1.0
; Generate velocities is off (continuation run)
gen_vel             = no
gen_temp            = 300.0
gen_seed            = -1
;
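
As a quick sanity check on the timing comments in the .mdp file above, here is a minimal Python sketch that converts the step counts to simulated time. The values are copied from the file; the helper name `steps_to_ps` and the dictionary layout are only illustrative and are not part of GROMACS.

```python
import math

# Settings copied from the .mdp file above (GROMACS times are in ps).
dt_ps = 0.002
nsteps = 500000
every_n_steps = {"nstxout": 500, "nstxtcout": 500, "nstenergy": 100, "nstlist": 10}


def steps_to_ps(n_steps, dt=dt_ps):
    """Convert a number of MD steps to simulated time in ps (hypothetical helper)."""
    return n_steps * dt


total_ps = steps_to_ps(nsteps)
intervals_ps = {name: steps_to_ps(n) for name, n in every_n_steps.items()}

print(f"total simulated time: {total_ps:.1f} ps ({total_ps / 1000:.3f} ns)")
for name, ps in intervals_ps.items():
    print(f"{name:10s} every {ps:.3f} ps")
```

This agrees with the comments in the file: 1,000 ps in total, with coordinates written every 1 ps.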


And at the end of the run:


Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check           55582.974192      500246.768     0.1
 NxN QSTab Elec. + VdW [F]         15309048.189184   627670975.757    88.9
 NxN QSTab Elec. + VdW [V&F]         154666.831424     9125343.054     1.3
 1,4 nonbonded interactions            3121.006242      280890.562     0.0
 Calc Weights                         48916.597833     1760997.522     0.2
 Spread Q Bspline                   1043554.087104     2087108.174     0.3
 Gather F Bspline                   1043554.087104     6261324.523     0.9
 3D-FFT                             6907906.423072    55263251.385     7.8
 Solve PME                             2591.791424      165874.651     0.0
 Shift-X                                407.670111        2446.021     0.0
 Angles                                2274.504549      382116.764     0.1
 Propers                               3495.506991      800471.101     0.1
 Impropers                              245.500491       51064.102     0.0
 Pos. Restr.                            325.000650       16250.033     0.0
 Virial                                 163.312656        2939.628     0.0
 Stop-CM                                163.120222        1631.202     0.0
 Calc-Ekin                             3261.165222       88051.461     0.0
 Lincs                                 1262.502525       75750.151     0.0
 Lincs-Mat                            27294.054588      109176.218     0.0
 Constraint-V                         17579.035158      140632.281     0.0
 Constraint-Vir                         163.197633        3916.743     0.0
 Settle                                5018.010036     1620817.242     0.2
-----------------------------------------------------------------------------
 Total                                               706411275.342   100.0
-----------------------------------------------------------------------------
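
To see how the flop count splits between the short-range nonbonded kernels and the PME mesh work, the rows of the table above can be summed as in the sketch below. The M-Flops values are copied from the table; grouping the rows into "nonbonded" and "PME mesh" is my own reading of the row labels, not something the log states.

```python
# M-Flops copied from the flop accounting table above.
total_mflops = 706411275.342

nonbonded_mflops = {
    "NxN QSTab Elec. + VdW [F]": 627670975.757,
    "NxN QSTab Elec. + VdW [V&F]": 9125343.054,
}
# Assumed grouping: these four rows are treated as the PME mesh part.
pme_mesh_mflops = {
    "Spread Q Bspline": 2087108.174,
    "Gather F Bspline": 6261324.523,
    "3D-FFT": 55263251.385,
    "Solve PME": 165874.651,
}


def percent_of_total(rows, total=total_mflops):
    """Return the summed share of the given rows as a percentage of the total."""
    return 100.0 * sum(rows.values()) / total


nonbonded_pct = percent_of_total(nonbonded_mflops)
pme_mesh_pct = percent_of_total(pme_mesh_mflops)

print(f"nonbonded kernels: {nonbonded_pct:.1f} % of flops")
print(f"PME mesh rows:     {pme_mesh_pct:.1f} % of flops")
```

By this count roughly nine tenths of the flops are in the nonbonded kernels, with most of the rest in the PME mesh rows.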


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
 Neighbor search        1    6      12501      36.427      699.418     1.1
 Launch GPU ops.        1    6     500001      35.610      683.734     1.1
 Force                  1    6     500001     123.471     2370.727     3.8
 PME mesh               1    6     500001    1261.777    24227.040    38.9
 Wait GPU local         1    6     500001    1488.623    28582.658    45.9
 NB X/F buffer ops.     1    6     987501      34.047      653.734     1.0
 Write traj.            1    6       1004       4.602       88.359     0.1
 Update                 1    6     500001      41.532      797.453     1.3
 Constraints            1    6     500001     197.492     3791.991     6.1
 Rest                   1                      20.561      394.787     0.6
-----------------------------------------------------------------------------
 Total                  1                    3244.142    62289.902   100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
 PME spread/gather      1    6    1000002     510.512     9802.198    15.7
 PME 3D-FFT             1    6    1000002     683.758    13128.652    21.1
 PME solve              1    6     500001      65.117     1250.298     2.0
-----------------------------------------------------------------------------

GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                      12501       3.637        0.291     0.1
 X / q H2D                         500001      49.444        0.099     1.7
 Nonbonded F kernel                485000    2685.409        5.537    93.0
 Nonbonded F+ene k.                  2500      18.131        7.252     0.6
 Nonbonded F+prune k.               10000      73.623        7.362     2.5
 Nonbonded F+ene+prune k.            2501      22.572        9.025     0.8
 F D2H                             500001      35.269        0.071     1.2
-----------------------------------------------------------------------------
 Total                                       2888.085        5.776   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 5.776 ms/2.770 ms = 2.085
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.
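
The GPU/CPU ratio reported above can be reproduced from the two timing tables with the sketch below. The wall times are copied from the log. Treating the CPU force time as the sum of the "Force" and "PME mesh" rows is an assumption on my part; it happens to reproduce the 2.770 ms figure, but the log itself does not say how that number is derived.

```python
# Wall times (s) and step count copied from the accounting tables above.
steps = 500001
force_wall_s = 123.471
pme_mesh_wall_s = 1261.777
gpu_total_wall_s = 2888.085
wait_gpu_wall_s = 1488.623
total_wall_s = 3244.142

# Assumption: CPU force time per step = (Force + PME mesh) / steps.
cpu_ms_per_step = (force_wall_s + pme_mesh_wall_s) / steps * 1000.0
gpu_ms_per_step = gpu_total_wall_s / steps * 1000.0
gpu_cpu_ratio = gpu_ms_per_step / cpu_ms_per_step

# Share of the wall time the CPU spent in "Wait GPU local".
wait_gpu_pct = 100.0 * wait_gpu_wall_s / total_wall_s

print(f"CPU force time: {cpu_ms_per_step:.3f} ms/step")
print(f"GPU force time: {gpu_ms_per_step:.3f} ms/step")
print(f"GPU/CPU ratio:  {gpu_cpu_ratio:.3f}")
print(f"CPU waiting on GPU: {wait_gpu_pct:.1f} % of wall time")
```

A ratio near 2 together with about 46% of the wall time in "Wait GPU local" means the CPU cores sat idle for a large part of each step while the K4000 finished the nonbonded kernel.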

               Core t (s)   Wall t (s)        (%)
       Time:    19439.980     3244.142      599.2
                         54:04
                 (ns/day)    (hour/ns)
Performance:       26.633        0.901
Finished mdrun on node 0 Wed Jul 10 17:12:21 2013
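
For comparing against other hardware, the performance figures above follow directly from the wall time and the simulated time. The sketch below recomputes them from the values in the log and the .mdp file; the variable names are only illustrative.

```python
# Values copied from the log and the .mdp file above.
dt_ps = 0.002
nsteps = 500000
wall_s = 3244.142
core_s = 19439.980

simulated_ns = nsteps * dt_ps / 1000.0        # ps -> ns
ns_per_day = simulated_ns * 86400.0 / wall_s  # 86400 s in a day
hours_per_ns = wall_s / 3600.0 / simulated_ns
core_usage_pct = 100.0 * core_s / wall_s      # ~600 % means six busy threads

print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"core time / wall time: {core_usage_pct:.1f} %")
```

This gives about 26.6 ns/day and 0.9 hour/ns, matching the mdrun summary to within rounding.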


Thanks very much,


Harry


-------------------------------------------------------------------------

Harry M. Greenblatt

Associate Staff Scientist

Dept of Structural Biology           Harry.Greenblatt at weizmann.ac.il

Weizmann Institute of Science        Phone:  972-8-934-3625

234 Herzl St.                        Facsimile:   972-8-934-4159

Rehovot, 76100

Israel






