[gmx-users] cpu gpu performance

h.alizadeh at znu.ac.ir h.alizadeh at znu.ac.ir
Sun Jan 4 17:48:25 CET 2015


Dear Users,
I'm simulating a membrane protein system of approximately 185,000 atoms
on an Intel Core i7 CPU.
I have two questions:
1. The performance of my simulations is about 1.8 ns/day. Is this
normal for a system of this size, or is something limiting the
performance?
2. When I run mdrun with -nb gpu, the performance drops to
1.3 ns/day! How can I resolve this problem?

My mdp file parameters are:
integrator              = md
dt                      = 0.002
nsteps                  = 15000000
nstlog                  = 1000
nstxout                 = 5000
nstvout                 = 5000
nstfout                 = 5000
nstcalcenergy           = 100
nstenergy               = 1000
nstxtcout               = 2000        ; xtc compressed trajectory output every 4 ps
;
cutoff-scheme           = Verlet
nstlist                 = 20
rlist                   = 1.0
coulombtype             = pme
rcoulomb                = 1.0
vdwtype                 = Cut-off
vdw-modifier            = Force-switch
rvdw_switch             = 0.9
rvdw                    = 1.0
;
tcoupl                  = berendsen
tc_grps                 = PROT   NPROT   SOL_ION
tau_t                   = 1.0    1.0     1.0
ref_t                   = 303.15   303.15   303.15
;
pcoupl                  = berendsen
pcoupltype              = semiisotropic
tau_p                   = 5.0     5.0
compressibility         = 4.5e-5  4.5e-5
ref_p                   = 1.0     1.0
;
;
constraints             = h-bonds
constraint_algorithm    = LINCS
continuation            = yes
;
nstcomm                 = 100
comm_mode               = linear
comm_grps               = PROT   NPROT   SOL_ION
;
refcoord_scaling        = com
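
As a quick sanity check on the parameters above, the step counts can be converted into simulated time. The following is only a minimal sketch in Python: the values are copied from the mdp listing, the helper name interval_ps is hypothetical, and the run-length estimate assumes the CPU-only rate of 1.8 ns/day quoted in the question stays constant.

```python
# Values copied from the mdp parameters listed above.
dt_ps = 0.002            # time step in ps
nsteps = 15_000_000      # total number of MD steps
output_intervals = {     # output frequencies in steps
    "nstlog": 1000,
    "nstenergy": 1000,
    "nstxtcout": 2000,
    "nstxout": 5000,
    "nstvout": 5000,
    "nstfout": 5000,
}


def interval_ps(nst, dt=dt_ps):
    """Convert an output interval in steps to picoseconds (hypothetical helper)."""
    return nst * dt


# Total simulated time in ns (1 ns = 1000 ps).
total_ns = nsteps * dt_ps / 1000.0

# Assumption: the CPU-only rate quoted in the question holds for the whole run.
cpu_rate_ns_per_day = 1.8
days = total_ns / cpu_rate_ns_per_day

for name, nst in output_intervals.items():
    print(f"{name:10s} every {interval_ps(nst):g} ps")
print(f"total simulated time: {total_ns:g} ns")
print(f"estimated wall time at {cpu_rate_ns_per_day} ns/day: {days:.1f} days")
```

With dt = 0.002 ps, 2000 steps correspond to 4 ps between xtc frames, and the full run covers 30 ns.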

At the end of the log file from the GPU run, I have:

NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                            65.721780          65.722     0.0
 Pair Search distance check             354.095696        3186.861     0.1
 NxN QSTab Elec. + LJ [F]             78361.108992     4153138.777    92.2
 NxN QSTab Elec. + LJ [V&F]            1094.086656       88621.019     2.0
 1,4 nonbonded interactions              92.366244        8312.962     0.2
 Calc Weights                           273.463938        9844.702     0.2
 Spread Q Bspline                      5833.897344       11667.795     0.3
 Gather F Bspline                      5833.897344       35003.384     0.8
 3D-FFT                               19866.277292      158930.218     3.5
 Solve PME                                5.271904         337.402     0.0
 Shift-X                                  2.625854          15.755     0.0
 Bonds                                   14.647068         864.177     0.0
 Propers                                106.938468       24488.909     0.5
 Impropers                                1.961496         407.991     0.0
 Virial                                   4.877756          87.800     0.0
 Stop-CM                                  1.125366          11.254     0.0
 Calc-Ekin                                9.753172         263.336     0.0
 Lincs                                   20.162196        1209.732     0.0
 Lincs-Mat                              129.913632         519.655     0.0
 Constraint-V                            96.517170         772.137     0.0
 Constraint-Vir                           4.084834          98.036     0.0
 Settle                                  18.730926        6050.089     0.1
 (null)                                   0.653184           0.000     0.0
-----------------------------------------------------------------------------
 Total                                                 4503897.712   100.0
-----------------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 8 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1    8         14       0.301          8.175   0.4
 Launch GPU ops.        1    8        486       0.063          1.719   0.1
 Force                  1    8        486       4.351        118.334   6.3
 PME mesh               1    8        486       8.685        236.229  12.5
 Wait GPU local         1    8        486      52.321       1423.144  75.5
 NB X/F buffer ops.     1    8        958       0.389         10.571   0.6
 Write traj.            1    8          1       0.265          7.221   0.4
 Update                 1    8        486       0.989         26.887   1.4
 Constraints            1    8        486       1.041         28.308   1.5
 Rest                                           0.915         24.895   1.3
-----------------------------------------------------------------------------
 Total                                         69.319       1885.482 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1    8        972       5.574        151.608   8.0
 PME 3D-FFT             1    8        972       2.862         77.836   4.1
 PME solve Elec         1    8        486       0.216          5.880   0.3
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                         14       0.027        1.919     0.0
 X / q H2D                            486       0.262        0.539     0.4
 Nonbonded F kernel                   460      59.334      128.988    90.8
 Nonbonded F+ene k.                    12       2.819      234.875     4.3
 Nonbonded F+ene+prune k.              14       2.761      197.239     4.2
 F D2H                                486       0.174        0.359     0.3
-----------------------------------------------------------------------------
 Total                                         65.378      134.522   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 134.522 ms/26.822 ms = 5.015
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.
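
The GPU/CPU ratio reported above can be approximately reproduced from the timing tables. This is a rough sketch in Python, not GROMACS's own accounting: it assumes that the CPU force time is the sum of the "Force" and "PME mesh" rows and that the GPU time is the "Total" row of the GPU timings, both divided by the 486 steps in the Count column. Because the tables are rounded to three decimals, the result only matches the log to within rounding.

```python
# Wall times (s) copied from the cycle accounting and GPU timing tables above.
steps = 486             # "Count" column of the Force row
force_s = 4.351         # CPU "Force" row
pme_mesh_s = 8.685      # CPU "PME mesh" row
gpu_total_s = 65.378    # "Total" row of the GPU timings
wait_gpu_s = 52.321     # "Wait GPU local" row
total_wall_s = 69.319   # "Total" row of the cycle accounting

# Assumption: CPU force evaluation time = Force + PME mesh, per step.
cpu_ms_per_step = (force_s + pme_mesh_s) / steps * 1000.0
gpu_ms_per_step = gpu_total_s / steps * 1000.0
ratio = gpu_ms_per_step / cpu_ms_per_step

# Fraction of the wall time the CPU spends idle, waiting for the GPU.
wait_fraction = wait_gpu_s / total_wall_s

print(f"CPU force time: {cpu_ms_per_step:.3f} ms/step")
print(f"GPU force time: {gpu_ms_per_step:.3f} ms/step")
print(f"GPU/CPU ratio:  {ratio:.3f}")
print(f"CPU waiting for GPU: {wait_fraction:.1%} of wall time")
```

Read this way, the CPU finishes its share of each step in about 27 ms and then waits roughly three quarters of the wall time for the GPU nonbonded kernel.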

               Core t (s)   Wall t (s)        (%)
       Time:      550.116       69.319      793.6
                 (ns/day)    (hour/ns)
Performance:        1.212       19.810
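
The performance figures above follow from the step count, the time step and the wall time. A minimal sketch in Python, assuming the 486 steps in the Count column of the Force row are the steps covered by the 69.319 s of wall time:

```python
# Values copied from the mdp file and the log excerpt above.
steps = 486          # steps covered by the timing tables (assumption, see lead-in)
dt_ps = 0.002        # time step in ps
wall_s = 69.319      # wall time in seconds

SECONDS_PER_DAY = 86400.0

simulated_ns = steps * dt_ps / 1000.0               # ps -> ns
ns_per_day = simulated_ns / wall_s * SECONDS_PER_DAY
hours_per_ns = 24.0 / ns_per_day

print(f"ns/day:  {ns_per_day:.3f}")
print(f"hour/ns: {hours_per_ns:.3f}")
```

Under that assumption the result agrees with the logged 1.212 ns/day and 19.810 hour/ns.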

Best,
Hadi
