[gmx-users] cpu gpu performance
h.alizadeh at znu.ac.ir
Sun Jan 4 17:48:25 CET 2015
Dear Users,
I'm simulating a membrane protein system of approximately 185,000 atoms
on an Intel Core i7 CPU.
I have two questions:
1. The performance of my simulations is about 1.8 ns/day. Is this normal
for a system of this size, or are my simulations under-performing?
2. When I run mdrun with -nb gpu, the performance drops to about
1.3 ns/day. How can I resolve this?
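(For reference, the two runs are launched roughly as follows; the file name
and thread count below are only placeholders, not my exact command line:

mdrun -deffnm md -ntomp 8 -nb cpu    # run 1: nonbondeds on the CPU
mdrun -deffnm md -ntomp 8 -nb gpu    # run 2: nonbondeds offloaded to the GPU
)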
My .mdp file parameters are:
integrator = md
dt = 0.002
nsteps = 15000000
nstlog = 1000
nstxout = 5000
nstvout = 5000
nstfout = 5000
nstcalcenergy = 100
nstenergy = 1000
nstxtcout = 2000 ; compressed (.xtc) trajectory output every 4 ps
;
cutoff-scheme = Verlet
nstlist = 20
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdwtype = Cut-off
vdw-modifier = Force-switch
rvdw_switch = 0.9
rvdw = 1.0
;
tcoupl = berendsen
tc_grps = PROT NPROT SOL_ION
tau_t = 1.0 1.0 1.0
ref_t = 303.15 303.15 303.15
;
pcoupl = berendsen
pcoupltype = semiisotropic
tau_p = 5.0 5.0
compressibility = 4.5e-5 4.5e-5
ref_p = 1.0 1.0
;
;
constraints = h-bonds
constraint_algorithm = LINCS
continuation = yes
;
nstcomm = 100
comm_mode = linear
comm_grps = PROT NPROT SOL_ION
;
refcoord_scaling = com
At the end of the log file for the GPU run I have:
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F] 65.721780 65.722 0.0
Pair Search distance check 354.095696 3186.861 0.1
NxN QSTab Elec. + LJ [F] 78361.108992 4153138.777 92.2
NxN QSTab Elec. + LJ [V&F] 1094.086656 88621.019 2.0
1,4 nonbonded interactions 92.366244 8312.962 0.2
Calc Weights 273.463938 9844.702 0.2
Spread Q Bspline 5833.897344 11667.795 0.3
Gather F Bspline 5833.897344 35003.384 0.8
3D-FFT 19866.277292 158930.218 3.5
Solve PME 5.271904 337.402 0.0
Shift-X 2.625854 15.755 0.0
Bonds 14.647068 864.177 0.0
Propers 106.938468 24488.909 0.5
Impropers 1.961496 407.991 0.0
Virial 4.877756 87.800 0.0
Stop-CM 1.125366 11.254 0.0
Calc-Ekin 9.753172 263.336 0.0
Lincs 20.162196 1209.732 0.0
Lincs-Mat 129.913632 519.655 0.0
Constraint-V 96.517170 772.137 0.0
Constraint-Vir 4.084834 98.036 0.0
Settle 18.730926 6050.089 0.1
(null) 0.653184 0.000 0.0
-----------------------------------------------------------------------------
Total 4503897.712 100.0
-----------------------------------------------------------------------------
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 8 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Neighbor search 1 8 14 0.301 8.175 0.4
Launch GPU ops. 1 8 486 0.063 1.719 0.1
Force 1 8 486 4.351 118.334 6.3
PME mesh 1 8 486 8.685 236.229 12.5
Wait GPU local 1 8 486 52.321 1423.144 75.5
NB X/F buffer ops. 1 8 958 0.389 10.571 0.6
Write traj. 1 8 1 0.265 7.221 0.4
Update 1 8 486 0.989 26.887 1.4
Constraints 1 8 486 1.041 28.308 1.5
Rest 0.915 24.895 1.3
-----------------------------------------------------------------------------
Total 69.319 1885.482 100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME spread/gather 1 8 972 5.574 151.608 8.0
PME 3D-FFT 1 8 972 2.862 77.836 4.1
PME solve Elec 1 8 486 0.216 5.880 0.3
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
Computing: Count Wall t (s) ms/step %
-----------------------------------------------------------------------------
Pair list H2D 14 0.027 1.919 0.0
X / q H2D 486 0.262 0.539 0.4
Nonbonded F kernel 460 59.334 128.988 90.8
Nonbonded F+ene k. 12 2.819 234.875 4.3
Nonbonded F+ene+prune k. 14 2.761 197.239 4.2
F D2H 486 0.174 0.359 0.3
-----------------------------------------------------------------------------
Total 65.378 134.522 100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 134.522 ms/26.822 ms = 5.015
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME
grid.
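If I read the accounting correctly, the 5.015 ratio comes out of the two
tables like this (my own arithmetic, just to check I understand the note):

CPU force side: (4.351 s Force + 8.685 s PME mesh) / 486 steps ~ 26.8 ms/step
GPU nonbonded:   65.378 s / 486 steps                          ~ 134.5 ms/step
imbalance:       134.5 / 26.8                                  ~ 5.0

So the CPU apparently spends most of the wall time (Wait GPU local: 52.321 s,
75.5 %) idle while the GPU finishes the nonbonded kernels. Does that mean
this GPU is simply too slow relative to the CPU for offloading to pay off,
or can the shorter cut-off / finer PME grid suggestion realistically close
a 5x gap?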
Core t (s) Wall t (s) (%)
Time: 550.116 69.319 793.6
(ns/day) (hour/ns)
Performance: 1.212 19.810
Best,
Hadi