[gmx-users] cpu gpu performance
Mark Abraham
mark.j.abraham at gmail.com
Mon Jan 5 00:53:56 CET 2015
On Sun, Jan 4, 2015 at 5:41 PM, <h.alizadeh at znu.ac.ir> wrote:
> Dear Users,
> I'm simulating a membrane protein system of approximately 185000 atoms
> on an Intel Core i7 CPU.
> I have two questions:
> 1. The performance of my simulations is about 1.8 ns/day. Is this normal
> for such a system, or are my simulations suffering from poor performance?
>
The actual performance depends on everything, of course, but this number is
believable.
> 2. When I use the mdrun command with -nb gpu, the performance drops to
> 1.3 ns/day! How can I resolve this problem?
>
mdrun does a simple offload of all the short-ranged non-bonded work to the
GPU. If the GPU is slow relative to the CPU, then that can be a net loss.
Alternatively, this system could be too large for efficient use of older
GPUs - I don't know what the expected behaviour would be.
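If you want to quantify which side is the bottleneck, a short side-by-side
benchmark on the same input usually settles it. A minimal sketch (file names
and step count are illustrative; -resethway and -noconfout just keep the
reported timings clean):

    # CPU-only non-bondeds vs. GPU offload, short runs on the same .tpr
    mdrun -s topol.tpr -nb cpu -deffnm bench_cpu -nsteps 10000 -resethway -noconfout
    mdrun -s topol.tpr -nb gpu -deffnm bench_gpu -nsteps 10000 -resethway -noconfout

Then compare the ns/day reported at the end of each log.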
> my mdp file parameters are:
> integrator = md
> dt = 0.002
> nsteps = 15000000
> nstlog = 1000
> nstxout = 5000
> nstvout = 5000
> nstfout = 5000
> nstcalcenergy = 100
> nstenergy = 1000
> nstxtcout = 2000 ; xtc compressed trajectory output every 4 ps
> ;
> cutoff-scheme = Verlet
> nstlist = 20
> rlist = 1.0
> coulombtype = pme
> rcoulomb = 1.0
> vdwtype = Cut-off
> vdw-modifier = Force-switch
> rvdw_switch = 0.9
> rvdw = 1.0
>
I hope you know why you're using this particular combination of non-bonded
settings. In particular, the use of force-switch requires the short-range
implementation to use table lookups. These tend to be slower than the
implementations of alternative modifiers.
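If your force field does not actually require a force-switched LJ potential
(CHARMM-style parameter sets generally do, in which case keep it), the plain
cut-off with a shifted potential is usually faster. A sketch of the
alternative, keeping your current cut-off:

    vdwtype      = Cut-off
    vdw-modifier = Potential-shift-Verlet  ; LJ potential shifted to zero at rvdw
    rvdw         = 1.0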
> ;
> tcoupl = berendsen
>
Side point - there are known problems with using the Berendsen thermostat
for production simulation. Use something else.
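For example, the velocity-rescaling thermostat is a near drop-in replacement
here (a sketch; your tc_grps, tau_t and ref_t lines can stay as they are):

    tcoupl = V-rescale   ; stochastic velocity rescaling, samples the canonical ensemble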
> tc_grps = PROT NPROT SOL_ION
> tau_t = 1.0 1.0 1.0
> ref_t = 303.15 303.15 303.15
> ;
> pcoupl = berendsen
> pcoupltype = semiisotropic
> tau_p = 5.0 5.0
> compressibility = 4.5e-5 4.5e-5
> ref_p = 1.0 1.0
> ;
> ;
> constraints = h-bonds
> constraint_algorithm = LINCS
> continuation = yes
> ;
> nstcomm = 100
> comm_mode = linear
> comm_grps = PROT NPROT SOL_ION
> ;
> refcoord_scaling = com
> and at the end of the log file, when I use the GPU, I have:
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
>
> -----------------------------------------------------------------------------
> NB VdW [V&F] 65.721780 65.722 0.0
> Pair Search distance check 354.095696 3186.861 0.1
> NxN QSTab Elec. + LJ [F] 78361.108992 4153138.777 92.2
>
Here are the quadratic-spline tabulated kernels being flagged.
> NxN QSTab Elec. + LJ [V&F] 1094.086656 88621.019 2.0
> 1,4 nonbonded interactions 92.366244 8312.962 0.2
> Calc Weights 273.463938 9844.702 0.2
> Spread Q Bspline 5833.897344 11667.795 0.3
> Gather F Bspline 5833.897344 35003.384 0.8
> 3D-FFT 19866.277292 158930.218 3.5
> Solve PME 5.271904 337.402 0.0
> Shift-X 2.625854 15.755 0.0
> Bonds 14.647068 864.177 0.0
> Propers 106.938468 24488.909 0.5
> Impropers 1.961496 407.991 0.0
> Virial 4.877756 87.800 0.0
> Stop-CM 1.125366 11.254 0.0
> Calc-Ekin 9.753172 263.336 0.0
> Lincs 20.162196 1209.732 0.0
> Lincs-Mat 129.913632 519.655 0.0
> Constraint-V 96.517170 772.137 0.0
> Constraint-Vir 4.084834 98.036 0.0
> Settle 18.730926 6050.089 0.1
> (null) 0.653184 0.000 0.0
>
> -----------------------------------------------------------------------------
> Total 4503897.712 100.0
>
> -----------------------------------------------------------------------------
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 8 OpenMP threads
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Neighbor search 1 8 14 0.301 8.175 0.4
> Launch GPU ops. 1 8 486 0.063 1.719 0.1
> Force 1 8 486 4.351 118.334 6.3
> PME mesh 1 8 486 8.685 236.229 12.5
> Wait GPU local 1 8 486 52.321 1423.144 75.5
>
and here's the CPU spending 75% of its time waiting for the GPU.
> NB X/F buffer ops. 1 8 958 0.389 10.571 0.6
> Write traj. 1 8 1 0.265 7.221 0.4
> Update 1 8 486 0.989 26.887 1.4
> Constraints 1 8 486 1.041 28.308 1.5
> Rest 0.915 24.895 1.3
>
> -----------------------------------------------------------------------------
> Total 69.319 1885.482 100.0
>
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
>
> -----------------------------------------------------------------------------
> PME spread/gather 1 8 972 5.574 151.608 8.0
> PME 3D-FFT 1 8 972 2.862 77.836 4.1
> PME solve Elec 1 8 486 0.216 5.880 0.3
>
> -----------------------------------------------------------------------------
>
> GPU timings
>
> -----------------------------------------------------------------------------
> Computing: Count Wall t (s) ms/step %
>
> -----------------------------------------------------------------------------
> Pair list H2D 14 0.027 1.919 0.0
> X / q H2D 486 0.262 0.539 0.4
> Nonbonded F kernel 460 59.334 128.988 90.8
>
and the GPU is just taking a long time to get its work done.
Mark
> Nonbonded F+ene k. 12 2.819 234.875 4.3
> Nonbonded F+ene+prune k. 14 2.761 197.239 4.2
> F D2H 486 0.174 0.359 0.3
>
> -----------------------------------------------------------------------------
> Total 65.378 134.522 100.0
>
> -----------------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 134.522 ms/26.822 ms = 5.015
> For optimal performance this ratio should be close to 1!
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> performance loss, consider using a shorter cut-off and a finer PME
> grid.
>
> Core t (s) Wall t (s) (%)
> Time: 550.116 69.319 793.6
> (ns/day) (hour/ns)
> Performance: 1.212 19.810
>
> Best,
> Hadi
>