[gmx-users] How to increase the volatile gpu-util
Mark Abraham
mark.j.abraham at gmail.com
Sat Jul 29 09:49:24 CEST 2017
Hi,
There is not much to be done with an RF simulation. The GPU is idle when
doing search, update and constraints. You could try a higher nstlist, as
the note in your log suggests, e.g. gmx mdrun -nstlist 25, but you'll only
get a small improvement.
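As a rough check on why the gain is small, the sketch below uses only figures from the timing table in the quoted log. It assumes the cost of a single pair search stays the same and ignores the extra kernel work a longer-lived pair list needs, so the result is an upper bound, not a prediction. The function name is made up for illustration.

```python
# Figures copied from the quoted log (REAL CYCLE AND TIME ACCOUNTING).
steps = 300_000_001          # call count of "Force", i.e. number of MD steps
searches = 15_000_001        # call count of "Neighbor search"
search_wall_s = 149304.198   # wall time spent in "Neighbor search"
total_wall_s = 1271799.968   # total wall time of the run


def search_saving_fraction(new_nstlist: int) -> float:
    """Upper bound on the fraction of wall time saved by a larger nstlist.

    Assumption: each search costs the same regardless of nstlist, and the
    nonbonded kernels do not get slower with the larger pair-list buffer.
    """
    current_nstlist = round(steps / searches)       # 20 for this run
    search_fraction = search_wall_s / total_wall_s  # about 0.117
    return search_fraction * (1 - current_nstlist / new_nstlist)


print(f"{search_saving_fraction(25):.1%}")
```

Going from the current interval of 20 steps to 25 removes one search in five, so at best roughly 2% of the wall time comes back.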
The way to get better utilisation, if your science can make use of multiple
simulations, is to run two simulations on your node. Those will run out of
phase with each other, each slightly slower, but with nearly double the
sampling throughput. See
http://onlinelibrary.wiley.com/doi/10.1002/jcc.24030/abstract (or the same
paper on arXiv).
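One possible way to set up two runs on one node is sketched below. It only builds and prints the two command lines. The 16 threads come from the quoted log, while the single GPU with id 0 and the run names sim_a and sim_b are assumptions for illustration. The mdrun options used (-deffnm, -ntmpi, -ntomp, -pin, -pinoffset, -gpu_id) exist, but the right offsets depend on how your hardware threads are numbered, so check the pinning lines in the log.

```python
import shlex


def two_run_commands(total_threads=16, gpu_id="0", names=("sim_a", "sim_b")):
    """Build one mdrun command per simulation, splitting the threads evenly.

    Assumptions: both runs share the same GPU, and consecutive logical
    cores can be handed to each run via -pinoffset.
    """
    per_run = total_threads // len(names)
    commands = []
    for index, name in enumerate(names):
        commands.append([
            "gmx", "mdrun",
            "-deffnm", name,                         # hypothetical run name
            "-ntmpi", "1",                           # one thread-MPI rank per run
            "-ntomp", str(per_run),                  # OpenMP threads for this run
            "-pin", "on",                            # pin threads so runs do not overlap
            "-pinoffset", str(index * per_run),      # first logical core for this run
            "-gpu_id", gpu_id,                       # both runs use the same GPU
        ])
    return commands


for command in two_run_commands():
    # Print shell-ready lines; the trailing "&" would start each run in the background.
    print(shlex.join(command), "&")
```

Each run then gets 8 threads, and the two compete for the GPU only when their nonbonded kernels happen to overlap.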
Mark
On Sat, 29 Jul 2017 07:24 leila karami <karami.leila1 at gmail.com> wrote:
> Dear Mark,
>
> Thanks for your answer.
>
> > Much more relevant is what gromacs reports in the end of the log file.
>
> The end of my log file is as follows:
>
> ============================================================
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
>
> -----------------------------------------------------------------------------
> Pair Search distance check 388751078.685856 3498759708.173 0.3
> NxN RF Elec. + LJ [F] 28110392002.016254 1068194896076.618 98.2
> NxN RF Elec. + LJ [V&F] 283943606.686080 15332954761.048 1.4
> Shift-X 5089620.339308 30537722.036 0.0
> Bonds 2100000.007000 123900000.413 0.0
> Angles 1650000.005500 277200000.924 0.0
> Impropers 150000.000500 31200000.104 0.0
> Virial 5090295.339353 91625316.108 0.0
> Stop-CM 1017924.678616 10179246.786 0.0
> P-Coupling 5089620.000000 30537720.000 0.0
> Calc-Ekin 10179240.678616 274839498.323 0.0
> Lincs 525000.005250 31500000.315 0.0
> Lincs-Mat 10800000.108000 43200000.432 0.0
> Constraint-V 1050000.007000 8400000.056 0.0
> Constraint-Vir 26250.001750 630000.042 0.0
>
> -----------------------------------------------------------------------------
> Total 1087980360051.378 100.0
>
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 16 OpenMP threads
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Neighbor search 1 16 15000001 149304.198 5242873.051 11.7
> Launch GPU ops. 1 16 300000001 17306.465 607723.034 1.4
> Force 1 16 300000001 398616.578 13997571.024 31.3
> Wait GPU local 1 16 300000001 371904.849 13059578.625 29.2
> NB X/F buffer ops. 1 16 585000001 131860.152 4630318.829 10.4
> Write traj. 1 16 7414 1428.772 50171.859 0.1
> Update 1 16 300000001 136357.663 4788250.627 10.7
> Constraints 1 16 300000001 12172.356 427436.850 1.0
> Rest 52848.935 1855810.221 4.2
>
> -----------------------------------------------------------------------------
> Total 1271799.968 44659734.120 100.0
>
> -----------------------------------------------------------------------------
>
> GPU timings
>
> -----------------------------------------------------------------------------
> Computing: Count Wall t (s) ms/step %
>
> -----------------------------------------------------------------------------
> Pair list H2D 15000001 18161.643 1.211 2.3
> X / q H2D 300000001 274639.405 0.915 35.2
> Nonbonded F kernel 285000000 276274.002 0.969 35.4
> Nonbonded F+prune k. 12000000 15443.741 1.287 2.0
> Nonbonded F+ene+prune k. 3000001 3975.874 1.325 0.5
> F D2H 300000001 191562.559 0.639 24.6
>
> -----------------------------------------------------------------------------
> Total 780057.223 2.600 100.0
>
> -----------------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 2.600 ms/1.329 ms = 1.957
>
> NOTE: 12 % of the run time was spent in pair search,
> you might want to increase nstlist (this has no effect on accuracy)
>
>
> Core t (s) Wall t (s) (%)
> Time: 20365044.088 1271799.968 1601.3
> 14d17h16:39
> (ns/day) (hour/ns)
> Performance: 611.417 0.039
> Finished mdrun on rank 0 Sat Jul 29 05:40:47 2017
>
> =============================================================
>
> Best,