[gmx-users] hardware problem of GPU?
Mark Abraham
mark.j.abraham at gmail.com
Fri Jun 6 14:08:50 CEST 2014
There's only one rank, so you can only be using one GPU!
Mark
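
For anyone reading the quoted log below with this in mind: the rank count can be read straight from the cycle-accounting table. The following is a minimal Python sketch, not part of the original exchange. It assumes that the "Nodes" column counts ranks and the "Th." column counts threads per rank, which is consistent with the reply above. The name ranks_and_threads and the regular expression are illustrative only.

```python
import re

# A few rows copied from the cycle-accounting table in the quoted log.
LOG_ROWS = """\
Neighbor search 1 20 62501 156.247 9380.162 2.3
Force 1 20 2500001 1047.581 62890.858 15.3
PME mesh 1 20 2500001 2546.280 152864.323 37.3
"""

# Columns: name, Nodes, Th., Count, Wall t (s), G-Cycles, %
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<nodes>\d+)\s+(?P<threads>\d+)"
    r"\s+\d+\s+[\d.]+\s+[\d.]+\s+[\d.]+$"
)


def ranks_and_threads(text):
    """Return the set of (Nodes, Th.) pairs found in the table rows."""
    pairs = set()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            pairs.add((int(match.group("nodes")), int(match.group("threads"))))
    return pairs


pairs = ranks_and_threads(LOG_ROWS)

# "Core t (s)" and "Wall t (s)" from the Time: line of the same log.
core_t, wall_t = 136384.758, 6828.512
cpu_use_percent = 100 * core_t / wall_t

print("(Nodes, Th.) pairs:", pairs)
print("Core t / Wall t: %.1f %%" % cpu_use_percent)
```

Every row reports 1 in the Nodes column and 20 in the Th. column, and core time divided by wall time is close to 2000%, so the run looks like a single rank driving 20 threads.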
On Jun 6, 2014 8:13 AM, "Albert" <mailmd2011 at gmail.com> wrote:
> Hi Mark:
>
> thanks a lot for the reply. Here is the information from my log file. I've
> got another GPU machine with two GTX690, and the double CPU job is much faster
> than single GPU. But that is not the case for this dual GTX780Ti machine, so
> I am curious about what's happening with the hardware, since Gromacs was
> compiled in the same way and the test systems are the same.
>
> thanks a lot
>
> ---------------------------------- log ----------------------------------
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing:                              M-Number          M-Flops % Flops
> -------------------------------------------------------------------------
> Pair Search distance check         449758.183440      4047823.651     0.1
> NxN Ewald Elec. + VdW [F]       114203606.933184   7537438057.590    95.3
> NxN Ewald Elec. + VdW [V&F]       1153624.365888    123437807.150     1.6
> 1,4 nonbonded interactions          30707.512283      2763676.105     0.0
> Calc Weights                       413752.665501     14895095.958     0.2
> Spread Q Bspline                  8826723.530688     17653447.061     0.2
> Gather F Bspline                  8826723.530688     52960341.184     0.7
> 3D-FFT                           15297568.453746    122380547.630     1.5
> Solve PME                            7839.867456       501751.517     0.0
> Shift-X                              3447.992667        20687.956     0.0
> Angles                              21342.508537      3585541.434     0.0
> Propers                             32957.513183      7547270.519     0.1
> Impropers                            3147.501259       654680.262     0.0
> RB-Dihedrals                           87.500035        21612.509     0.0
> Virial                              13803.055212       248454.994     0.0
> Stop-CM                              1379.285334        13792.853     0.0
> Calc-Ekin                           27583.610334       744757.479     0.0
> Lincs                               11865.004746       711900.285     0.0
> Lincs-Mat                          256110.102444      1024440.410     0.0
> Constraint-V                       149670.059868      1197360.479     0.0
> Constraint-Vir                      13780.555122       330733.323     0.0
> Settle                              41980.016792     13559545.424     0.2
> -------------------------------------------------------------------------
> Total                                              7905739325.773   100.0
> -------------------------------------------------------------------------
>
>
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> Computing:           Nodes  Th.     Count  Wall t (s)     G-Cycles      %
> -------------------------------------------------------------------------
> Neighbor search          1   20     62501     156.247     9380.162    2.3
> Launch GPU ops.          1   20   2500001     182.404    10950.541    2.7
> Force                    1   20   2500001    1047.581    62890.858   15.3
> PME mesh                 1   20   2500001    2546.280   152864.323   37.3
> Wait GPU local           1   20   2500001     808.773    48554.193   11.8
> NB X/F buffer ops.       1   20   4937501     114.557     6877.380    1.7
> Write traj.              1   20        58       1.380       82.874    0.0
> Update                   1   20   2500001     519.331    31177.740    7.6
> Constraints              1   20   2500001     757.477    45474.638   11.1
> Rest                     1                    694.482    41692.777   10.2
> -------------------------------------------------------------------------
> Total                    1                   6828.512   409945.484  100.0
> -------------------------------------------------------------------------
> -------------------------------------------------------------------------
> PME spread/gather        1   20   5000002    1910.053   114668.815   28.0
> PME 3D-FFT               1   20   5000002     516.241    30992.236    7.6
> PME solve                1   20   2500001     112.115     6730.761    1.6
> -------------------------------------------------------------------------
>
> GPU timings
> ------------------------------------------------------------------
> Computing:                     Count  Wall t (s)   ms/step       %
> ------------------------------------------------------------------
> Pair list H2D                  62501      14.934     0.239     0.3
> X / q H2D                    2500001     206.939     0.083     4.6
> Nonbonded F kernel           2250000    3527.275     1.568    78.8
> Nonbonded F+ene k.            187500     405.370     2.162     9.1
> Nonbonded F+ene+prune k.       62501     167.980     2.688     3.8
> F D2H                        2500001     154.048     0.062     3.4
> ------------------------------------------------------------------
> Total                                   4476.545     1.791   100.0
> ------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 1.791 ms/1.438 ms = 1.246
> For optimal performance this ratio should be close to 1!
>
>
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> performance loss, consider using a shorter cut-off and a finer PME
> grid.
>
>                Core t (s)   Wall t (s)        (%)
>        Time:   136384.758     6828.512     1997.3
>                                1h53:48
>                  (ns/day)    (hour/ns)
> Performance:       63.264        0.379
>
>
>
>
> On 06/05/2014 10:05 PM, Mark Abraham wrote:
>
>> What did you learn from the performance output at the end of the log file?
>>
>> Mark
>>
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>
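
The load-balance figures in the quoted log can be cross-checked by hand. The following is a rough Python sketch under stated assumptions, not something from the original posts: the 20% threshold is taken from the wording of the NOTE in the log, and the variable names are made up for illustration.

```python
# Values copied from the quoted log.
gpu_ms_per_step = 1.791   # "Total" row of the GPU timings table, ms/step
cpu_ms_per_step = 1.438   # CPU side of "Force evaluation time GPU/CPU"
gpu_total_wall_s = 4476.545
steps = 2500001           # step count from the cycle-accounting table
ns_per_day = 63.264

# GPU/CPU force evaluation time ratio; the log prints 1.246.
ratio = gpu_ms_per_step / cpu_ms_per_step

# Assumed threshold, based on the NOTE: "The GPU has >20% more load than the CPU".
gpu_overloaded = ratio > 1.20

# Total GPU wall time per step, in milliseconds.
ms_per_step = 1000 * gpu_total_wall_s / steps

# The (hour/ns) figure is 24 hours divided by the ns/day figure.
hours_per_ns = 24 / ns_per_day

print("GPU/CPU ratio: %.3f" % ratio)
print("GPU overloaded:", gpu_overloaded)
print("GPU ms/step: %.3f" % ms_per_step)
print("hour/ns: %.3f" % hours_per_ns)
```

The ratio computed from the rounded ms values differs from the printed 1.246 in the last digit, presumably because the log divides the unrounded timings. Either way it is above 1.2, which matches the NOTE about GPU overload.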