[gmx-users] >60% slowdown with GPU / verlet and sd integrator

Berk Hess gmx3 at hotmail.com
Wed Jan 16 15:44:59 CET 2013


Hi,

Unfortunately this is not a bug, but a feature!
We have made the non-bonded kernels so fast on the GPU that integration and constraints, which still run on the CPU, now account for a much larger share of the total run time.
The sd1 integrator is almost as fast as the md integrator, but slightly less accurate.
In most cases that's a good solution.
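
To make the comparison concrete, here is a minimal .mdp sketch of the two
alternatives (the values are placeholders, not recommendations). The fast
path, md plus a cheap thermostat:

    integrator = md         ; leap-frog integrator
    tcoupl     = v-rescale  ; inexpensive velocity-rescaling thermostat
    tc-grps    = System
    tau_t      = 0.1
    ref_t      = 300

versus stochastic dynamics, where the thermostat is built into the
integrator and tcoupl is not used:

    integrator = sd         ; or sd1 for the faster, less accurate variant
    tc-grps    = System
    tau_t      = 1.0        ; inverse friction constant (ps)
    ref_t      = 300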

I closed the redmine issue:
http://redmine.gromacs.org/issues/1121

Cheers,

Berk

----------------------------------------
> Date: Wed, 16 Jan 2013 17:26:18 +0300
> Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd integrator
> From: jmsstarlight at gmail.com
> To: gmx-users at gromacs.org
>
> Hi all!
>
> I've also done some calculations with the SD integrator used as the
> thermostat (without tcoupl) on a system of 65k atoms; I obtained
> 10 ns/day on a GTX 670 and a quad-core i5.
> I haven't run any simulations with the md integrator yet, so I should
> test that.
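>
> For the md-versus-sd test, something along these lines should do (the
> file names are placeholders; the two .mdp files would differ only in the
> integrator and thermostat lines):
>
>     grompp -f sd.mdp -c conf.gro -p topol.top -o sd.tpr
>     mdrun -deffnm sd -ntomp 4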
>
> James
>
> 2013/1/15 Szilárd Páll <szilard.pall at cbr.su.se>:
> > Hi Floris,
> >
> > Great feedback, this needs to be looked into. Could you please file a bug
> > report, preferably with a tpr (and/or all inputs) as well as log files.
> >
> > Thanks,
> >
> > --
> > Szilárd
> >
> >
> > On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens <floris_buelens at yahoo.com> wrote:
> >
> >> Hi,
> >>
> >>
> >> I'm seeing MD simulations run a lot slower with the sd integrator than
> >> with md - ca. 10 vs. 30 ns/day for my 47,000-atom system. I found no
> >> documented indication that this should be the case.
> >> Timings and logs are pasted in below - wall time seems to be accumulating
> >> in Update and Rest, which together add up to >60% of the total. The effect
> >> is still there without a GPU: ca. 40% slowdown when switching from group
> >> to Verlet with the SD integrator.
> >> System: Xeon E5-1620, 1x GTX 680, gromacs
> >> 4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
> >>
> >> I haven't filed a bug report yet, as I don't have much variety of testing
> >> conditions available right now; I hope someone else has a moment to try to
> >> reproduce this?
> >>
> >> Timings:
> >>
> >> cpu (ns/day)
> >> sd / verlet: 6
> >> sd / group: 10
> >> md / verlet: 9.2
> >> md / group: 11.4
> >>
> >> gpu (ns/day)
> >> sd / verlet: 11
> >> md / verlet: 29.8
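> >>
> >> In case it helps anyone reproduce the table above, a sketch of the run
> >> matrix (the .mdp file names are hypothetical; the files differ only in
> >> the integrator and cutoff-scheme settings):
> >>
> >>     for run in md_verlet md_group sd_verlet sd_group; do
> >>       grompp -f $run.mdp -c conf.gro -p topol.top -o $run.tpr
> >>       mdrun -deffnm $run -ntomp 8
> >>     done
> >>
> >> with the two Verlet runs repeated under "mdrun -nb cpu" to take the GPU
> >> out of the picture.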
> >>
> >>
> >>
> >> **************MD integrator, GPU / verlet
> >>
> >> M E G A - F L O P S A C C O U N T I N G
> >>
> >> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> >> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> >> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> >> V&F=Potential and force V=Potential only F=Force only
> >>
> >> Computing: M-Number M-Flops % Flops
> >>
> >> -----------------------------------------------------------------------------
> >> Pair Search distance check 1244.988096 11204.893 0.1
> >> NxN QSTab Elec. + VdW [F] 194846.615488 7988711.235 91.9
> >> NxN QSTab Elec. + VdW [V&F] 2009.923008 118585.457 1.4
> >> 1,4 nonbonded interactions 31.616322 2845.469 0.0
> >> Calc Weights 703.010574 25308.381 0.3
> >> Spread Q Bspline 14997.558912 29995.118 0.3
> >> Gather F Bspline 14997.558912 89985.353 1.0
> >> 3D-FFT 47658.567884 381268.543 4.4
> >> Solve PME 20.580896 1317.177 0.0
> >> Shift-X 9.418458 56.511 0.0
> >> Angles 21.879375 3675.735 0.0
> >> Propers 48.599718 11129.335 0.1
> >> Virial 23.498403 422.971 0.0
> >> Stop-CM 2.436616 24.366 0.0
> >> Calc-Ekin 93.809716 2532.862 0.0
> >> Lincs 12.147284 728.837 0.0
> >> Lincs-Mat 131.328750 525.315 0.0
> >> Constraint-V 246.633614 1973.069 0.0
> >> Constraint-Vir 23.486379 563.673 0.0
> >> Settle 74.129451 23943.813 0.3
> >>
> >> -----------------------------------------------------------------------------
> >> Total 8694798.114 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >>
> >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >>
> >> Computing: Nodes Th. Count Wall t (s) G-Cycles %
> >>
> >> -----------------------------------------------------------------------------
> >> Neighbor search 1 8 201 0.944 27.206 3.3
> >> Launch GPU ops. 1 8 5001 0.371 10.690 1.3
> >> Force 1 8 5001 2.185 62.987 7.7
> >> PME mesh 1 8 5001 15.033 433.441 52.9
> >> Wait GPU local 1 8 5001 1.551 44.719 5.5
> >> NB X/F buffer ops. 1 8 9801 0.538 15.499 1.9
> >> Write traj. 1 8 2 0.725 20.912 2.6
> >> Update 1 8 5001 2.318 66.826 8.2
> >> Constraints 1 8 5001 2.898 83.551 10.2
> >> Rest 1 1.832 52.828 6.5
> >>
> >> -----------------------------------------------------------------------------
> >> Total 1 28.394 818.659 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> -----------------------------------------------------------------------------
> >> PME spread/gather 1 8 10002 8.745 252.144 30.8
> >> PME 3D-FFT 1 8 10002 5.392 155.458 19.0
> >> PME solve 1 8 5001 0.869 25.069 3.1
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> GPU timings
> >>
> >> -----------------------------------------------------------------------------
> >> Computing: Count Wall t (s) ms/step %
> >>
> >> -----------------------------------------------------------------------------
> >> Pair list H2D 201 0.080 0.397 0.4
> >> X / q H2D 5001 0.698 0.140 3.7
> >> Nonbonded F kernel 4400 14.856 3.376 79.1
> >> Nonbonded F+ene k. 400 1.667 4.167 8.9
> >> Nonbonded F+prune k. 100 0.441 4.407 2.3
> >> Nonbonded F+ene+prune k. 101 0.535 5.300 2.9
> >> F D2H 5001 0.501 0.100 2.7
> >>
> >> -----------------------------------------------------------------------------
> >> Total 18.778 3.755 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> Force evaluation time GPU/CPU: 3.755 ms/3.443 ms = 1.091
> >> For optimal performance this ratio should be close to 1!
> >>
> >>
> >> Core t (s) Wall t (s) (%)
> >> Time: 221.730 28.394 780.9
> >> (ns/day) (hour/ns)
> >> Performance: 30.435 0.789
> >>
> >>
> >>
> >>
> >> *****************SD integrator, GPU / verlet
> >>
> >> M E G A - F L O P S A C C O U N T I N G
> >>
> >> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> >> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> >> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> >> V&F=Potential and force V=Potential only F=Force only
> >>
> >> Computing: M-Number M-Flops % Flops
> >>
> >> -----------------------------------------------------------------------------
> >> Pair Search distance check 1254.604928 11291.444 0.1
> >> NxN QSTab Elec. + VdW [F] 197273.059584 8088195.443 91.6
> >> NxN QSTab Elec. + VdW [V&F] 2010.150784 118598.896 1.3
> >> 1,4 nonbonded interactions 31.616322 2845.469 0.0
> >> Calc Weights 703.010574 25308.381 0.3
> >> Spread Q Bspline 14997.558912 29995.118 0.3
> >> Gather F Bspline 14997.558912 89985.353 1.0
> >> 3D-FFT 47473.892284 379791.138 4.3
> >> Solve PME 20.488896 1311.289 0.0
> >> Shift-X 9.418458 56.511 0.0
> >> Angles 21.879375 3675.735 0.0
> >> Propers 48.599718 11129.335 0.1
> >> Virial 23.498403 422.971 0.0
> >> Update 234.336858 7264.443 0.1
> >> Stop-CM 2.436616 24.366 0.0
> >> Calc-Ekin 93.809716 2532.862 0.0
> >> Lincs 24.289712 1457.383 0.0
> >> Lincs-Mat 262.605000 1050.420 0.0
> >> Constraint-V 246.633614 1973.069 0.0
> >> Constraint-Vir 23.486379 563.673 0.0
> >> Settle 148.229268 47878.054 0.5
> >>
> >> -----------------------------------------------------------------------------
> >> Total 8825351.354 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >>
> >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >>
> >> Computing: Nodes Th. Count Wall t (s) G-Cycles %
> >>
> >> -----------------------------------------------------------------------------
> >> Neighbor search 1 8 201 0.945 27.212 1.2
> >> Launch GPU ops. 1 8 5001 0.384 11.069 0.5
> >> Force 1 8 5001 2.180 62.791 2.7
> >> PME mesh 1 8 5001 15.029 432.967 18.5
> >> Wait GPU local 1 8 5001 3.327 95.844 4.1
> >> NB X/F buffer ops. 1 8 9801 0.542 15.628 0.7
> >> Write traj. 1 8 2 0.749 21.582 0.9
> >> Update 1 8 5001 28.044 807.908 34.5
> >> Constraints 1 8 10002 5.562 160.243 6.8
> >> Rest 1 24.488 705.458 30.1
> >>
> >> -----------------------------------------------------------------------------
> >> Total 1 81.250 2340.701 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> -----------------------------------------------------------------------------
> >> PME spread/gather 1 8 10002 8.769 252.615 10.8
> >> PME 3D-FFT 1 8 10002 5.367 154.630 6.6
> >> PME solve 1 8 5001 0.865 24.910 1.1
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> GPU timings
> >>
> >> -----------------------------------------------------------------------------
> >> Computing: Count Wall t (s) ms/step %
> >>
> >> -----------------------------------------------------------------------------
> >> Pair list H2D 201 0.080 0.398 0.4
> >> X / q H2D 5001 0.699 0.140 3.4
> >> Nonbonded F kernel 4400 16.271 3.698 79.6
> >> Nonbonded F+ene k. 400 1.827 4.568 8.9
> >> Nonbonded F+prune k. 100 0.482 4.816 2.4
> >> Nonbonded F+ene+prune k. 101 0.584 5.787 2.9
> >> F D2H 5001 0.505 0.101 2.5
> >>
> >> -----------------------------------------------------------------------------
> >> Total 20.448 4.089 100.0
> >>
> >> -----------------------------------------------------------------------------
> >>
> >> Force evaluation time GPU/CPU: 4.089 ms/3.441 ms = 1.188
> >> For optimal performance this ratio should be close to 1!
> >>
> >> Core t (s) Wall t (s) (%)
> >> Time: 643.440 81.250 791.9
> >> (ns/day) (hour/ns)
> >> Performance: 10.636 2.256