[gmx-users] >60% slowdown with GPU / verlet and sd integrator
Berk Hess
gmx3 at hotmail.com
Thu Jan 17 16:31:24 CET 2013
Hi,
Please use the fix I put on the redmine issue, as that's even faster and lets you use sd again.
We should probably rephrase the note a bit for the case where the GPU has more work to do than the CPU.
In your case there is simply no work for the CPU to do.
Ideally we would let the CPU handle some of the non-bonded work, but that probably won't happen in 4.6.
A solution might be buying a GPU that is about 3x as fast.
Cheers,
Berk
----------------------------------------
> Date: Thu, 17 Jan 2013 18:48:17 +0400
> Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd integrator
> From: jmsstarlight at gmail.com
> To: gmx-users at gromacs.org
>
> Dear Gromacs Developers!
>
> Using the sd1 integrator I've obtained good performance with the Core i5 +
> GTX 670 (13 ns/day) for the system of 60k atoms. That is about 30%
> better than with the sd integrator.
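>
> (For reference, a minimal sketch of the mdp lines involved; the option
> names are from the 4.6 mdp format, and everything except the integrator
> name is illustrative rather than taken from the actual input files.)
>
>     ; with sd/sd1 the integrator itself does the temperature coupling,
>     ; so no separate tcoupl setting is used
>     integrator = sd1     ; faster SD variant; plain "sd" is the slower,
>                          ; more accurate one compared above
>     dt         = 0.002   ; illustrative 2 fs time step
>     tc-grps    = System  ; illustrative coupling group
>     tau-t      = 1.0     ; illustrative inverse friction constant (ps)
>     ref-t      = 300     ; illustrative reference temperature (K)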
>
> But on my other workstation, which differs only by a slower GPU (GT
> 640), I've obtained a GPU/CPU mismatch.
>
> Force evaluation time GPU/CPU: 6.835 ms / 2.026 ms = 3.373 (on the
> first station with the GTX 670 I've obtained a GPU/CPU ratio close to
> 1).
>
> In both cases I'm using the same simulation parameters with 0.8 nm
> cutoffs (it's also worth noting that in the second case I've simulated
> another system, consisting of 33k atoms, by means of umbrella sampling
> pulling). Could you tell me how I could increase performance on my
> second station (i.e. reduce the GPU/CPU ratio)? I've attached the log for
> that simulation here: http://www.sendspace.com/file/x0e3z8
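>
> (For reference, "0.8 nm cutoffs" corresponds roughly to mdp settings like
> the lines below; only the 0.8 values come from this message, the rest is
> just how a typical Verlet-scheme GPU run in 4.6 is configured.)
>
>     cutoff-scheme = Verlet  ; required for the native GPU non-bonded kernels
>     rcoulomb      = 0.8     ; nm
>     rvdw          = 0.8     ; nm
>     coulombtype   = PME     ; the PME mesh part runs on the CPU, as seen in the logs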
>
> James
>
> 2013/1/17 Szilárd Páll <szilard.pall at cbr.su.se>:
> > Hi,
> >
> > Just to note for the users who might read this: the report is valid; some
> > non-thread-parallel code is the reason, and we hope to have a fix in 4.6.0.
> >
> > For updates, follow issue #1211.
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> >
> > On Wed, Jan 16, 2013 at 4:45 PM, Berk Hess <gmx3 at hotmail.com> wrote:
> >
> >>
> >> The issue I'm referring to is about a factor of 2 in update and
> >> constraints, but here the slowdown is much larger.
> >> I just found out that the SD update is not OpenMP threaded (and I even
> >> noted in the code why this is).
> >> I reopened the issue and will find a solution.
> >>
> >> Cheers.
> >>
> >> Berk
> >>
> >> ----------------------------------------
> >> > Date: Wed, 16 Jan 2013 16:20:32 +0100
> >> > Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd
> >> integrator
> >> > From: mark.j.abraham at gmail.com
> >> > To: gmx-users at gromacs.org
> >> >
> >> > We should probably note this effect on the wiki somewhere?
> >> >
> >> > Mark
> >> >
> >> > On Wed, Jan 16, 2013 at 3:44 PM, Berk Hess <gmx3 at hotmail.com> wrote:
> >> >
> >> > >
> >> > > Hi,
> >> > >
> >> > > Unfortunately this is not a bug, but a feature!
> >> > > We made the non-bonded kernels so fast on the GPU that integration and
> >> > > constraints now take comparatively more time.
> >> > > The sd1 integrator is almost as fast as the md integrator, but slightly
> >> > > less accurate.
> >> > > In most cases that's a good solution.
> >> > >
> >> > > I closed the redmine issue:
> >> > > http://redmine.gromacs.org/issues/1121
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Berk
> >> > >
> >> > > ----------------------------------------
> >> > > > Date: Wed, 16 Jan 2013 17:26:18 +0300
> >> > > > Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd
> >> > > integrator
> >> > > > From: jmsstarlight at gmail.com
> >> > > > To: gmx-users at gromacs.org
> >> > > >
> >> > > > Hi all!
> >> > > >
> >> > > > I've also done some calculations with the SD integrator used as the
> >> > > > thermostat (without t_coupl); with the system of 65k atoms I obtained
> >> > > > 10 ns/day performance on a GTX 670 and a 4-core i5.
> >> > > > I haven't run any simulations with the md integrator yet, so I should
> >> > > > test it.
> >> > > >
> >> > > > James
> >> > > >
> >> > > > 2013/1/15 Szilárd Páll <szilard.pall at cbr.su.se>:
> >> > > > > Hi Floris,
> >> > > > >
> >> > > > > Great feedback; this needs to be looked into. Could you please file
> >> > > > > a bug report, preferably with a tpr (and/or all inputs) as well as
> >> > > > > log files?
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > --
> >> > > > > Szilárd
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens
> >> > > > > <floris_buelens at yahoo.com> wrote:
> >> > > > >
> >> > > > >> Hi,
> >> > > > >>
> >> > > > >>
> >> > > > >> I'm seeing MD simulations running a lot slower with the sd integrator
> >> > > > >> than with md - ca. 10 vs. 30 ns/day for my 47000-atom system. I found
> >> > > > >> no documented indication that this should be the case.
> >> > > > >> Timings and logs are pasted in below - wall time seems to be
> >> > > > >> accumulating in Update and Rest, adding up to >60% of the total. The
> >> > > > >> effect is still there without GPU: ca. 40% slowdown when switching
> >> > > > >> from group to Verlet with the SD integrator.
> >> > > > >> System: Xeon E5-1620, 1x GTX 680, gromacs
> >> > > > >> 4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
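> >> > > > >>
> >> > > > >> (For clarity, the group-to-Verlet switch mentioned above is just the
> >> > > > >> mdp setting sketched below; the option name is from the 4.6 mdp
> >> > > > >> format, and, as described above, nothing else changes between the
> >> > > > >> compared runs.)
> >> > > > >>
> >> > > > >>     cutoff-scheme = Verlet   ; new pair-list scheme, required for GPU runs
> >> > > > >>     ; cutoff-scheme = group  ; legacy scheme used for the "group" timings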
> >> > > > >>
> >> > > > >> I didn't file a bug report yet as I don't have much variety of
> >> > > > >> testing conditions available right now; I hope someone else has a
> >> > > > >> moment to try to reproduce this?
> >> > > > >>
> >> > > > >> Timings:
> >> > > > >>
> >> > > > >> cpu (ns/day)
> >> > > > >> sd / verlet: 6
> >> > > > >> sd / group: 10
> >> > > > >> md / verlet: 9.2
> >> > > > >> md / group: 11.4
> >> > > > >>
> >> > > > >> gpu (ns/day)
> >> > > > >> sd / verlet: 11
> >> > > > >> md / verlet: 29.8
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> **************MD integrator, GPU / verlet
> >> > > > >>
> >> > > > >> M E G A - F L O P S A C C O U N T I N G
> >> > > > >>
> >> > > > >> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet
> >> kernels
> >> > > > >> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> >> > > > >> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> >> > > > >> V&F=Potential and force V=Potential only F=Force only
> >> > > > >>
> >> > > > >> Computing: M-Number M-Flops % Flops
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Pair Search distance check 1244.988096 11204.893 0.1
> >> > > > >> NxN QSTab Elec. + VdW [F] 194846.615488 7988711.235 91.9
> >> > > > >> NxN QSTab Elec. + VdW [V&F] 2009.923008 118585.457 1.4
> >> > > > >> 1,4 nonbonded interactions 31.616322 2845.469 0.0
> >> > > > >> Calc Weights 703.010574 25308.381 0.3
> >> > > > >> Spread Q Bspline 14997.558912 29995.118 0.3
> >> > > > >> Gather F Bspline 14997.558912 89985.353 1.0
> >> > > > >> 3D-FFT 47658.567884 381268.543 4.4
> >> > > > >> Solve PME 20.580896 1317.177 0.0
> >> > > > >> Shift-X 9.418458 56.511 0.0
> >> > > > >> Angles 21.879375 3675.735 0.0
> >> > > > >> Propers 48.599718 11129.335 0.1
> >> > > > >> Virial 23.498403 422.971 0.0
> >> > > > >> Stop-CM 2.436616 24.366 0.0
> >> > > > >> Calc-Ekin 93.809716 2532.862 0.0
> >> > > > >> Lincs 12.147284 728.837 0.0
> >> > > > >> Lincs-Mat 131.328750 525.315 0.0
> >> > > > >> Constraint-V 246.633614 1973.069 0.0
> >> > > > >> Constraint-Vir 23.486379 563.673 0.0
> >> > > > >> Settle 74.129451 23943.813 0.3
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 8694798.114 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >>
> >> > > > >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >> > > > >>
> >> > > > >> Computing: Nodes Th. Count Wall t (s) G-Cycles %
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Neighbor search 1 8 201 0.944 27.206 3.3
> >> > > > >> Launch GPU ops. 1 8 5001 0.371 10.690 1.3
> >> > > > >> Force 1 8 5001 2.185 62.987 7.7
> >> > > > >> PME mesh 1 8 5001 15.033 433.441 52.9
> >> > > > >> Wait GPU local 1 8 5001 1.551 44.719 5.5
> >> > > > >> NB X/F buffer ops. 1 8 9801 0.538 15.499 1.9
> >> > > > >> Write traj. 1 8 2 0.725 20.912 2.6
> >> > > > >> Update 1 8 5001 2.318 66.826 8.2
> >> > > > >> Constraints 1 8 5001 2.898 83.551 10.2
> >> > > > >> Rest 1 1.832 52.828 6.5
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 1 28.394 818.659 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> PME spread/gather 1 8 10002 8.745 252.144 30.8
> >> > > > >> PME 3D-FFT 1 8 10002 5.392 155.458 19.0
> >> > > > >> PME solve 1 8 5001 0.869 25.069 3.1
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >> GPU timings
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Computing: Count Wall t (s) ms/step %
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Pair list H2D 201 0.080 0.397 0.4
> >> > > > >> X / q H2D 5001 0.698 0.140 3.7
> >> > > > >> Nonbonded F kernel 4400 14.856 3.376 79.1
> >> > > > >> Nonbonded F+ene k. 400 1.667 4.167 8.9
> >> > > > >> Nonbonded F+prune k. 100 0.441 4.407 2.3
> >> > > > >> Nonbonded F+ene+prune k. 101 0.535 5.300 2.9
> >> > > > >> F D2H 5001 0.501 0.100 2.7
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 18.778 3.755 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >> Force evaluation time GPU/CPU: 3.755 ms/3.443 ms = 1.091
> >> > > > >> For optimal performance this ratio should be close to 1!
> >> > > > >>
> >> > > > >>
> >> > > > >> Core t (s) Wall t (s) (%)
> >> > > > >> Time: 221.730 28.394 780.9
> >> > > > >> (ns/day) (hour/ns)
> >> > > > >> Performance: 30.435 0.789
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> *****************SD integrator, GPU / verlet
> >> > > > >>
> >> > > > >> M E G A - F L O P S A C C O U N T I N G
> >> > > > >>
> >> > > > >> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet
> >> kernels
> >> > > > >> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> >> > > > >> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> >> > > > >> V&F=Potential and force V=Potential only F=Force only
> >> > > > >>
> >> > > > >> Computing: M-Number M-Flops % Flops
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Pair Search distance check 1254.604928 11291.444 0.1
> >> > > > >> NxN QSTab Elec. + VdW [F] 197273.059584 8088195.443 91.6
> >> > > > >> NxN QSTab Elec. + VdW [V&F] 2010.150784 118598.896 1.3
> >> > > > >> 1,4 nonbonded interactions 31.616322 2845.469 0.0
> >> > > > >> Calc Weights 703.010574 25308.381 0.3
> >> > > > >> Spread Q Bspline 14997.558912 29995.118 0.3
> >> > > > >> Gather F Bspline 14997.558912 89985.353 1.0
> >> > > > >> 3D-FFT 47473.892284 379791.138 4.3
> >> > > > >> Solve PME 20.488896 1311.289 0.0
> >> > > > >> Shift-X 9.418458 56.511 0.0
> >> > > > >> Angles 21.879375 3675.735 0.0
> >> > > > >> Propers 48.599718 11129.335 0.1
> >> > > > >> Virial 23.498403 422.971 0.0
> >> > > > >> Update 234.336858 7264.443 0.1
> >> > > > >> Stop-CM 2.436616 24.366 0.0
> >> > > > >> Calc-Ekin 93.809716 2532.862 0.0
> >> > > > >> Lincs 24.289712 1457.383 0.0
> >> > > > >> Lincs-Mat 262.605000 1050.420 0.0
> >> > > > >> Constraint-V 246.633614 1973.069 0.0
> >> > > > >> Constraint-Vir 23.486379 563.673 0.0
> >> > > > >> Settle 148.229268 47878.054 0.5
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 8825351.354 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >>
> >> > > > >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >> > > > >>
> >> > > > >> Computing: Nodes Th. Count Wall t (s) G-Cycles %
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Neighbor search 1 8 201 0.945 27.212 1.2
> >> > > > >> Launch GPU ops. 1 8 5001 0.384 11.069 0.5
> >> > > > >> Force 1 8 5001 2.180 62.791 2.7
> >> > > > >> PME mesh 1 8 5001 15.029 432.967 18.5
> >> > > > >> Wait GPU local 1 8 5001 3.327 95.844 4.1
> >> > > > >> NB X/F buffer ops. 1 8 9801 0.542 15.628 0.7
> >> > > > >> Write traj. 1 8 2 0.749 21.582 0.9
> >> > > > >> Update 1 8 5001 28.044 807.908 34.5
> >> > > > >> Constraints 1 8 10002 5.562 160.243 6.8
> >> > > > >> Rest 1 24.488 705.458 30.1
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 1 81.250 2340.701 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> PME spread/gather 1 8 10002 8.769 252.615 10.8
> >> > > > >> PME 3D-FFT 1 8 10002 5.367 154.630 6.6
> >> > > > >> PME solve 1 8 5001 0.865 24.910 1.1
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >> GPU timings
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Computing: Count Wall t (s) ms/step %
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Pair list H2D 201 0.080 0.398 0.4
> >> > > > >> X / q H2D 5001 0.699 0.140 3.4
> >> > > > >> Nonbonded F kernel 4400 16.271 3.698 79.6
> >> > > > >> Nonbonded F+ene k. 400 1.827 4.568 8.9
> >> > > > >> Nonbonded F+prune k. 100 0.482 4.816 2.4
> >> > > > >> Nonbonded F+ene+prune k. 101 0.584 5.787 2.9
> >> > > > >> F D2H 5001 0.505 0.101 2.5
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >> Total 20.448 4.089 100.0
> >> > > > >>
> >> > > > >>
> >> > >
> >> -----------------------------------------------------------------------------
> >> > > > >>
> >> > > > >> Force evaluation time GPU/CPU: 4.089 ms/3.441 ms = 1.188
> >> > > > >> For optimal performance this ratio should be close to 1!
> >> > > > >>
> >> > > > >> Core t (s) Wall t (s) (%)
> >> > > > >> Time: 643.440 81.250 791.9
> >> > > > >> (ns/day) (hour/ns)
> >> > > > >> Performance: 10.636 2.256