[gmx-users] low performance with 2x GTX 980 + Intel Core i7-5930K 3.5 GHz (LGA 2011-v3)
Carlos Navarro Retamal
carlos.navarro87 at gmail.com
Wed Dec 31 16:46:41 CET 2014
Dear everyone,
In order to check whether my workstation was able to handle bigger systems, I ran an MD simulation of a system of 265,175 atoms, but sadly this was its performance with one GPU:
> P P - P M E L O A D B A L A N C I N G
>
> PP/PME load balancing changed the cut-off and PME settings:
>               particle-particle                   PME
>                rcoulomb  rlist            grid      spacing   1/beta
>    initial     1.400 nm  1.451 nm     96  96  84   0.156 nm  0.448 nm
>    final       1.464 nm  1.515 nm     84  84  80   0.167 nm  0.469 nm
>    cost-ratio             1.14              0.73
> (note that these numbers concern only part of the total PP and PME load)
>
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
> NB VdW [V&F]                          9330.786612        9330.787     0.0
> Pair Search distance check           60538.981664      544850.835     0.0
> NxN Ewald Elec. + LJ [F]          23126654.798080  1526359216.673    96.9
> NxN Ewald Elec. + LJ [V&F]          234136.147904    25052567.826     1.6
> 1,4 nonbonded interactions           13156.663128     1184099.682     0.1
> Calc Weights                         39777.045525     1431973.639     0.1
> Spread Q Bspline                    848576.971200     1697153.942     0.1
> Gather F Bspline                    848576.971200     5091461.827     0.3
> 3D-FFT                             1079386.516464     8635092.132     0.5
> Solve PME                              353.070736       22596.527     0.0
> Shift-X                                331.733925        1990.404     0.0
> Propers                              13320.966414     3050501.309     0.2
> Impropers                              340.306806       70783.816     0.0
> Virial                                1326.365220       23874.574     0.0
> Stop-CM                                133.117850        1331.178     0.0
> Calc-Ekin                             2652.280350       71611.569     0.0
> Lincs                                 4966.549329      297992.960     0.0
> Lincs-Mat                           111969.439344      447877.757     0.0
> Constraint-V                         18222.114435      145776.915     0.0
> Constraint-Vir                        1325.795106       31819.083     0.0
> Settle                                2763.005259      892450.699     0.1
> (null)                                 116.802336           0.000     0.0
> -----------------------------------------------------------------------------
> Total                                            1575064354.133   100.0
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 12 OpenMP threads
>
> Computing:            Num    Num     Call    Wall time    Giga-Cycles
>                      Ranks Threads  Count      (s)       total sum     %
> -----------------------------------------------------------------------------
> Neighbor search         1     12     1251      27.117      1138.913   2.4
> Launch GPU ops.         1     12    50001       5.444       228.653   0.5
> Force                   1     12    50001     390.693     16409.109  34.0
> PME mesh                1     12    50001     443.170     18613.138  38.5
> Wait GPU local          1     12    50001       8.133       341.590   0.7
> NB X/F buffer ops.      1     12    98751      30.272      1271.429   2.6
> Write traj.             1     12       12       1.148        48.198   0.1
> Update                  1     12    50001      63.980      2687.175   5.6
> Constraints             1     12    50001     124.709      5237.788  10.8
> Rest                                            55.169      2317.087   4.8
> -----------------------------------------------------------------------------
> Total                                         1149.836     48293.079 100.0
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
> PME spread/gather       1     12   100002     358.298     15048.493  31.2
> PME 3D-FFT              1     12   100002      78.270      3287.334   6.8
> PME solve Elec          1     12    50001       6.221       261.268   0.5
> -----------------------------------------------------------------------------
>
> GPU timings
> -----------------------------------------------------------------------------
> Computing:                    Count  Wall t (s)    ms/step      %
> -----------------------------------------------------------------------------
> Pair list H2D                  1251       3.975      3.178    0.5
> X / q H2D                     50001      36.248      0.725    4.6
> Nonbonded F kernel            45000     618.354     13.741   78.7
> Nonbonded F+ene k.             3750      72.721     19.392    9.3
> Nonbonded F+ene+prune k.       1251      28.993     23.176    3.7
> F D2H                         50001      25.267      0.505    3.2
> -----------------------------------------------------------------------------
> Total                                   785.559     15.711  100.0
> -----------------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 15.711 ms/16.677 ms = 0.942
> For optimal performance this ratio should be close to 1!
>
>                Core t (s)   Wall t (s)      (%)
>        Time:    13663.176     1149.836   1188.3
>                  (ns/day)    (hour/ns)
> Performance:        7.514        3.194
> Finished mdrun on rank 0 Wed Dec 31 01:44:22 2014
I also noticed this at the beginning:
> step 80: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3287.7 M-cycles
> step 160: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3180.2 M-cycles
> step 240: timed with pme grid 72 72 72, coulomb cutoff 1.708: 3948.2 M-cycles
> step 320: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3319.4 M-cycles
> step 400: timed with pme grid 96 96 80, coulomb cutoff 1.435: 3213.8 M-cycles
> step 480: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3194.6 M-cycles
> step 560: timed with pme grid 80 80 80, coulomb cutoff 1.537: 3343.4 M-cycles
> step 640: timed with pme grid 80 80 72, coulomb cutoff 1.594: 3571.9 M-cycles
> optimal pme grid 84 84 80, coulomb cutoff 1.464
>            Step           Time         Lambda
>            5000       10.00000        0.00000
And when I add the second graphics card, the performance drops to about 5-6 ns/day.
This performance is really weird, because I got about ~5 ns/day on a different workstation (a GTX 770 and an i7-4770).
Is there something I'm missing regarding the correct use of a second GPU?
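In case the launch flags are part of the problem: my understanding of the mdrun documentation is that a single run should drive both cards through two thread-MPI ranks, one per GPU. Something like this (only a sketch; the 2x6 rank/thread split is my guess for this 6-core/12-thread CPU):

mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -pin on -deffnm md

If that is not the right way to map two GTX 980s onto one simulation, please correct me.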
Kind regards and a happy new year to everyone,
Carlos
--
Carlos Navarro Retamal
Bioinformatic engineer
Ph.D(c) in Applied Science, Universidad de Talca, Chile
Center of Bioinformatics and Molecular Simulations (CBSM)
Universidad de Talca
2 Norte 685, Casilla 721, Talca - Chile
Phone: 56-71-201 798,
Fax: 56-71-201 561
Email: carlos.navarro87 at gmail.com or cnavarro at utalca.cl
On Tuesday, December 30, 2014 at 4:58 PM, Carlos Navarro Retamal wrote:
> Dear Justin (and everyone)
> I tried using the -pin parameters as follows:
> mdrun -nt 6 -pin on -pinoffset 0 -gpu_id 0 -deffnm test1 &
> mdrun -nt 6 -pin on -pinoffset 7 -gpu_id 1 -deffnm test2 &
>
> The performance increased a little (from ~17 ns/day to ~22 ns/day), but I got the same warning message:
>
> > Force evaluation time GPU/CPU: 2.206 ms/4.462 ms = 0.494
> > For optimal performance this ratio should be close to 1!
> >
> >
> > NOTE: The GPU has >25% less load than the CPU. This imbalance causes
> > performance loss.
> >
> >                Core t (s)   Wall t (s)     (%)
> >        Time:     2301.995      386.857   595.1
> >                  (ns/day)    (hour/ns)
> > Performance:       22.334        1.075
> > Finished mdrun on rank 0 Tue Dec 30 16:49:14 2014
> >
>
> Looking at the NVIDIA settings, I also saw that when I run only one mdrun process, the graphics card working on it shows ~50% utilization, but when I launch a second mdrun instance, utilization drops to about ~30% on each card (I set the maximum-performance mode).
> Do you think this may be the problem? And if this IS the issue, is there a way to solve it?
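>
> (For completeness: I am reading these utilization numbers off the NVIDIA settings panel; I believe roughly the same readout can be watched from a terminal with something like
>
> nvidia-smi -l 1
>
> which refreshes the report every second.)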
> Kind regards,
> Carlos
>
> On Tuesday, December 30, 2014 at 4:03 PM, Carlos Navarro Retamal wrote:
>
> > Dear Justin,
> > Thanks a lot for your reply.
> > I tried with another system (~130k atoms), with the same result: 1 GPU outperforms 2 GPUs.
> > In any case, I'll read the documentation you mentioned in your previous reply (and hopefully I'll be able to run 2 processes simultaneously).
> > Kind regards,
> > Carlos
> >
> > On Tuesday, December 30, 2014 at 3:40 PM, Justin Lemkul wrote:
> >
> > >
> > >
> > > On 12/30/14 1:38 PM, Carlos Navarro Retamal wrote:
> > > > Dear Justin,
> > > > Thanks a lot for your reply.
> > > >
> > > > > You can use the multiple cards to run
> > > > > concurrent simulations on each (provided cooling is adequate to do this).
> > > > >
> > > >
> > > > I tried that. I launched 2 simulations at the same time, but on each I got the following warning at the end:
> > > >
> > > > > Force evaluation time GPU/CPU: 3.177 ms/5.804 ms = 0.547
> > > > > For optimal performance this ratio should be close to 1!
> > > > >
> > > > >
> > > > > NOTE: The GPU has >25% less load than the CPU. This imbalance causes
> > > > > performance loss.
> > > > >
> > > >
> > > > and I got really low performance (~17 ns/day each)
> > > > using the following commands:
> > > >
> > > > > mdrun -deffnm test1 -gpu_id 0 -v
> > > > > mdrun -deffnm test2 -gpu_id 1 -v
> > > > >
> > > >
> > > > Is there a better way to run multiple MD simulations at the same time?
> > >
> > > In this case, both GPUs are probably fighting for CPU resources (note how the CPU
> > > force evaluation is the limiting factor). You'll need to set -pin and
> > > -pinoffset suitably, IIRC. See discussion at
> > >
> > > http://www.gromacs.org/Documentation/Acceleration_and_parallelization
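> > >
> > > For two concurrent runs on a 6-core/12-thread chip, that would look something like this (an untested sketch; pick the offsets so the two runs don't share hardware threads):
> > >
> > > mdrun -nt 6 -pin on -pinoffset 0 -gpu_id 0 -deffnm run1 &
> > > mdrun -nt 6 -pin on -pinoffset 6 -gpu_id 1 -deffnm run2 &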
> > >
> > > -Justin
> > >
> > > --
> > > ==================================================
> > >
> > > Justin A. Lemkul, Ph.D.
> > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > >
> > > Department of Pharmaceutical Sciences
> > > School of Pharmacy
> > > Health Sciences Facility II, Room 629
> > > University of Maryland, Baltimore
> > > 20 Penn St.
> > > Baltimore, MD 21201
> > >
> > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > > http://mackerell.umaryland.edu/~jalemkul
> > >
> > > ==================================================