[gmx-users] low performance 2 GTX 980+ Intel CPU Core i7-5930K 3.5 GHz (2011-3)
Justin Lemkul
jalemkul at vt.edu
Wed Dec 31 17:38:34 CET 2014
On 12/31/14 10:46 AM, Carlos Navarro Retamal wrote:
> Dear everyone,
> In order to check whether my workstation was able to handle bigger systems, I ran an MD simulation of a system of 265175 atoms, but sadly this was its performance with one GPU:
>
>> P P - P M E L O A D B A L A N C I N G
>>
>> PP/PME load balancing changed the cut-off and PME settings:
>>             particle-particle                    PME
>>              rcoulomb  rlist            grid      spacing   1/beta
>>    initial   1.400 nm  1.451 nm      96  96  84   0.156 nm  0.448 nm
>>    final     1.464 nm  1.515 nm      84  84  80   0.167 nm  0.469 nm
>>    cost-ratio            1.14              0.73
>> (note that these numbers concern only part of the total PP and PME load)
>>
>>
>> M E G A - F L O P S A C C O U N T I N G
>>
>> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
>> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
>> W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> V&F=Potential and force V=Potential only F=Force only
>>
>> Computing: M-Number M-Flops % Flops
>> -----------------------------------------------------------------------------
>> NB VdW [V&F] 9330.786612 9330.787 0.0
>> Pair Search distance check 60538.981664 544850.835 0.0
>> NxN Ewald Elec. + LJ [F] 23126654.798080 1526359216.673 96.9
>> NxN Ewald Elec. + LJ [V&F] 234136.147904 25052567.826 1.6
>> 1,4 nonbonded interactions 13156.663128 1184099.682 0.1
>> Calc Weights 39777.045525 1431973.639 0.1
>> Spread Q Bspline 848576.971200 1697153.942 0.1
>> Gather F Bspline 848576.971200 5091461.827 0.3
>> 3D-FFT 1079386.516464 8635092.132 0.5
>> Solve PME 353.070736 22596.527 0.0
>> Shift-X 331.733925 1990.404 0.0
>> Propers 13320.966414 3050501.309 0.2
>> Impropers 340.306806 70783.816 0.0
>> Virial 1326.365220 23874.574 0.0
>> Stop-CM 133.117850 1331.178 0.0
>> Calc-Ekin 2652.280350 71611.569 0.0
>> Lincs 4966.549329 297992.960 0.0
>> Lincs-Mat 111969.439344 447877.757 0.0
>> Constraint-V 18222.114435 145776.915 0.0
>> Constraint-Vir 1325.795106 31819.083 0.0
>> Settle 2763.005259 892450.699 0.1
>> (null) 116.802336 0.000 0.0
>> -----------------------------------------------------------------------------
>> Total 1575064354.133 100.0
>> -----------------------------------------------------------------------------
>>
>>
>> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>>
>> On 1 MPI rank, each using 12 OpenMP threads
>>
>> Computing: Num Num Call Wall time Giga-Cycles
>> Ranks Threads Count (s) total sum %
>> -----------------------------------------------------------------------------
>> Neighbor search 1 12 1251 27.117 1138.913 2.4
>> Launch GPU ops. 1 12 50001 5.444 228.653 0.5
>> Force 1 12 50001 390.693 16409.109 34.0
>> PME mesh 1 12 50001 443.170 18613.138 38.5
>> Wait GPU local 1 12 50001 8.133 341.590 0.7
>> NB X/F buffer ops. 1 12 98751 30.272 1271.429 2.6
>> Write traj. 1 12 12 1.148 48.198 0.1
>> Update 1 12 50001 63.980 2687.175 5.6
>> Constraints 1 12 50001 124.709 5237.788 10.8
>> Rest 55.169 2317.087 4.8
>> -----------------------------------------------------------------------------
>> Total 1149.836 48293.079 100.0
>> -----------------------------------------------------------------------------
>> Breakdown of PME mesh computation
>> -----------------------------------------------------------------------------
>> PME spread/gather 1 12 100002 358.298 15048.493 31.2
>> PME 3D-FFT 1 12 100002 78.270 3287.334 6.8
>> PME solve Elec 1 12 50001 6.221 261.268 0.5
>> -----------------------------------------------------------------------------
>>
>> GPU timings
>> -----------------------------------------------------------------------------
>> Computing: Count Wall t (s) ms/step %
>> -----------------------------------------------------------------------------
>> Pair list H2D 1251 3.975 3.178 0.5
>> X / q H2D 50001 36.248 0.725 4.6
>> Nonbonded F kernel 45000 618.354 13.741 78.7
>> Nonbonded F+ene k. 3750 72.721 19.392 9.3
>> Nonbonded F+ene+prune k. 1251 28.993 23.176 3.7
>> F D2H 50001 25.267 0.505 3.2
>> -----------------------------------------------------------------------------
>> Total 785.559 15.711 100.0
>> -----------------------------------------------------------------------------
>>
>> Force evaluation time GPU/CPU: 15.711 ms/16.677 ms = 0.942
>> For optimal performance this ratio should be close to 1!
>>
>> Core t (s) Wall t (s) (%)
>> Time: 13663.176 1149.836 1188.3
>> (ns/day) (hour/ns)
>> Performance: 7.514 3.194
>> Finished mdrun on rank 0 Wed Dec 31 01:44:22 2014
>
>
This is consistent with what we see for similarly sized systems.
>
>
>
> I also noticed this at the beginning:
>
>> step 80: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3287.7 M-cycles
>> step 160: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3180.2 M-cycles
>> step 240: timed with pme grid 72 72 72, coulomb cutoff 1.708: 3948.2 M-cycles
>> step 320: timed with pme grid 96 96 84, coulomb cutoff 1.400: 3319.4 M-cycles
>> step 400: timed with pme grid 96 96 80, coulomb cutoff 1.435: 3213.8 M-cycles
>> step 480: timed with pme grid 84 84 80, coulomb cutoff 1.464: 3194.6 M-cycles
>> step 560: timed with pme grid 80 80 80, coulomb cutoff 1.537: 3343.4 M-cycles
>> step 640: timed with pme grid 80 80 72, coulomb cutoff 1.594: 3571.9 M-cycles
>> optimal pme grid 84 84 80, coulomb cutoff 1.464
This is normal; mdrun tunes the PME grid and Coulomb cut-off at the start of the run to balance the short-range (GPU) and PME (CPU) load and get you optimal performance.
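If you want reproducible timings while you compare setups (one GPU vs. two), you can switch that tuning off and/or reset the counters after the tuning phase. A minimal sketch, assuming your run files are named md.tpr (the name is a placeholder, not from your post):

    gmx mdrun -deffnm md -notunepme                  # keep the .mdp cut-off and PME grid fixed
    gmx mdrun -deffnm md -nsteps 20000 -resethway    # reset timers halfway so the tuning steps don't skew ns/day

Both -notunepme and -resethway are regular mdrun options in the 5.0 series.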
>> Step Time Lambda
>> 5000 10.00000 0.00000
>
>
>
> and when I add the second graphics card, the performance drops even further, to about 5-6 ns/day.
>
> This performance is really weird, because I got about ~5 ns/day on a different workstation (a GTX 770 and an i7-4770).
> Is there something I'm missing regarding the correct use of a second GPU?
Note the following (from above):
"Force evaluation time GPU/CPU: 15.711 ms/16.677 ms = 0.942"
The GPU and the CPU already take nearly the same time per step, so the GPU is hardly ever waiting. Adding another GPU won't help you; your bottleneck is the CPU side (PME mesh, bonded interactions, constraints), which does not shrink when you add a second card.
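If you do want to try both cards anyway, mdrun needs at least one PP rank per GPU, so you would launch something like the following sketch (the file name is a placeholder, not from your post):

    gmx mdrun -deffnm md -ntmpi 2 -ntomp 6 -gpu_id 01

That starts two thread-MPI ranks, maps them to GPUs 0 and 1, and gives each rank 6 OpenMP threads on the i7-5930K's 12 hardware threads. But because the CPU-side work stays the same while being split across two domains, I would not expect this to beat your single-GPU number.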
-Justin
--
==================================================
Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201
jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul
==================================================