[gmx-developers] 2018-beta1: Performance of membrane simulations
Magnus Lundborg
magnus.lundborg at scilifelab.se
Fri Dec 1 22:54:19 CET 2017
Hi,
I'm running simulations with the CHARMM force field, which also uses UB, and
have seen similar things. Apparently the flop count in the first table does
not reflect the actual time spent on the calculations, if I understood the
explanations correctly. So it's the Force row in the second table that
contains the bonded forces (with the long-range interactions and PME on the
GPU). I tried making a SIMD version of UB (currently only standard angles are
SIMD optimised) and got almost a 50% performance gain. Making bonds use SIMD
as well only gives an additional 1 or 2%. My patch is just a draft, as it's
not yet clear what future SIMD functions should look like, but I'll share it
with you so that you can try it. However, I don't expect it will be in the
next release.
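To give an idea of what I mean, here is a rough sketch (not my actual patch
and not the GROMACS SIMD module - all names below are made up). The 1-3 part
of UB, V = k_UB*(r13 - r13_0)^2 (up to the prefactor convention of the force
field), is essentially a harmonic bond, and once the terms are gathered into
a structure-of-arrays layout the inner loop is trivial to vectorise:

#include <cmath>
#include <cstddef>

/* Illustrative only: a batch of Urey-Bradley 1-3 terms in structure-of-arrays
 * form. PBC handling, the harmonic angle part and the virial are left out. */
struct UreyBradleyBatch {
    const float *xi, *yi, *zi;   /* coordinates of atom i of each 1-3 pair */
    const float *xk, *yk, *zk;   /* coordinates of atom k of each 1-3 pair */
    const float *r0;             /* reference 1-3 distances */
    const float *kUB;            /* force constants */
    std::size_t  n;              /* number of UB terms in the batch */
};

/* Accumulates the 1-3 forces and returns the summed potential energy. */
float ureyBradley13(const UreyBradleyBatch &b,
                    float *fxi, float *fyi, float *fzi,
                    float *fxk, float *fyk, float *fzk)
{
    float vtot = 0.0f;
    /* Branch-free per-element loop over contiguous arrays; this is the
     * shape of loop a compiler (or an explicit SIMD layer) can vectorise. */
    for (std::size_t s = 0; s < b.n; ++s)
    {
        const float dx = b.xi[s] - b.xk[s];
        const float dy = b.yi[s] - b.yk[s];
        const float dz = b.zi[s] - b.zk[s];
        const float r  = std::sqrt(dx*dx + dy*dy + dz*dz);
        const float dr = r - b.r0[s];
        vtot += b.kUB[s]*dr*dr;                    /* k_UB*(r - r0)^2      */
        const float fscal = -2.0f*b.kUB[s]*dr/r;   /* -dV/dr, divided by r */
        fxi[s] += fscal*dx;  fyi[s] += fscal*dy;  fzi[s] += fscal*dz;
        fxk[s] -= fscal*dx;  fyk[s] -= fscal*dy;  fzk[s] -= fscal*dz;
    }
    return vtot;
}

The real code obviously has to go through the GROMACS SIMD layer and handle
PBC, the angle term and the virial, but the layout idea is the same.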
Cheers,
Magnus
On 1 Dec 2017 22:34, "Jochen Hub" <jhub at gwdg.de> wrote:
Dear developers,
I started a thread in the user list yesterday (and Szilard already gave a
quick answer) but I felt this point is relevant for the developers list.
We did some benchmarks with 2018-beta1 with PME on the GPU - overall
fantastic (!!) - but we don't understand the performance of lipid membrane
simulations (Slipids or Charmm36, with UB potentials). They contain roughly
50% lipid and 50% water atoms. Please see here:
http://cmb.bio.uni-goettingen.de/bench.pdf
As you can see in the linked PDF, the Slipid simulations are CPU-limited even
with up to 10 (!) quite strong Xeon cores when using a GTX 1080. Szilard
pointed out that this is probably due to the bonded UB interactions - however,
these make up only 0.2% of the flops; see the log output pasted below for
ntomp=4 and 10 (for a 128-Slipid system with a 1 nm cutoff). The flops summary
is nearly the same for ntomp=4 and 10, so only the ntomp=4 summary is shown
below.
In contrast, protein simulations (whether membrane proteins or purely in
water) behave as one would hope, showing that a cheap CPU is sufficient when
PME runs on the GPU.
So my question is: Is this expected? Is it really due to Urey-Bradley? Or
maybe due to constraints? If UB is the limiting factor, are there any plans
to port it to the GPU as well in the future?
This also has consequences for hardware purchases: depending on whether you
run protein or membrane simulations, you need to buy different hardware.
Many thanks for any input, and many thanks again for the fabulous work on
2018!
Jochen
 Computing:                        M-Number        M-Flops   % Flops
---------------------------------------------------------------------
 Pair Search distance check      151.929968       1367.370       0.0
 NxN Ewald Elec. + LJ [F]     157598.160192   10401478.573      97.2
 NxN Ewald Elec. + LJ [V&F]     1623.781504     173744.621       1.6
 1,4 nonbonded interactions      200.360064      18032.406       0.2
 Shift-X                           1.553664          9.322       0.0
 Propers                         246.449280      56436.885       0.5
 Impropers                         1.280256        266.293       0.0
 Virial                            7.657759        137.840       0.0
 Stop-CM                           1.553664         15.537       0.0
 P-Coupling                        7.646464         45.879       0.0
 Calc-Ekin                        15.262464        412.087       0.0
 Lincs                            74.894976       4493.699       0.0
 Lincs-Mat                      1736.027136       6944.109       0.1
 Constraint-V                    226.605312       1812.842       0.0
 Constraint-Vir                    7.614336        182.744       0.0
 Settle                           25.605120       8270.454       0.1
 Urey-Bradley                    144.668928      26474.414       0.2
---------------------------------------------------------------------
 Total                                         10700125.072     100.0
---------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 4 OpenMP threads

 Computing:             Num   Num    Call    Wall time   Giga-Cycles
                       Ranks Threads Count      (s)      total sum     %
------------------------------------------------------------------------
 Neighbor search           1     4      51      0.260        2.284    2.4
 Launch GPU ops.           1     4   10002      0.591        5.191    5.4
 Force                     1     4    5001      7.314       64.211   67.1
 Wait PME GPU gather       1     4    5001      0.071        0.626    0.7
 Reduce GPU PME F          1     4    5001      0.078        0.684    0.7
 Wait GPU NB local         1     4    5001      0.017        0.151    0.2
 NB X/F buffer ops.        1     4    9951      0.321        2.822    2.9
 Write traj.               1     4       2      0.117        1.026    1.1
 Update                    1     4    5001      0.199        1.749    1.8
 Constraints               1     4    5001      1.853       16.270   17.0
 Rest                                           0.085        0.743    0.8
------------------------------------------------------------------------
 Total                                         10.907       95.757  100.0
------------------------------------------------------------------------
********************************
****** 10 OpenMP threads *******
********************************
On 1 MPI rank, each using 10 OpenMP threads
 Computing:             Num   Num    Call    Wall time   Giga-Cycles
                       Ranks Threads Count      (s)      total sum     %
------------------------------------------------------------------------
 Neighbor search           1    10      51      0.120        2.625    2.3
 Launch GPU ops.           1    10   10002      0.580       12.731   11.3
 Force                     1    10    5001      2.999       65.828   58.4
 Wait PME GPU gather       1    10    5001      0.066        1.459    1.3
 Reduce GPU PME F          1    10    5001      0.045        0.980    0.9
 Wait GPU NB local         1    10    5001      0.014        0.308    0.3
 NB X/F buffer ops.        1    10    9951      0.157        3.453    3.1
 Write traj.               1    10       2      0.147        3.224    2.9
 Update                    1    10    5001      0.140        3.067    2.7
 Constraints               1    10    5001      0.814       17.867   15.9
 Rest                                           0.053        1.161    1.0
------------------------------------------------------------------------
 Total                                          5.135      112.703  100.0
------------------------------------------------------------------------
--
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------