[gmx-developers] 2018-beta1: Performance of membrane simulations

Jochen Hub jhub at gwdg.de
Mon Dec 4 18:44:47 CET 2017


Hi all,

first of all, many thanks for the work on getting Magnus' patch into 2018.

Here is an update on the benchmarks. Magnus' patch gives an impressive
speedup with Slipids (which use Urey-Bradley, just like the Charmm lipids).

http://cmb.bio.uni-goettingen.de/Slipids_bench_UBsimd.pdf
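
As a side note on the numbers: the "33% performance loss (50% gain)" remark
in the quoted message further down describes one and the same ratio, seen
from the slow and from the fast code. A minimal Python sketch of that
conversion follows; the function names are made up for illustration and are
not part of GROMACS.

```python
def loss_from_gain(gain):
    """Fraction of performance lost by the slow code, relative to the fast code.

    gain is the fractional speedup of the fast code over the slow code,
    e.g. 0.50 for "50% faster".
    """
    return gain / (1.0 + gain)


def gain_from_loss(loss):
    """Inverse of loss_from_gain: fractional speedup for a given fractional loss."""
    return loss / (1.0 - loss)


gain = 0.50  # "almost a 50% performance gain" from the quoted message
loss = loss_from_gain(gain)
print(f"{gain:.0%} gain corresponds to {loss:.1%} loss")
```

So a code that is 50% faster with the patch is about one third slower
without it, which is where the 33% comes from.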

Interestingly, with the popular 1 nm cutoff, the speed is still CPU-limited
even at 12 cores (but not with a 1.4 nm cutoff). So moving the bonded
calculations to the GPU (in a future release) might bring even better
performance.
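
To make the "CPU-limited" point concrete, here is a rough Python sketch that
compares the share of Flops with the share of wall time, using only numbers
copied from the ntomp=4 log quoted below (128 Slipids, 1 nm cutoff). The
variable names are my own. Note that, as Magnus explains in the quoted
message, the Flops table does not reflect the time actually spent, and the
Force row is where the bonded forces show up.

```python
# All values are copied from the ntomp=4 log output quoted later in this mail.
ub_mflops = 26474.414        # "Urey-Bradley" row, M-Flops
total_mflops = 10700125.072  # "Total" row, M-Flops

# Wall times in seconds from the cycle and time accounting table.
wall_s = {"Force": 7.314, "Constraints": 1.853}
total_wall_s = 10.907

# Urey-Bradley as a fraction of all counted Flops.
ub_flop_share = ub_mflops / total_mflops

# Each row as a fraction of the total wall time.
wall_share = {row: t / total_wall_s for row, t in wall_s.items()}

print(f"Urey-Bradley share of Flops: {ub_flop_share:.1%}")
for row, share in wall_share.items():
    print(f"{row} share of wall time: {share:.1%}")
```

The shares reproduce the percentages printed in the log: a 0.2% Flops share
for Urey-Bradley sits next to a Force row that takes about two thirds of the
wall time.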

Cheers,
Jochen

On 01.12.17 at 23:44, Magnus Lundborg wrote:
> Hi,
> 
> I guess it would be of general interest, so it could be worth
> considering, but what goes into the release is not in my hands. However,
> I think most people (probably including you) would not get quite as
> dramatic a gain, as my system contains very little water, which means
> that bonded interactions make up a large portion of the work.
> 
> In the long run I think it would be good to both make UB SIMD optimised 
> and to move all bondeds to GPUs. But that's for later.
> 
> Cheers,
> 
> Magnus
> 
> On 1 Dec 2017 at 23:21, "Jochen Hub" <jhub at gwdg.de> wrote:
> 
>     Hi Magnus,
> 
>     many thanks, that is impressive, and it answers my question.
> 
>     Since UB has such a drastic effect on performance, maybe you can
>     convince the other developers to make an exception and get a UB-SIMD
>     patch into 2018?
> 
>     I understand that normally only bug fixes go in after the beta
>     release, but isn't a 33% performance loss (50% gain) close to a bug?
> 
>     Cheers,
>     Jochen
> 
>     On 01.12.17 at 22:54, Magnus Lundborg wrote:
> 
>         Hi,
> 
>         I'm running simulations with the CHARMM force field, which also
>         uses UB, and experienced similar things. Apparently the flops
>         count in the first table does not reflect the actual time spent
>         on the calculations, if I understood the explanations correctly.
>         So it's the Force row in the second table that covers the bonded
>         forces (with long range and PME on GPU). So I tried making a SIMD
>         version of UB (currently only standard angles are SIMD optimised)
>         and got almost a 50% performance gain. Making bonds use SIMD as
>         well only gave an additional 1 or 2%. My patch is just a draft,
>         as it's not clear what future SIMD functions should look like,
>         but I'll share it with you so that you can try it. However, it
>         won't be in the next release, I guess.
> 
>         Cheers,
> 
>         Magnus
> 
> 
>         On 1 Dec 2017 at 22:34, "Jochen Hub" <jhub at gwdg.de> wrote:
> 
>              Dear developers,
> 
>              I started a thread on the user list yesterday (and Szilard
>              already gave a quick answer), but I felt this point is
>              relevant for the developers list.
> 
>              We did some benchmarks with 2018-beta1 with PME on the GPU:
>              overall fantastic (!!). We just don't understand the
>              performance of lipid membrane simulations (Slipids or
>              Charmm36, with UB potentials). They contain roughly 50%
>              lipid and 50% water atoms. Please see here:
> 
>              http://cmb.bio.uni-goettingen.de/bench.pdf
> 
>              As you see in the linked PDF, the Slipids simulations are
>              limited by the CPU up to 10 (!) quite strong Xeon cores when
>              using a GTX 1080. Szilard pointed out that this is probably
>              due to bonded UB interactions. However, they make up only
>              0.2% of the Flops; see the log output pasted below for
>              ntomp=4 or 10 (for the 128 Slipids system with 1 nm cutoff).
>              The Flops summary is nearly the same for ntomp=4 or 10, so
>              only the ntomp=4 one is shown below.
> 
>              In contrast, protein simulations (whether membrane protein
>              or purely in water) behave as one hopes, showing that we can
>              buy a cheap CPU when doing PME on the GPU.
> 
>              So my question is: Is this expected? Is this really due to
>              Urey-Bradley? Or maybe due to constraints? If UB is the
>              limiting factor, are there any plans to port it to the GPU
>              as well in the future?
> 
>              This also has an impact on hardware: depending on whether
>              you run protein or membrane simulations, you need to buy
>              different hardware.
> 
>              Many thanks for any input, and many thanks again for the
>              fabulous work on 2018!
> 
>              Jochen
> 
> 
>               Computing:                              M-Number         M-Flops   % Flops
>               --------------------------------------------------------------------------
>               Pair Search distance check            151.929968        1367.370       0.0
>               NxN Ewald Elec. + LJ [F]           157598.160192    10401478.573      97.2
>               NxN Ewald Elec. + LJ [V&F]           1623.781504      173744.621       1.6
>               1,4 nonbonded interactions            200.360064       18032.406       0.2
>               Shift-X                                 1.553664           9.322       0.0
>               Propers                               246.449280       56436.885       0.5
>               Impropers                               1.280256         266.293       0.0
>               Virial                                  7.657759         137.840       0.0
>               Stop-CM                                 1.553664          15.537       0.0
>               P-Coupling                              7.646464          45.879       0.0
>               Calc-Ekin                              15.262464         412.087       0.0
>               Lincs                                  74.894976        4493.699       0.0
>               Lincs-Mat                            1736.027136        6944.109       0.1
>               Constraint-V                          226.605312        1812.842       0.0
>               Constraint-Vir                          7.614336         182.744       0.0
>               Settle                                 25.605120        8270.454       0.1
>               Urey-Bradley                          144.668928       26474.414       0.2
>               --------------------------------------------------------------------------
>               Total                                               10700125.072     100.0
>               --------------------------------------------------------------------------
> 
> 
>               R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
>              On 1 MPI rank, each using 4 OpenMP threads
> 
>               Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                                   Ranks Threads  Count      (s)         total sum    %
>               --------------------------------------------------------------------------
>               Neighbor search        1    4         51       0.260          2.284   2.4
>               Launch GPU ops.        1    4      10002       0.591          5.191   5.4
>               Force                  1    4       5001       7.314         64.211  67.1
>               Wait PME GPU gather    1    4       5001       0.071          0.626   0.7
>               Reduce GPU PME F       1    4       5001       0.078          0.684   0.7
>               Wait GPU NB local      1    4       5001       0.017          0.151   0.2
>               NB X/F buffer ops.     1    4       9951       0.321          2.822   2.9
>               Write traj.            1    4          2       0.117          1.026   1.1
>               Update                 1    4       5001       0.199          1.749   1.8
>               Constraints            1    4       5001       1.853         16.270  17.0
>               Rest                                           0.085          0.743   0.8
>               --------------------------------------------------------------------------
>               Total                                         10.907         95.757 100.0
>               --------------------------------------------------------------------------
> 
>              ********************************
>              ****** 10 Open MP threads ******
>              ********************************
> 
>              On 1 MPI rank, each using 10 OpenMP threads
> 
>               Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                                   Ranks Threads  Count      (s)         total sum    %
>               --------------------------------------------------------------------------
>               Neighbor search        1   10         51       0.120          2.625   2.3
>               Launch GPU ops.        1   10      10002       0.580         12.731  11.3
>               Force                  1   10       5001       2.999         65.828  58.4
>               Wait PME GPU gather    1   10       5001       0.066          1.459   1.3
>               Reduce GPU PME F       1   10       5001       0.045          0.980   0.9
>               Wait GPU NB local      1   10       5001       0.014          0.308   0.3
>               NB X/F buffer ops.     1   10       9951       0.157          3.453   3.1
>               Write traj.            1   10          2       0.147          3.224   2.9
>               Update                 1   10       5001       0.140          3.067   2.7
>               Constraints            1   10       5001       0.814         17.867  15.9
>               Rest                                           0.053          1.161   1.0
>               --------------------------------------------------------------------------
>               Total                                          5.135        112.703 100.0
>               --------------------------------------------------------------------------
> 
> 
> 
>              --
>              ---------------------------------------------------
>              Dr. Jochen Hub
>              Computational Molecular Biophysics Group
>              Institute for Microbiology and Genetics
>              Georg-August-University of Göttingen
>              Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
>              Phone: +49-551-39-14189
>              http://cmb.bio.uni-goettingen.de/
>              ---------------------------------------------------
>              --
>              Gromacs Developers mailing list
> 
>              * Please search the archive at
>              http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>              before posting!
> 
>              * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> 
>              * For (un)subscribe requests visit
>              https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>              or send a mail to gmx-developers-request at gromacs.org.
> 
> 
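
For readers who want to compare the two cycle-accounting tables quoted above
directly, a small Python sketch of the 4 to 10 thread scaling follows. The
wall times are copied from those tables; the dictionary names are made up
for illustration. Both runs report the same call count for the Force row
(5001), so the wall times can be compared as they are.

```python
# Wall times in seconds, copied from the two tables quoted above.
wall_4 = {"Launch GPU ops.": 0.591, "Force": 7.314,
          "Constraints": 1.853, "Total": 10.907}
wall_10 = {"Launch GPU ops.": 0.580, "Force": 2.999,
           "Constraints": 0.814, "Total": 5.135}

# Speedup expected from perfect scaling when going from 4 to 10 threads.
ideal = 10 / 4

# Measured speedup per row, and the fraction of the ideal speedup reached.
speedup = {row: wall_4[row] / wall_10[row] for row in wall_4}
efficiency = {row: s / ideal for row, s in speedup.items()}

for row in wall_4:
    print(f"{row:<16} speedup {speedup[row]:.2f}x  "
          f"efficiency {efficiency[row]:.0%}")
```

With these numbers the Force row scales close to the ideal 2.5x (about
2.4x), while the total reaches only about 2.1x, in part because launching
the GPU work takes nearly the same time in both runs.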

-- 
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------
