[gmx-developers] 2018-beta1: Performance of membrane simulations

Magnus Lundborg magnus.lundborg at scilifelab.se
Fri Dec 1 23:44:46 CET 2017


Hi,

I guess it would be of general interest, so it could be worth considering,
but what goes into the release is not in my hands. However, I think most
people (probably including you) would not see quite as dramatic a gain: my
system contains very little water, which means that bonded interactions
make up a large portion of the work.
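
A rough way to see why the gain depends on the share of bonded work is an
Amdahl-style estimate. The sketch below is not taken from GROMACS: the
function names, the example fractions and the kernel speedup of 4 are
assumptions chosen purely for illustration. It also shows that a 50% gain
and a 33% loss describe the same ratio.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl-style estimate of the whole-step speedup when only a part
    of the step (e.g. the UB kernel) becomes faster.

    fraction       -- share of wall time spent in that part (0..1)
    kernel_speedup -- how much faster that part becomes (> 0)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def time_saved(speedup):
    """Fraction of wall time saved for a given overall speedup."""
    return 1.0 - 1.0 / speedup


# Hypothetical inputs, not measured values.
KERNEL_SPEEDUP = 4.0
for fraction in (0.1, 0.3, 0.5):
    s = overall_speedup(fraction, KERNEL_SPEEDUP)
    print(f"UB share {fraction:.0%}: speedup {s:.2f}x, "
          f"time saved {time_saved(s):.0%}")

# A 1.5x speedup (50% gain) means the slower run takes 1/1.5 of the
# throughput, i.e. about a third less performance.
print(f"speedup 1.50x <-> time saved {time_saved(1.5):.1%}")
```

With these made-up inputs, a system that spends half its time in the
accelerated kernel gains far more than one that spends a tenth there, which
is the point about water content made above.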

In the long run I think it would be good to both make UB SIMD optimised and
to move all bondeds to GPUs. But that's for later.

Cheers,

Magnus

On 1 Dec 2017 23:21, "Jochen Hub" <jhub at gwdg.de> wrote:

> Hi Magnus,
>
> many thanks, and impressive, that clarifies my question.
>
> Since UB has such a drastic effect on performance, maybe you can convince
> the other developers to make an exception and get a UB-SIMD patch into 2018?
>
> I understand that normally only bug fixes go in after a beta release, but is
> a 33% performance loss (50% gain) not close to a bug?
>
> Cheers,
> Jochen
>
> On 01.12.17 at 22:54, Magnus Lundborg wrote:
>
>> Hi,
>>
>> I'm running simulations with the CHARMM force field, which also uses UB,
>> and experienced similar things. Apparently the flops count in the first
>> table does not reflect the actual time spent on the calculations, if I
>> understood the explanations correctly. So it is the Force row in the second
>> table that covers the bonded forces (with long range and PME on GPU). So I
>> tried making a SIMD version of UB (so far only standard angles are SIMD
>> optimised) and got almost a 50% performance gain. Also making bonds use
>> SIMD gives only an additional 1 or 2%. My patch is just a draft, as it's
>> not clear what future SIMD functions should look like, but I'll share it
>> with you so that you can try it. However, it won't be in the next release,
>> I guess.
>>
>> Cheers,
>>
>> Magnus
>>
>>
>> On 1 Dec 2017 22:34, "Jochen Hub" <jhub at gwdg.de> wrote:
>>
>>     Dear developers,
>>
>>     I started a thread in the user list yesterday (and Szilard already
>>     gave a quick answer) but I felt this point is relevant for the
>>     developers list.
>>
>>     We did some benchmarks with the 2018-beta1 with PME on the GPU -
>>     overall fantastic (!!) - we just don't understand the performance of
>>     lipid membrane simulations (Slipids or Charmm36, with UB
>>     potentials). They contain roughly 50% lipid, 50% water atoms. Please
>>     see here:
>>
>>     http://cmb.bio.uni-goettingen.de/bench.pdf
>>
>>     As you see in the linked PDF, the Slipid simulations are limited by
>>     the CPU up to 10 (!) quite strong Xeon cores, when using a GTX 1080.
>>     Szilard pointed out that this is probably due to bonded UB
>>     interactions - however, they make up only 0.2% of the Flops, see the
>>     log output pasted below, for ntomp=4 or 10 (for a 128-Slipids system
>>     with a 1 nm cutoff). The Flops summary is nearly the same for ntomp=4
>>     or 10, so only the one for ntomp=4 is shown below.
>>
>>     In contrast, protein simulations (whether membrane protein or purely
>>     in water) behave as one hopes, showing that we can buy a cheap CPU
>>     when doing PME on the GPU.
>>
>>     So my question is: Is this expected? Is this really due to
>>     Urey-Bradley? Or maybe due to constraints? If UB is the limiting
>>     factor, are there any plans to port it to the GPU as well in the
>>     future?
>>
>>     This also has an impact on hardware: depending on whether you run
>>     protein or membrane simulations, you need to buy different hardware.
>>
>>     Many thanks for any input, and many thanks again for the fabulous
>>     work on 2018!
>>
>>     Jochen
>>
>>
>>       Computing:                               M-Number         M-Flops  % Flops
>>     -----------------------------------------------------------------------------
>>       Pair Search distance check             151.929968        1367.370      0.0
>>       NxN Ewald Elec. + LJ [F]            157598.160192    10401478.573     97.2
>>       NxN Ewald Elec. + LJ [V&F]            1623.781504      173744.621      1.6
>>       1,4 nonbonded interactions             200.360064       18032.406      0.2
>>       Shift-X                                  1.553664           9.322      0.0
>>       Propers                                246.449280       56436.885      0.5
>>       Impropers                                1.280256         266.293      0.0
>>       Virial                                   7.657759         137.840      0.0
>>       Stop-CM                                  1.553664          15.537      0.0
>>       P-Coupling                               7.646464          45.879      0.0
>>       Calc-Ekin                               15.262464         412.087      0.0
>>       Lincs                                   74.894976        4493.699      0.0
>>       Lincs-Mat                             1736.027136        6944.109      0.1
>>       Constraint-V                           226.605312        1812.842      0.0
>>       Constraint-Vir                           7.614336         182.744      0.0
>>       Settle                                  25.605120        8270.454      0.1
>>       Urey-Bradley                           144.668928       26474.414      0.2
>>     -----------------------------------------------------------------------------
>>       Total                                                10700125.072    100.0
>>     -----------------------------------------------------------------------------
>>
>>
>>           R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>     On 1 MPI rank, each using 4 OpenMP threads
>>
>>       Computing:          Num   Num      Call    Wall time         Giga-Cycles
>>                           Ranks Threads  Count      (s)         total sum    %
>>     -----------------------------------------------------------------------------
>>       Neighbor search        1    4         51       0.260          2.284   2.4
>>       Launch GPU ops.        1    4      10002       0.591          5.191   5.4
>>       Force                  1    4       5001       7.314         64.211  67.1
>>       Wait PME GPU gather    1    4       5001       0.071          0.626   0.7
>>       Reduce GPU PME F       1    4       5001       0.078          0.684   0.7
>>       Wait GPU NB local      1    4       5001       0.017          0.151   0.2
>>       NB X/F buffer ops.     1    4       9951       0.321          2.822   2.9
>>       Write traj.            1    4          2       0.117          1.026   1.1
>>       Update                 1    4       5001       0.199          1.749   1.8
>>       Constraints            1    4       5001       1.853         16.270  17.0
>>       Rest                                           0.085          0.743   0.8
>>     -----------------------------------------------------------------------------
>>       Total                                         10.907         95.757 100.0
>>     -----------------------------------------------------------------------------
>>
>>     ********************************
>>     ****** 10 Open MP threads ******
>>     ********************************
>>
>>     On 1 MPI rank, each using 10 OpenMP threads
>>
>>       Computing:          Num   Num      Call    Wall time         Giga-Cycles
>>                           Ranks Threads  Count      (s)         total sum    %
>>     -----------------------------------------------------------------------------
>>       Neighbor search        1   10         51       0.120          2.625   2.3
>>       Launch GPU ops.        1   10      10002       0.580         12.731  11.3
>>       Force                  1   10       5001       2.999         65.828  58.4
>>       Wait PME GPU gather    1   10       5001       0.066          1.459   1.3
>>       Reduce GPU PME F       1   10       5001       0.045          0.980   0.9
>>       Wait GPU NB local      1   10       5001       0.014          0.308   0.3
>>       NB X/F buffer ops.     1   10       9951       0.157          3.453   3.1
>>       Write traj.            1   10          2       0.147          3.224   2.9
>>       Update                 1   10       5001       0.140          3.067   2.7
>>       Constraints            1   10       5001       0.814         17.867  15.9
>>       Rest                                           0.053          1.161   1.0
>>     -----------------------------------------------------------------------------
>>       Total                                          5.135        112.703 100.0
>>     -----------------------------------------------------------------------------
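
The mismatch between the flops summary and the cycle accounting can be made
explicit with a little arithmetic on the numbers in the tables above. The
sketch below only re-derives percentages and the 4-to-10-thread scaling from
the pasted values; the helper names are made up for illustration and nothing
here comes from the GROMACS code base.

```python
# Wall times in seconds, copied from the two cycle-accounting tables.
WALL_4 = {"Force": 7.314, "Constraints": 1.853,
          "Launch GPU ops.": 0.591, "Total": 10.907}
WALL_10 = {"Force": 2.999, "Constraints": 0.814,
           "Launch GPU ops.": 0.580, "Total": 5.135}

# M-Flops copied from the flops summary (ntomp=4).
UB_MFLOPS = 26474.414
TOTAL_MFLOPS = 10700125.072


def share(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


def thread_speedup(row):
    """Wall-time ratio of the 4-thread run to the 10-thread run."""
    return WALL_4[row] / WALL_10[row]


print(f"UB share of flops:            {share(UB_MFLOPS, TOTAL_MFLOPS):.1f}%")
print(f"Force share of wall time (4): {share(WALL_4['Force'], WALL_4['Total']):.1f}%")
print(f"Force share of wall time (10): {share(WALL_10['Force'], WALL_10['Total']):.1f}%")

# Ideal scaling from 4 to 10 threads would be a factor of 10 / 4 = 2.5.
for row in ("Force", "Constraints", "Launch GPU ops.", "Total"):
    print(f"{row:16s} speedup 4 -> 10 threads: {thread_speedup(row):.2f}x")
```

Read this way, the Force row dominates the CPU wall time in both runs even
though Urey-Bradley is a negligible entry in the flops summary, and the
"Launch GPU ops." row barely changes with the thread count.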
>>
>>
>>
>>     --
>>     ---------------------------------------------------
>>     Dr. Jochen Hub
>>     Computational Molecular Biophysics Group
>>     Institute for Microbiology and Genetics
>>     Georg-August-University of Göttingen
>>     Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
>>     Phone: +49-551-39-14189
>>     http://cmb.bio.uni-goettingen.de/
>>     ---------------------------------------------------
>>     --
>>     Gromacs Developers mailing list
>>
>>     * Please search the archive at
>>     http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>>     before posting!
>>
>>     * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>>     * For (un)subscribe requests visit
>>     https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>     or send a mail to gmx-developers-request at gromacs.org.
>>
>>
>>
>>
>>