[gmx-developers] Possible regression in gromacs 4.6

Alexey Shvetsov alexxy at omrb.pnpi.spb.ru
Thu Jun 21 18:22:51 CEST 2012


Hi all!

After merging commit
commit 5ba7125c5972f2aafde2310eaa4a345cbac55da5
Author: Erik Lindahl <erik at kth.se>
Date:   Mon May 28 20:54:17 2012 +0200

     New CPU detection & AVX/SSE code, removed raw assembly files.

I noticed a regression in GROMACS speed. I used two systems for the 
tests: one is 7bna and the second is speptide from the examples.

For the 7bna system, the old 4.6 version 
4.6-dev-20120418-3759a-dirty-unknown gives:
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
  Domain decomp.        16       5000      502.025      335.5     1.5
  DD comm. load         16       5000        4.309        2.9     0.0
  DD comm. bounds       16       5000       13.941        9.3     0.0
  Comm. coord.          16      50001      497.769      332.7     1.5
  Neighbor search       16       5001     1630.241     1089.6     4.8
  Force                 16      50001    23079.690    15425.1    67.3
  Wait + Comm. F        16      50001      618.862      413.6     1.8
  PME mesh              16      50001     6564.978     4387.7    19.1
  Write traj.           16        101       16.666       11.1     0.0
  Update                16      50001      384.280      256.8     1.1
  Constraints           16      50001      592.154      395.8     1.7
  Comm. energies        16       5001      125.537       83.9     0.4
  Rest                  16                 256.227      171.2     0.7
-----------------------------------------------------------------------
  Total                 16               34286.680    22915.3   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
  PME redist. X/F       16     100002     1176.273      786.2     3.4
  PME spread/gather     16     100002     2119.858     1416.8     6.2
  PME 3D-FFT            16     100002     1041.014      695.8     3.0
  PME 3D-FFT Comm.      16     200004     1905.967     1273.8     5.6
  PME solve             16      50001      316.714      211.7     0.9
-----------------------------------------------------------------------

         Parallel run - timing based on wallclock.

                NODE (s)   Real (s)      (%)
        Time:    716.102    716.102    100.0
                        11:56
                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1482.789     73.686     12.066      1.989

The new version 4.6-dev-20120618-283a0e5-dirty-unknown, with SSE4.1 
acceleration enabled, gives only:
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
  Domain decomp.        16       5000      503.648      336.6     0.5
  DD comm. load         16       5000        5.666        3.8     0.0
  DD comm. bounds       16       5000       11.637        7.8     0.0
  Comm. coord.          16      50001      480.473      321.1     0.4
  Neighbor search       16       5001     1665.565     1113.2     1.5
  Force                 16      50001    98860.466    66073.0    89.0
  Wait + Comm. F        16      50001      608.138      406.4     0.5
  PME mesh              16      50001     7605.687     5083.2     6.8
  Write traj.           16        103       17.010       11.4     0.0
  Update                16      50001      383.590      256.4     0.3
  Constraints           16      50001      582.954      389.6     0.5
  Comm. energies        16       5001      132.665       88.7     0.1
  Rest                  16                 257.063      171.8     0.2
-----------------------------------------------------------------------
  Total                 16              111114.560    74263.0   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
  PME redist. X/F       16     100002     2258.309     1509.3     2.0
  PME spread/gather     16     100002     2111.979     1411.5     1.9
  PME 3D-FFT            16     100002     1046.271      699.3     0.9
  PME 3D-FFT Comm.      16     200004     1854.221     1239.3     1.7
  PME solve             16      50001      329.985      220.5     0.3
-----------------------------------------------------------------------

         Parallel run - timing based on wallclock.

                NODE (s)   Real (s)      (%)
        Time:   2320.719   2320.719    100.0
                        38:40
                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    457.569     22.739      3.723      6.446
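
The size of the regression can be read off the two tables above. Below 
is a minimal Python sketch that uses only numbers copied from those 
tables. The variable and function names are illustrative and are not 
part of GROMACS.

```python
# G-cycles copied from the two accounting tables: (old build, new build).
g_cycles = {
    "Force": (23079.690, 98860.466),
    "PME mesh": (6564.978, 7605.687),
    "PME redist. X/F": (1176.273, 2258.309),
    "Total": (34286.680, 111114.560),
}

# Wallclock time in seconds from the "Time:" lines: (old build, new build).
wall_seconds = (716.102, 2320.719)


def slowdown(old, new):
    """Return how many times larger the new value is than the old one."""
    return new / old


# Per-row slowdown factors and the overall wallclock slowdown.
ratios = {name: slowdown(old, new) for name, (old, new) in g_cycles.items()}
wall_ratio = slowdown(*wall_seconds)

# Share of the additional cycles that comes from the Force row alone.
extra_total = g_cycles["Total"][1] - g_cycles["Total"][0]
extra_force = g_cycles["Force"][1] - g_cycles["Force"][0]
force_share = extra_force / extra_total

for name, ratio in ratios.items():
    print(f"{name:<18s} {ratio:5.2f}x")
print(f"{'Wallclock':<18s} {wall_ratio:5.2f}x")
print(f"Force share of added cycles: {force_share:.1%}")
```

With these numbers the Force row grows by roughly 4.3x and the wallclock 
time by roughly 3.2x, and almost all of the added cycles come from the 
Force row. That is consistent with the regression sitting in the force 
code touched by the commit above, although the tables alone do not 
prove it.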


-- 
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, NRC Kurchatov Institute, 
Gatchina, Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru


