[gmx-developers] Possible regression in gromacs 4.6
Alexey Shvetsov
alexxy at omrb.pnpi.spb.ru
Thu Jun 21 18:22:51 CEST 2012
Hi all!
After merging the following commit:
commit 5ba7125c5972f2aafde2310eaa4a345cbac55da5
Author: Erik Lindahl <erik at kth.se>
Date: Mon May 28 20:54:17 2012 +0200
New CPU detection & AVX/SSE code, removed raw assembly files.
I noticed a regression in GROMACS speed. I used two systems for the tests:
7bna, and speptide from the examples.
For the 7bna system, the old 4.6 version (4.6-dev-20120418-3759a-dirty-unknown)
gives:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
 Computing:         Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
 Domain decomp.        16       5000      502.025      335.5     1.5
 DD comm. load         16       5000        4.309        2.9     0.0
 DD comm. bounds       16       5000       13.941        9.3     0.0
 Comm. coord.          16      50001      497.769      332.7     1.5
 Neighbor search       16       5001     1630.241     1089.6     4.8
 Force                 16      50001    23079.690    15425.1    67.3
 Wait + Comm. F        16      50001      618.862      413.6     1.8
 PME mesh              16      50001     6564.978     4387.7    19.1
 Write traj.           16        101       16.666       11.1     0.0
 Update                16      50001      384.280      256.8     1.1
 Constraints           16      50001      592.154      395.8     1.7
 Comm. energies        16       5001      125.537       83.9     0.4
 Rest                  16                 256.227      171.2     0.7
-----------------------------------------------------------------------
 Total                 16               34286.680    22915.3   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F       16     100002     1176.273      786.2     3.4
 PME spread/gather     16     100002     2119.858     1416.8     6.2
 PME 3D-FFT            16     100002     1041.014      695.8     3.0
 PME 3D-FFT Comm.      16     200004     1905.967     1273.8     5.6
 PME solve             16      50001      316.714      211.7     0.9
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)        (%)
       Time:    716.102    716.102      100.0
                  11:56
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1482.789     73.686     12.066      1.989
The new version (4.6-dev-20120618-283a0e5-dirty-unknown), with SSE4.1
acceleration enabled, gives only:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
 Computing:         Nodes     Number     G-Cycles    Seconds       %
-----------------------------------------------------------------------
 Domain decomp.        16       5000      503.648      336.6     0.5
 DD comm. load         16       5000        5.666        3.8     0.0
 DD comm. bounds       16       5000       11.637        7.8     0.0
 Comm. coord.          16      50001      480.473      321.1     0.4
 Neighbor search       16       5001     1665.565     1113.2     1.5
 Force                 16      50001    98860.466    66073.0    89.0
 Wait + Comm. F        16      50001      608.138      406.4     0.5
 PME mesh              16      50001     7605.687     5083.2     6.8
 Write traj.           16        103       17.010       11.4     0.0
 Update                16      50001      383.590      256.4     0.3
 Constraints           16      50001      582.954      389.6     0.5
 Comm. energies        16       5001      132.665       88.7     0.1
 Rest                  16                 257.063      171.8     0.2
-----------------------------------------------------------------------
 Total                 16              111114.560    74263.0   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F       16     100002     2258.309     1509.3     2.0
 PME spread/gather     16     100002     2111.979     1411.5     1.9
 PME 3D-FFT            16     100002     1046.271      699.3     0.9
 PME 3D-FFT Comm.      16     200004     1854.221     1239.3     1.7
 PME solve             16      50001      329.985      220.5     0.3
-----------------------------------------------------------------------
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)        (%)
       Time:   2320.719   2320.719      100.0
                  38:40
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    457.569     22.739      3.723      6.446
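To make the comparison between the two runs explicit, the per-category "Seconds" values can be divided new/old, and each category's share of the added time computed. The following is a small Python sketch with the numbers copied by hand from the two tables above; the names OLD, NEW and compare are purely illustrative and not part of GROMACS.

```python
"""Compare the two GROMACS cycle-accounting tables (7bna system)."""

# "Seconds" column, copied by hand from the old (20120418) and
# new (20120618, SSE4.1) runs. Illustrative names, not a GROMACS API.
OLD = {
    "Domain decomp.": 335.5, "DD comm. load": 2.9, "DD comm. bounds": 9.3,
    "Comm. coord.": 332.7, "Neighbor search": 1089.6, "Force": 15425.1,
    "Wait + Comm. F": 413.6, "PME mesh": 4387.7, "Write traj.": 11.1,
    "Update": 256.8, "Constraints": 395.8, "Comm. energies": 83.9,
    "Rest": 171.2,
}
NEW = {
    "Domain decomp.": 336.6, "DD comm. load": 3.8, "DD comm. bounds": 7.8,
    "Comm. coord.": 321.1, "Neighbor search": 1113.2, "Force": 66073.0,
    "Wait + Comm. F": 406.4, "PME mesh": 5083.2, "Write traj.": 11.4,
    "Update": 256.4, "Constraints": 389.6, "Comm. energies": 88.7,
    "Rest": 171.8,
}
OLD_TOTAL, NEW_TOTAL = 22915.3, 74263.0   # the "Total" rows
OLD_NS_DAY, NEW_NS_DAY = 12.066, 3.723    # the "Performance" lines


def compare(old, new, old_total, new_total):
    """Return {category: (new/old ratio, share of the added time)}."""
    extra = new_total - old_total
    return {
        name: (new[name] / old[name], (new[name] - old[name]) / extra)
        for name in old
    }


if __name__ == "__main__":
    result = compare(OLD, NEW, OLD_TOTAL, NEW_TOTAL)
    # Largest contributor to the extra time first.
    for name, (ratio, share) in sorted(result.items(),
                                       key=lambda kv: -kv[1][1]):
        print(f"{name:16s} x{ratio:5.2f}  {share:7.1%} of added time")
    print(f"overall slowdown (ns/day): x{OLD_NS_DAY / NEW_NS_DAY:.2f}")
```

Running it (Python 3.6 or later, for f-strings) lists Force first, about 4.3x slower and roughly 98.6% of the added time, consistent with the roughly 3.2x drop in ns/day. The totals are taken from the "Total" rows rather than re-summed, because the per-row values are rounded.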
--
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, NRC Kurchatov Institute,
Gatchina, Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru