[gmx-users] help with poor performance on gromacs on Cray linux

Tom dnaafm at gmail.com
Sat Jun 28 23:45:05 CEST 2014


Dear Mark,

Thanks a lot for your kind help!
I noticed this at the head of the .log file. It says:

   Binary not matching hardware - you might be losing performance.
   Acceleration most likely to fit this hardware: AVX_128_FMA
   Acceleration selected at GROMACS compile time: SSE2

I prepared the GROMACS .tpr on my local machine and sent it to the cluster.
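
Looking at the log below, the build was done against the login node CPU (Opteron
6140, family 16, no AVX), while the compute nodes are Opteron 6274 (family 21,
with AVX/FMA4), which is presumably why cmake picked SSE2. If it helps the
discussion, this is roughly how I think the binary could be rebuilt with the
acceleration forced to match the compute nodes; the module names, install prefix
and use of the Cray cc/CC wrappers are my assumptions based on the recipe quoted
further down, not something I have verified on this machine:
-------------------------------------------------------------------------------------
# Rebuild GROMACS 4.6.5 on the login node, but target the compute-node CPUs
# explicitly instead of letting cmake auto-detect the (older) login node CPU.
module swap PrgEnv-pgi PrgEnv-gnu
module load cmake

cd gromacs-4.6.5
mkdir build-avx128fma && cd build-avx128fma

# cc/CC are the Cray compiler wrappers (GNU after the PrgEnv swap).
CC=cc CXX=CC cmake .. \
  -DGMX_CPU_ACCELERATION=AVX_128_FMA \
  -DGMX_BUILD_OWN_FFTW=ON \
  -DGMX_MPI=ON -DGMX_GPU=OFF \
  -DBUILD_SHARED_LIBS=OFF -DCMAKE_SKIP_RPATH=ON \
  -DCMAKE_INSTALL_PREFIX=$HOME/App/GROMACS

make -j 8 && make install
-------------------------------------------------------------------------------------
If that is the right fix, the new .log header should report "CPU acceleration:
AVX_128_FMA" instead of SSE2.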

Please take a look at the following .log file excerpt:
-------------------------------------------------------------------------------------

Log file opened on Tue Jun 17 16:13:17 2014
Host: nid02116  pid: 22133  nodeid: 0  nnodes:  80
Gromacs version:    VERSION 4.6.5
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   SSE2
FFT library:        fftw-3.3.2-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Tue Jun 17 13:20:08 EDT 2014
Built by:           ****@***-ext5 [CMAKE]
Build OS/arch:      Linux 2.6.32.59-0.7.2-default x86_64
Build CPU vendor:   AuthenticAMD
Build CPU brand:    AMD Opteron(tm) Processor 6140
Build CPU family:   16   Model: 9   Stepping: 1
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx
msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
C compiler:         /opt/cray/xt-asyncpe/5.26/bin/cc GNU
/opt/cray/xt-asyncpe/5.26/bin/cc: INFO: Compiling with
CRAYPE_COMPILE_TARGET=native.
C compiler flags:   -msse2    -Wextra -Wno-missing-field-initializers
-Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unused-parameter
-Wno-array-bounds -Wno-maybe-uninitialized -Wno-strict-overflow
-fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -O3
-DNDEBUG
........

Initializing Domain Decomposition on 80 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition

NOTE: Periodic molecules are present in this system. Because of this, the
domain decomposition algorithm cannot easily determine the minimum cell
size that it requires for treating bonded interactions. Instead, domain
decomposition will assume that half the non-bonded cut-off will be a
suitable lower bound.

Minimum cell size due to bonded interactions: 0.600 nm
Using 16 separate PME nodes, per user request
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 64 cells with a minimum initial size of 0.750 nm
The maximum allowed number of cells is: X 11 Y 11 Z 20
Domain decomposition grid 4 x 4 x 4, separate PME nodes 16
PME domain decomposition: 4 x 4 x 1
Interleaving PP and PME nodes
This is a particle-particle only node

Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 5 groups of on average 12.8 processes

Using 80 MPI processes

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: AuthenticAMD
Brand:  AMD Opteron(TM) Processor 6274
Family: 21  Model:  1  Stepping:  2
Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx
msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1
sse4.2 ssse3 xop
Acceleration most likely to fit this hardware: AVX_128_FMA
Acceleration selected at GROMACS compile time: SSE2




Binary not matching hardware - you might be losing performance.
Acceleration most likely to fit this hardware: AVX_128_FMA
Acceleration selected at GROMACS compile time: SSE2

Table routines are used for coulomb: FALSE
Table routines are used for vdw:     FALSE
Will do PME sum in reciprocal space.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Cut-off's:   NS: 1.2   Coulomb: 1.2   LJ: 1.2
Long Range LJ corr.: <C6> 6.2437e-04
System total charge: 0.000
Generated table with 1100 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1100 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1100 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: 0.000 r^-6 0.000, Ewald 0.000e+00
Initialized non-bonded Ewald correction tables, spacing: 7.23e-04 size: 3046


Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++

----------------------
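
For completeness, the run uses 80 MPI ranks with 16 of them dedicated to PME, as
reported above. A launch along these lines should reproduce that layout (the
aprun options, binary name and file names are my assumptions, not copied from
the actual job script):
-------------------------------------------------------------------------------------
# Hypothetical Cray/ALPS launch: 80 MPI ranks in total, 16 reserved for PME
# via mdrun's -npme option, matching the domain decomposition in the log.
aprun -n 80 mdrun_mpi -npme 16 -deffnm md -maxh 24
-------------------------------------------------------------------------------------
The log also notes that an externally set (OpenMP) thread affinity disabled
mdrun's internal pinning; mdrun's -pin on could override that, but I have not
tried it here.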

Best regards,

Thom





Message: 1
> Date: Fri, 27 Jun 2014 23:49:53 +0200
> From: Mark Abraham <mark.j.abraham at gmail.com>
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> Subject: Re: [gmx-users] help with poor performance on gromacs on Cray
>         linux
> Message-ID:
>         <CAMNuMAQG=n7FNSxKAg9hXJGXQQ+MZX5+L+F73r=UOA_1i4S8=
> Q at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> That thread referred to the Cray compilers (these machines ship several),
> but whether that is relevant we don't know. Showing the top and bottom .log
> file chunks is absolutely critical if you want performance feedback.
>
> Mark
> On Jun 26, 2014 10:55 PM, "Tom" <dnaafm at gmail.com> wrote:
>
> > Justin,
> >
> > I compared the performance (the time spent in mdrun) using the md.log files
> > for the same simulation run on Cray Linux and on other Linux systems.
> >
> > I agree different hardware can have different performance.
> > But these tests were run on supercomputer clusters with very good
> > reputations for performance. The run on the Cray is very slow.
> >
> > This is the first time I have run GROMACS on Cray Linux, and I suspect
> > something is wrong with my installation.
> >
> > From the previous discussion, GROMACS seems to have a performance problem
> > on Cray Linux:
> >
> >
> https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2013-May/081473.html
> >
> > I am also wondering whether the newest version has solved this issue.
> >
> > Thanks!
> >
> > Thom
> >
> >
> > >
> > > Message: 2
> > > Date: Mon, 23 Jun 2014 08:17:55 -0400
> > > From: Justin Lemkul <jalemkul at vt.edu>
> > > To: gmx-users at gromacs.org
> > > Subject: Re: [gmx-users] help with poor performance on gromacs on Cray
> > >         linux
> > > Message-ID: <53A81AF3.7090401 at vt.edu>
> > > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >
> > >
> > >
> > > On 6/23/14, 1:12 AM, Tom wrote:
> > > > Dear Gromacs Developers and Experts:
> > > >
> > > > I noticed that the performance of GROMACS on Cray Linux clusters is
> > > > only 36.7% of normal.
> > > >
> > >
> > > Normal what?  Another run on the same system?  You can't directly compare
> > > different clusters with different hardware.
> > >
> > > >
> > > > The following is the detail about the installation
> > > > --------------------------
> > > > CC=gcc FC=ifort F77=ifort CXX=icpc
> > > > CMAKE_PREFIX_PATH=/opt/cray/modulefiles/cray-mpich/6.3.0
> > > > cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=OFF -DGMX_MPI=ON
> > > > -DBUILD_SHARED_LIBS=off -DCMAKE_SKIP_RPATH=ON
> > > > -DCMAKE_INSTALL_PREFIX=~/App/GROMACS
> > > > make F77=gfortran
> > > > make install
> > > > ----------------------
> > > >
> > > > This is the bash_profile:
> > > > ---------------------
> > > > module swap PrgEnv-pgi PrgEnv-gnu
> > > > module load cmake
> > > > export PATH=/home/test/App/GROMACS/bin:$PATH
> > > > -------------------------
> > > >
> > > > Is there any suggestion for improving the efficiency of my installation?
> > > >
> > >
> > > More important is the output of the .log file from the simulation.  It
> > > will tell
> > > you where mdrun spent all its time.
> > >
> > > -Justin
> > >
> > > --
> > > ==================================================
> > >
> > > Justin A. Lemkul, Ph.D.
> > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > >
> > > Department of Pharmaceutical Sciences
> > > School of Pharmacy
> > > Health Sciences Facility II, Room 601
> > > University of Maryland, Baltimore
> > > 20 Penn St.
> > > Baltimore, MD 21201
> > >
> > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > > http://mackerell.umaryland.edu/~jalemkul
> > >
> > > ==================================================
> > >
> > >
> > >
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >
>
>

