[gmx-users] gromacs.org_gmx-users Digest, Vol 123, Issue 29

Tom dnaafm at gmail.com
Fri Jul 4 18:19:41 CEST 2014


Hi Mark,

Thanks a lot for your information!
I recompiled with the newest version, and I no longer get the report about the
acceleration not matching the hardware. But the performance is still the same
as before: it is slow. On the Cray I get roughly 37% of the performance I see
on other clusters...

I did use gcc to compile, but I am using cray-mpich. I am wondering whether
cray-mpich causes the low performance, or whether I need to change some
options in the installation.
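
Is there a quick way to tell from the log whether the MPI side is the
bottleneck? My plan was to check the cycle and time accounting table at the
end of md.log, roughly like this (just a rough check; md.log is simply the
default log name here):
--------------------------
# If the "Comm." / "Wait" rows (or "PME mesh" on the PME nodes) dominate
# the accounting table, communication rather than the compiler is the
# likely bottleneck.
tail -n 60 md.log
--------------------------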

Here are the details of the installation:
--------------------------
CC=gcc FC=ifort F77=ifort CXX=icpc
CMAKE_PREFIX_PATH=/opt/cray/modulefiles/cray-mpich/6.3.0
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=OFF -DGMX_MPI=ON
-DBUILD_SHARED_LIBS=off -DCMAKE_SKIP_RPATH=ON
-DCMAKE_INSTALL_PREFIX=~/App/GROMACS
make F77=gfortran
make install
----------------------
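
Based on the log message, I am planning to reconfigure roughly as below so
that the binary matches the compute nodes (the log recommended AVX_128_FMA).
This is only a sketch: I am not sure whether the cc/CC compiler wrappers are
the right choice here instead of plain gcc, and the exact options may need
adjusting for this site.
--------------------------
module swap PrgEnv-pgi PrgEnv-gnu    # GNU compilers behind the Cray wrappers
module load cray-mpich cmake

# The cc/CC wrappers should pull in cray-mpich automatically; the
# acceleration value is the one the compute-node log recommended.
cmake .. \
  -DCMAKE_C_COMPILER=cc \
  -DCMAKE_CXX_COMPILER=CC \
  -DGMX_MPI=ON \
  -DGMX_GPU=OFF \
  -DGMX_BUILD_OWN_FFTW=ON \
  -DGMX_CPU_ACCELERATION=AVX_128_FMA \
  -DBUILD_SHARED_LIBS=OFF \
  -DCMAKE_SKIP_RPATH=ON \
  -DCMAKE_INSTALL_PREFIX=$HOME/App/GROMACS
make
make install
--------------------------
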
This is the bash_profile:
---------------------
module swap PrgEnv-pgi PrgEnv-gnu
module load cmake
export PATH=/home/test/App/GROMACS/bin:$PATH
-------------------------
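
After reinstalling, I plan to verify the binary and rerun a short test roughly
like this (again just a sketch: I am assuming the MPI binary gets the usual
mdrun_mpi name, that aprun is the launcher on this machine, and "test" is only
a placeholder for my real .tpr):
-------------------------
# Confirm the reinstalled binary now reports AVX_128_FMA instead of SSE2
# (run on a compute node; the login/build node is an older Opteron with no AVX).
aprun -n 1 mdrun_mpi -version

# Short benchmark with the same 64 PP + 16 PME rank layout as before;
# "test" is just a placeholder name for the .tpr/.log files.
aprun -n 80 mdrun_mpi -deffnm test -npme 16 -maxh 0.1
-------------------------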

Thanks again for all the help!

Thom

>
> Message: 1
> Date: Fri, 4 Jul 2014 10:29:07 +0200
> From: Mark Abraham <mark.j.abraham at gmail.com>
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> Cc: Discussion list for GROMACS users
>         <gromacs.org_gmx-users at maillist.sys.kth.se>
> Subject: Re: [gmx-users] help with poor performance on gromacs on Cray
>         linux
>
> On Sat, Jun 28, 2014 at 11:44 PM, Tom <dnaafm at gmail.com> wrote:
>
> > Dear Mark,
> >
> > Thanks a lot for your kind help!
> > I noticed this at the head of the .log file. It says:
> >
> >   Binary not matching hardware - you might be losing performance.
> >   Acceleration most likely to fit this hardware: AVX_128_FMA
> >   Acceleration selected at GROMACS compile time: SSE2
> >
> > I compiled the gmx .tpr on the local machine and sent it to the cluster.
> >
>
> That's about a factor of three you're losing. Compile for your target, like
> the message says.
>
>
> > Please take a look at the following .log file:
> >
> >
> -------------------------------------------------------------------------------------
> >
> > Log file opened on Tue Jun 17 16:13:17 2014
> > Host: nid02116  pid: 22133  nodeid: 0  nnodes:  80
> > Gromacs version:    VERSION 4.6.5
> > Precision:          single
> > Memory model:       64 bit
> > MPI library:        MPI
> > OpenMP support:     enabled
> > GPU support:        disabled
> > invsqrt routine:    gmx_software_invsqrt(x)
> > CPU acceleration:   SSE2
> > FFT library:        fftw-3.3.2-sse2
> > Large file support: enabled
> > RDTSCP usage:       enabled
> > Built on:           Tue Jun 17 13:20:08 EDT 2014
> > Built by:           ****@***-ext5 [CMAKE]
> > Build OS/arch:      Linux 2.6.32.59-0.7.2-default x86_64
> > Build CPU vendor:   AuthenticAMD
> > Build CPU brand:    AMD Opteron(tm) Processor 6140
> > Build CPU family:   16   Model: 9   Stepping: 1
> > Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx
> > msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
> > C compiler:         /opt/cray/xt-asyncpe/5.26/bin/cc GNU
> > /opt/cray/xt-asyncpe/5.26/bin/cc: INFO: Compiling with
> > CRAYPE_COMPILE_TARGET=native.
> >
>
> As discussed, the Cray compilers do not do as good a job as the
> Cray-provided gcc or icc compilers. Use those.
>
> Mark
>
> > C compiler flags:   -msse2    -Wextra -Wno-missing-field-initializers
> > -Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unused-parameter
> > -Wno-array-bounds -Wno-maybe-uninitialized -Wno-strict-overflow
> > -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -O3
> > -DNDEBUG
> > ........
> >
> > Initializing Domain Decomposition on 80 nodes
> > Dynamic load balancing: auto
> > Will sort the charge groups at every domain (re)decomposition
> >
> > NOTE: Periodic molecules are present in this system. Because of this, the
> > domain decomposition algorithm cannot easily determine the minimum cell
> > size that it requires for treating bonded interactions. Instead, domain
> > decomposition will assume that half the non-bonded cut-off will be a
> > suitable lower bound.
> >
> > Minimum cell size due to bonded interactions: 0.600 nm
> > Using 16 separate PME nodes, per user request
> > Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> > Optimizing the DD grid for 64 cells with a minimum initial size of 0.750 nm
> > The maximum allowed number of cells is: X 11 Y 11 Z 20
> > Domain decomposition grid 4 x 4 x 4, separate PME nodes 16
> > PME domain decomposition: 4 x 4 x 1
> > Interleaving PP and PME nodes
> > This is a particle-particle only node
> >
> > Domain decomposition nodeid 0, coordinates 0 0 0
> >
> > Using two step summing over 5 groups of on average 12.8 processes
> >
> > Using 80 MPI processes
> >
> > Detecting CPU-specific acceleration.
> > Present hardware specification:
> > Vendor: AuthenticAMD
> > Brand:  AMD Opteron(TM) Processor 6274
> > Family: 21  Model:  1  Stepping:  2
> > Features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm misalignsse mmx
> > msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1
> > sse4.2 ssse3 xop
> > Acceleration most likely to fit this hardware: AVX_128_FMA
> > Acceleration selected at GROMACS compile time: SSE2
> >
> > Binary not matching hardware - you might be losing performance.
> > Acceleration most likely to fit this hardware: AVX_128_FMA
> > Acceleration selected at GROMACS compile time: SSE2
> >
> > Table routines are used for coulomb: FALSE
> > Table routines are used for vdw:     FALSE
> > Will do PME sum in reciprocal space.
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
> > Pedersen
> > A smooth particle mesh Ewald method
> > J. Chem. Phys. 103 (1995) pp. 8577-8592
> > -------- -------- --- Thank You --- -------- --------
> >
> > Will do ordinary reciprocal space Ewald sum.
> > Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> > Cut-off's:   NS: 1.2   Coulomb: 1.2   LJ: 1.2
> > Long Range LJ corr.: <C6> 6.2437e-04
> > System total charge: 0.000
> > Generated table with 1100 data points for Ewald.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for LJ12.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> > Potential shift: LJ r^-12: 0.000 r^-6 0.000, Ewald 0.000e+00
> > Initialized non-bonded Ewald correction tables, spacing: 7.23e-04 size:
> > 3046
> >
> >
> > Non-default thread affinity set probably by the OpenMP library,
> > disabling internal thread affinity
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> >
> > ----------------------
> >
> > Best regards,
> >
> > Thom
> >
> >
> >
> >
> >
> > Message: 1
> > > Date: Fri, 27 Jun 2014 23:49:53 +0200
> > > From: Mark Abraham <mark.j.abraham at gmail.com>
> > > To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> > > Subject: Re: [gmx-users] help with poor performance on gromacs on Cray
> > >         linux
> > >
> > > That thread referred to the Cray compilers (these machines ship several),
> > > but whether that is relevant we don't know. Showing the top and bottom
> > > .log file chunks is absolutely critical if you want performance feedback.
> > >
> > > Mark
> > > On Jun 26, 2014 10:55 PM, "Tom" <dnaafm at gmail.com> wrote:
> > >
> > > > Justin,
> > > >
> > > > I compared the performance (the time spent in mdrun) using the md.log
> > > > files for the same simulation run on Cray Linux and on other Linux
> > > > systems.
> > > >
> > > > I agree that different hardware can have different performance.
> > > > But these tests were run on supercomputer clusters with very good
> > > > reputations for performance. The one on the Cray is very slow.
> > > >
> > > > This is the first time I have run GROMACS on Cray Linux, and I suspect
> > > > something is wrong with my installation.
> > > >
> > > > From the previous discussion, gmx looks to have a performance problem
> > > > on Cray Linux:
> > > >
> > > > https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2013-May/081473.html
> > > >
> > > > I am also wondering whether the newest version has solved this issue.
> > > >
> > > > Thanks!
> > > >
> > > > Thom
> > > >
> > > >
>

