[gmx-users] performance 1 gpu
Mark Abraham
mark.j.abraham at gmail.com
Wed Oct 1 14:55:45 CEST 2014
Hi,
Not really surprising. The compiler teams try to optimize the performance
of lots of different kinds of code on a range of platforms. Some kinds of
code aren't prioritized by a given compiler team. Often there are
trade-offs that mean some kinds of code get faster while others get
slower, perhaps differently for different hardware targets, while everybody
gradually tries to reach nirvana. Actual performance is a matter of how
well a specific piece of code works with a specific compiler on specific
hardware.
Mark
On Wed, Oct 1, 2014 at 2:24 PM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
> That is surprising. I thought the Intel compiler was the best compiler for
> Intel CPUs.
>
> On Tue, Sep 30, 2014 at 5:40 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
> > Unfortunately this is an ill-balanced hardware setup (for GROMACS), so
> > what you see is not unexpected.
> >
> > There are a couple of things you can try, but don't expect more than a
> > few % improvement (see the example commands after this list):
> > - try to lower nstlist (unless you already have a buffer close to 0);
> > this will decrease the non-bonded time, and hence the CPU waiting/idling,
> > but it will also increase the search time (and DD time if applicable),
> > so you'll have to see what works best for you;
> > - try the -nb gpu_cpu mode; this splits the non-bonded workload between
> > the GPU and the CPU, and if you are lucky (= you don't get too much
> > non-local work, which is now computed on the CPU), you may be able to
> > get a bit better performance.
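> >
> > As a concrete, untested sketch (the file names are placeholders and the
> > best nstlist value is system-dependent, so treat this only as a
> > starting point):
> >
> >   # In the .mdp file, try a lower pair-list update interval, e.g.
> >   # "nstlist = 10" (check the log, since mdrun may adjust it for GPU
> >   # runs), and regenerate the .tpr with grompp. Then run with the
> >   # local/non-local non-bonded split between the GPU and the CPU:
> >   mdrun -deffnm md -nb gpu_cpu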
> >
> > You may want to try gcc 4.8 or 4.9 and FFTW 3.3.x; you will most
> > likely get better performance than with icc+MKL.
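> >
> > For example, a configuration along these lines (untested here; it
> > assumes gcc/g++ 4.8+ are on your PATH, and -DGMX_BUILD_OWN_FFTW
> > downloads and builds FFTW for you -- you can instead point CMake at an
> > existing FFTW installation):
> >
> >   cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
> >            -DGMX_GPU=ON -DGMX_BUILD_OWN_FFTW=ON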
> >
> >
> > On Thu, Sep 25, 2014 at 12:50 PM, Johnny Lu <johnny.lu128 at gmail.com>
> > wrote:
> > > Hi.
> > >
> > > I wonder if GROMACS 4.6.7 can run faster on xsede.org, because I see
> > > the CPU waiting for the GPU in the log.
> > >
> > > There are 16 CPU cores (2.7 GHz), 1 Phi co-processor, and 1 GPU.
> >
> > Get that Phi swapped to a GPU and you'll be happier ;)
> >
> > > I compiled GROMACS with GPU support, without Phi, using the Intel
> > > compiler and MKL.
> > >
> > > I didn't install 5.0.1 because I worry this bug might mess up
> > > equilibration when I switch from one ensemble to another
> > > (http://redmine.gromacs.org/issues/1603).
> >
> > It's been fixed; 5.0.2 will be released soon, so I suggest you wait for it.
> >
> > > Below are from the log:
> > >
> > > Gromacs version: VERSION 4.6.7
> > > Precision: single
> > > Memory model: 64 bit
> > > MPI library: thread_mpi
> > > OpenMP support: enabled
> > > GPU support: enabled
> > > invsqrt routine: gmx_software_invsqrt(x)
> > > CPU acceleration: AVX_256
> > > FFT library: MKL
> > > Large file support: enabled
> > > RDTSCP usage: enabled
> > > Built on: Wed Sep 24 08:33:22 CDT 2014
> > > Built by: jlu128 at login2.stampede.tacc.utexas.edu [CMAKE]
> > > Build OS/arch: Linux 2.6.32-431.17.1.el6.x86_64 x86_64
> > > Build CPU vendor: GenuineIntel
> > > Build CPU brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> > > Build CPU family: 6 Model: 45 Stepping: 7
> > > Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> > > nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> > > sse4.2 ssse3 tdt x2apic
> > > C compiler: /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc
> > > Intel icc (ICC) 13.1.1 20130313
> > > C compiler flags: -mavx -mkl=sequential -std=gnu99 -Wall -ip
> > > -funroll-all-loops -O3 -DNDEBUG
> > > C++ compiler: /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc
> > > Intel icc (ICC) 13.1.1 20130313
> > > C++ compiler flags: -mavx -Wall -ip -funroll-all-loops -O3 -DNDEBUG
> > > Linked with Intel MKL version 11.0.3.
> > > CUDA compiler: /opt/apps/cuda/6.0/bin/nvcc nvcc: NVIDIA (R) Cuda
> > > compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
> > > Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
> > > CUDA compiler flags:
> > > -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;
> > > -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;
> > > -gencode;arch=compute_35,code=compute_35;-use_fast_math;;
> > > -mavx;-Wall;-ip;-funroll-all-loops;-O3;-DNDEBUG
> > > CUDA driver: 6.0
> > > CUDA runtime: 6.0
> > >
> > > ...
> > > Using 1 MPI thread
> > > Using 16 OpenMP threads
> > >
> > > Detecting CPU-specific acceleration.
> > > Present hardware specification:
> > > Vendor: GenuineIntel
> > > Brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> > > Family: 6 Model: 45 Stepping: 7
> > > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
> > > pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> > > tdt x2apic
> > > Acceleration most likely to fit this hardware: AVX_256
> > > Acceleration selected at GROMACS compile time: AVX_256
> > >
> > >
> > > 1 GPU detected:
> > > #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
> > >
> > > 1 GPU auto-selected for this run.
> > > Mapping of GPU to the 1 PP rank in this node: #0
> > >
> > > Will do PME sum in reciprocal space.
> > >
> > > ...
> > >
> > > M E G A - F L O P S   A C C O U N T I N G
> > >
> > > NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> > > RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> > > W3=SPC/TIP3p W4=TIP4p (single or pairs)
> > > V&F=Potential and force V=Potential only F=Force only
> > >
> > > Computing:                          M-Number         M-Flops  % Flops
> > > -----------------------------------------------------------------------------
> > > Pair Search distance check      1517304.154000    13655737.386     0.1
> > > NxN Ewald Elec. + VdW [F]     370461474.587968 24450457322.806    92.7
> > > NxN Ewald Elec. + VdW [V&F]     3742076.012672   400402133.356     1.5
> > > 1,4 nonbonded interactions       101910.006794     9171900.611     0.0
> > > Calc Weights                    1343655.089577    48371583.225     0.2
> > > Spread Q Bspline               28664641.910976    57329283.822     0.2
> > > Gather F Bspline               28664641.910976   171987851.466     0.7
> > > 3D-FFT                        141557361.449024  1132458891.592     4.3
> > > Solve PME                         61439.887616     3932152.807     0.0
> > > Shift-X                           11197.154859       67182.929     0.0
> > > Angles                            71010.004734    11929680.795     0.0
> > > Propers                          108285.007219    24797266.653     0.1
> > > Impropers                          8145.000543     1694160.113     0.0
> > > Virial                            44856.029904      807408.538     0.0
> > > Stop-CM                            4478.909718       44789.097     0.0
> > > Calc-Ekin                         89577.059718     2418580.612     0.0
> > > Lincs                             39405.002627     2364300.158     0.0
> > > Lincs-Mat                        852120.056808     3408480.227     0.0
> > > Constraint-V                     487680.032512     3901440.260     0.0
> > > Constraint-Vir                    44827.529885     1075860.717     0.0
> > > Settle                           136290.009086    44021672.935     0.2
> > > -----------------------------------------------------------------------------
> > > Total                                          26384297680.107   100.0
> > > -----------------------------------------------------------------------------
> > >
> > >
> > > R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> > >
> > > Computing:          Nodes  Th.     Count  Wall t (s)     G-Cycles       %
> > > -----------------------------------------------------------------------------
> > > Neighbor search         1   16    375001     578.663    24997.882     1.7
> > > Launch GPU ops.         1   16  15000001     814.410    35181.984     2.3
> > > Force                   1   16  15000001    2954.603   127637.010     8.5
> > > PME mesh                1   16  15000001   11736.454   507007.492    33.7
> > > Wait GPU local          1   16  15000001   11159.455   482081.496    32.0
> > > NB X/F buffer ops.      1   16  29625001    1061.959    45875.952     3.0
> > > Write traj.             1   16        39       5.207      224.956     0.0