[gmx-users] performance 1 gpu
Johnny Lu
johnny.lu128 at gmail.com
Wed Oct 1 14:24:13 CEST 2014
That is surprising. I thought the Intel compiler was the best compiler for
Intel CPUs.
On Tue, Sep 30, 2014 at 5:40 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:
> Unfortunately this is an ill-balanced hardware setup (for GROMACS), so
> what you see is not unexpected.
>
> There are a couple of things you can try, but don't expect more than a
> few % improvement (see the sketch after this list):
> - try to lower nstlist (unless you already get a close-to-zero buffer);
> this will decrease the non-bonded time, and hence the CPU waiting/idling,
> but it will also increase the search time (and DD time, if applicable),
> so you'll have to see what works best for you;
> - try the -nb gpu_cpu mode, which splits the non-bonded workload between
> the CPU and the GPU; if you are lucky (= you don't get too much
> non-local load, which will now be computed on the CPU), you may be able
> to get a bit better performance.
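>
> For example (a rough sketch; file names are placeholders, and
> nstlist = 10 is just a starting point to scan around):
>
>   # in the .mdp, with the Verlet scheme, try lowering nstlist:
>   cutoff-scheme = Verlet
>   nstlist       = 10
>
>   # regenerate the tpr, then run with the hybrid non-bonded mode:
>   grompp -f md.mdp -c conf.gro -p topol.top -o md.tpr
>   mdrun -deffnm md -nb gpu_cpu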
>
> You may want to try gcc 4.8 or 4.9 and FFTW 3.3.x; you will most
> likely get better performance than with icc+MKL. A build along the
> lines sketched below should work.
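>
> (A rough sketch; paths and the install prefix are placeholders, adjust
> them to your environment:)
>
>   tar xzf gromacs-4.6.7.tar.gz
>   mkdir build && cd build
>   # GMX_BUILD_OWN_FFTW=ON downloads and builds FFTW 3.3.x with SIMD support
>   cmake ../gromacs-4.6.7 \
>     -DCMAKE_C_COMPILER=gcc \
>     -DCMAKE_CXX_COMPILER=g++ \
>     -DGMX_GPU=ON \
>     -DGMX_BUILD_OWN_FFTW=ON \
>     -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-4.6.7-gcc
>   make -j 16 && make install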
>
>
> On Thu, Sep 25, 2014 at 12:50 PM, Johnny Lu <johnny.lu128 at gmail.com>
> wrote:
> > Hi.
> >
> > I wonder if GROMACS 4.6.7 can run faster on xsede.org, because I see
> > the CPU waiting for the GPU in the log.
> >
> > There are 16 CPU cores (2.7 GHz), 1 Phi co-processor, and 1 GPU.
>
> Get that Phi swapped to a GPU and you'll be happier ;)
>
> > I compiled GROMACS with GPU support, without Phi support, and with the
> > Intel compiler and MKL.
> >
> > I didn't install 5.0.1 because I worry this bug might mess up
> > equilibration when I switch from one ensemble to another (
> > http://redmine.gromacs.org/issues/1603).
>
> It's been fixed; 5.0.2 will be released soon, so I suggest you wait for it.
>
> > Below are excerpts from the log:
> >
> > Gromacs version: VERSION 4.6.7
> > Precision: single
> > Memory model: 64 bit
> > MPI library: thread_mpi
> > OpenMP support: enabled
> > GPU support: enabled
> > invsqrt routine: gmx_software_invsqrt(x)
> > CPU acceleration: AVX_256
> > FFT library: MKL
> > Large file support: enabled
> > RDTSCP usage: enabled
> > Built on: Wed Sep 24 08:33:22 CDT 2014
> > Built by: jlu128 at login2.stampede.tacc.utexas.edu [CMAKE]
> > Build OS/arch: Linux 2.6.32-431.17.1.el6.x86_64 x86_64
> > Build CPU vendor: GenuineIntel
> > Build CPU brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> > Build CPU family: 6 Model: 45 Stepping: 7
> > Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> > nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> > sse4.2 ssse3 tdt x2apic
> > C compiler: /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC) 13.1.1 20130313
> > C compiler flags: -mavx -mkl=sequential -std=gnu99 -Wall -ip -funroll-all-loops -O3 -DNDEBUG
> > C++ compiler: /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC) 13.1.1 20130313
> > C++ compiler flags: -mavx -Wall -ip -funroll-all-loops -O3 -DNDEBUG
> > Linked with Intel MKL version 11.0.3.
> > CUDA compiler: /opt/apps/cuda/6.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
> > CUDA compiler flags: -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; -mavx;-Wall;-ip;-funroll-all-loops;-O3;-DNDEBUG
> > CUDA driver: 6.0
> > CUDA runtime: 6.0
> >
> > ...
> > Using 1 MPI thread
> > Using 16 OpenMP threads
> >
> > Detecting CPU-specific acceleration.
> > Present hardware specification:
> > Vendor: GenuineIntel
> > Brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> > Family: 6 Model: 45 Stepping: 7
> > Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
> > pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> > tdt x2apic
> > Acceleration most likely to fit this hardware: AVX_256
> > Acceleration selected at GROMACS compile time: AVX_256
> >
> >
> > 1 GPU detected:
> > #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
> >
> > 1 GPU auto-selected for this run.
> > Mapping of GPU to the 1 PP rank in this node: #0
> >
> > Will do PME sum in reciprocal space.
> >
> > ...
> >
> > M E G A - F L O P S A C C O U N T I N G
> >
> > NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> > RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> > W3=SPC/TIP3p W4=TIP4p (single or pairs)
> > V&F=Potential and force V=Potential only F=Force only
> >
> > Computing:                          M-Number           M-Flops   % Flops
> > -----------------------------------------------------------------------------
> > Pair Search distance check      1517304.154000     13655737.386      0.1
> > NxN Ewald Elec. + VdW [F]     370461474.587968  24450457322.806     92.7
> > NxN Ewald Elec. + VdW [V&F]     3742076.012672    400402133.356      1.5
> > 1,4 nonbonded interactions       101910.006794      9171900.611      0.0
> > Calc Weights                    1343655.089577     48371583.225      0.2
> > Spread Q Bspline               28664641.910976     57329283.822      0.2
> > Gather F Bspline               28664641.910976    171987851.466      0.7
> > 3D-FFT                        141557361.449024   1132458891.592      4.3
> > Solve PME                         61439.887616      3932152.807      0.0
> > Shift-X                           11197.154859        67182.929      0.0
> > Angles                            71010.004734     11929680.795      0.0
> > Propers                          108285.007219     24797266.653      0.1
> > Impropers                          8145.000543      1694160.113      0.0
> > Virial                            44856.029904       807408.538      0.0
> > Stop-CM                            4478.909718        44789.097      0.0
> > Calc-Ekin                         89577.059718      2418580.612      0.0
> > Lincs                             39405.002627      2364300.158      0.0
> > Lincs-Mat                        852120.056808      3408480.227      0.0
> > Constraint-V                     487680.032512      3901440.260      0.0
> > Constraint-Vir                    44827.529885      1075860.717      0.0
> > Settle                           136290.009086     44021672.935      0.2
> > -----------------------------------------------------------------------------
> > Total                                           26384297680.107    100.0
> > -----------------------------------------------------------------------------
> >
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> > Computing:          Nodes  Th.     Count   Wall t (s)     G-Cycles      %
> > -----------------------------------------------------------------------------
> > Neighbor search         1   16    375001      578.663    24997.882    1.7
> > Launch GPU ops.         1   16  15000001      814.410    35181.984    2.3
> > Force                   1   16  15000001     2954.603   127637.010    8.5
> > PME mesh                1   16  15000001    11736.454   507007.492   33.7
> > Wait GPU local          1   16  15000001    11159.455   482081.496   32.0
> > NB X/F buffer ops.      1   16  29625001     1061.959    45875.952    3.0
> > Write traj.             1   16        39        5.207      224.956    0.0
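>
> The "Wait GPU local" row above is the number to watch when comparing
> tweaks. A quick way to pull it out of several runs, assuming the
> standard log layout (a rough sketch, adjust the paths):
>
>   # print the GPU wait percentage (last column) from each log
>   for f in run*/md.log; do
>       printf '%s: ' "$f"
>       grep "Wait GPU local" "$f" | awk '{print $NF "%"}'
>   done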