[gmx-users] performance 1 gpu
Szilárd Páll
pall.szilard at gmail.com
Tue Sep 30 23:40:03 CEST 2014
Unfortunately this is an ill-balanced hardware setup (for GROMACS), so
what you see is not unexpected.
There are a couple of things you can try, but don't expect more than a
few % improvement:
- try lowering nstlist (unless you already have a close-to-0 buffer);
this will decrease the non-bonded time, and hence the CPU waiting/idling,
but it will also increase the search time (and DD time, if applicable),
so you'll have to see what works best for you;
- try the -nb gpu_cpu mode; this splits the non-bonded workload between
GPU and CPU, and if you are lucky (= you don't get too much non-local
load, which will now be computed on the CPU), you may be able to get a
bit better performance. A sketch of both options follows.
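A minimal sketch of both options (assuming the Verlet cut-off scheme; the
file name topol.tpr and the rank/thread counts are illustrative, not taken
from your setup):

    # In the .mdp: set nstlist lower than what mdrun picks (mdrun may
    # auto-increase nstlist for GPU runs), shrinking the pair-list
    # buffer and the per-step GPU non-bonded cost:
    cutoff-scheme = Verlet
    nstlist       = 10

    # Hybrid non-bonded mode: local pair interactions on the GPU,
    # non-local ones on the CPU. With a single rank all non-bonded work
    # is local, so this split only takes effect with domain
    # decomposition, i.e. more than one (thread-)MPI rank; -gpu_id 00
    # maps GPU #0 to both ranks:
    mdrun -ntmpi 2 -ntomp 8 -gpu_id 00 -nb gpu_cpu -s topol.tpr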
You may want to try gcc 4.8 or 4.9 with FFTW 3.3.x; you will most
likely get better performance than with icc+MKL.
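For reference, a gcc + FFTW build could be configured along these lines
(the source path and install prefix are illustrative placeholders):

    # Illustrative GROMACS 4.6.x configure line; -DGMX_BUILD_OWN_FFTW=ON
    # downloads and builds FFTW 3.3.x as part of the GROMACS build.
    CC=gcc CXX=g++ cmake /path/to/gromacs-4.6.7 \
        -DGMX_GPU=ON \
        -DGMX_FFT_LIBRARY=fftw3 \
        -DGMX_BUILD_OWN_FFTW=ON \
        -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-4.6.7
    make -j 16 && make install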
On Thu, Sep 25, 2014 at 12:50 PM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
> Hi.
>
> I wonder if gromacs 4.6.7 can run faster on xsede.org, because I see the
> CPU waiting for the GPU in the log.
>
> There are 16 CPU cores (2.7 GHz), 1 Phi co-processor, and 1 GPU.
Get that Phi swapped to a GPU and you'll be happier ;)
> I compiled GROMACS with GPU support, without Phi support, using the Intel compiler and MKL.
>
> I didn't install 5.0.1 because I worry this bug might mess up
> equilibration when I switch from one ensemble to another (
> http://redmine.gromacs.org/issues/1603).
It's been fixed; 5.0.2 will be released soon, so I suggest you wait for it.
> Below are from the log:
>
> Gromacs version: VERSION 4.6.7
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled
> GPU support: enabled
> invsqrt routine: gmx_software_invsqrt(x)
> CPU acceleration: AVX_256
> FFT library: MKL
> Large file support: enabled
> RDTSCP usage: enabled
> Built on: Wed Sep 24 08:33:22 CDT 2014
> Built by: jlu128 at login2.stampede.tacc.utexas.edu [CMAKE]
> Build OS/arch: Linux 2.6.32-431.17.1.el6.x86_64 x86_64
> Build CPU vendor: GenuineIntel
> Build CPU brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> Build CPU family: 6 Model: 45 Stepping: 7
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> sse4.2 ssse3 tdt x2apic
> C compiler:
> /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC)
> 13.1.1 20130313
> C compiler flags: -mavx -mkl=sequential -std=gnu99 -Wall -ip
> -funroll-all-loops -O3 -DNDEBUG
> C++ compiler:
> /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC)
> 13.1.1 20130313
> C++ compiler flags: -mavx -Wall -ip -funroll-all-loops -O3 -DNDEBUG
> Linked with Intel MKL version 11.0.3.
> CUDA compiler: /opt/apps/cuda/6.0/bin/nvcc nvcc: NVIDIA (R) Cuda
> compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
> CUDA compiler
> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;;
> -mavx;-Wall;-ip;-funroll-all-loops;-O3;-DNDEBUG
> CUDA driver: 6.0
> CUDA runtime: 6.0
>
> ...
> Using 1 MPI thread
> Using 16 OpenMP threads
>
> Detecting CPU-specific acceleration.
> Present hardware specification:
> Vendor: GenuineIntel
> Brand: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> Family: 6 Model: 45 Stepping: 7
> Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
> pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> tdt x2apic
> Acceleration most likely to fit this hardware: AVX_256
> Acceleration selected at GROMACS compile time: AVX_256
>
>
> 1 GPU detected:
> #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> 1 GPU auto-selected for this run.
> Mapping of GPU to the 1 PP rank in this node: #0
>
> Will do PME sum in reciprocal space.
>
> ...
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
> -----------------------------------------------------------------------------
> Pair Search distance check 1517304.154000 13655737.386 0.1
> NxN Ewald Elec. + VdW [F] 370461474.587968 24450457322.806 92.7
> NxN Ewald Elec. + VdW [V&F] 3742076.012672 400402133.356 1.5
> 1,4 nonbonded interactions 101910.006794 9171900.611 0.0
> Calc Weights 1343655.089577 48371583.225 0.2
> Spread Q Bspline 28664641.910976 57329283.822 0.2
> Gather F Bspline 28664641.910976 171987851.466 0.7
> 3D-FFT 141557361.449024 1132458891.592 4.3
> Solve PME 61439.887616 3932152.807 0.0
> Shift-X 11197.154859 67182.929 0.0
> Angles 71010.004734 11929680.795 0.0
> Propers 108285.007219 24797266.653 0.1
> Impropers 8145.000543 1694160.113 0.0
> Virial 44856.029904 807408.538 0.0
> Stop-CM 4478.909718 44789.097 0.0
> Calc-Ekin 89577.059718 2418580.612 0.0
> Lincs 39405.002627 2364300.158 0.0
> Lincs-Mat 852120.056808 3408480.227 0.0
> Constraint-V 487680.032512 3901440.260 0.0
> Constraint-Vir 44827.529885 1075860.717 0.0
> Settle 136290.009086 44021672.935 0.2
> -----------------------------------------------------------------------------
> Total 26384297680.107 100.0
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> Computing: Nodes Th. Count Wall t (s) G-Cycles %
> -----------------------------------------------------------------------------
> Neighbor search 1 16 375001 578.663 24997.882 1.7
> Launch GPU ops. 1 16 15000001 814.410 35181.984 2.3
> Force 1 16 15000001 2954.603 127637.010 8.5
> PME mesh 1 16 15000001 11736.454 507007.492 33.7
> Wait GPU local 1 16 15000001 11159.455 482081.496 32.0
> NB X/F buffer ops. 1 16 29625001 1061.959 45875.952 3.0
> Write traj. 1 16 39 5.207 224.956 0.0