[gmx-users] problem with gpu performance
jagannath mondal
jm3745 at columbia.edu
Fri Sep 4 15:58:37 CEST 2015
Hi Peter,
Thanks for your response. I also realized that the GT 610 cannot keep up
with the faster CPU (Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz). I tried the
combined CPU-GPU setting for the -nb option (-nb gpu_cpu). It improves
performance slightly, but not by much. So we are planning to replace the
GPU cards.
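As a rough sanity check on the numbers from my earlier message (quoted below), here is a minimal Python sketch that recomputes how much of the wall time goes to 'Wait GPU nonlocal' and how much slower the GPU run is. The values are copied from the md.log table and the reported ns/day figures; the function and variable names are my own illustration, not anything from GROMACS.

```python
# Wall-clock seconds copied from the cycle and time accounting table in
# md.log (quoted below). Names here are illustrative, not part of GROMACS.
wall_time_s = {
    "Force": 1.358,
    "PME mesh": 9.734,
    "Wait GPU nonlocal": 117.798,
}
total_wall_s = 133.392


def share_of_total(seconds, total):
    """Return the percentage of the total wall time spent in one step."""
    return 100.0 * seconds / total


def slowdown(cpu_ns_per_day, gpu_ns_per_day):
    """Return how many times faster the CPU-only run is than the GPU run."""
    return cpu_ns_per_day / gpu_ns_per_day


gpu_wait_pct = share_of_total(wall_time_s["Wait GPU nonlocal"], total_wall_s)
ratio = slowdown(4.6, 1.620)  # ns/day: CPU-only run vs. run with both GPUs

print(f"Wait GPU nonlocal: {gpu_wait_pct:.1f}% of wall time")
print(f"CPU-only run is {ratio:.2f}x faster than the GPU run")
```

This gives about 88.3% for the GPU wait and a factor of about 2.84, consistent with the table and with the "about 3 times" slowdown I reported: the CPU ranks spend most of the run waiting for the GPUs to finish.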
At this point, we have two options: either a single 4 GB GTX 970 or two
2 GB GTX 960s. Could you comment on which option would be better as far as
performance is concerned?
Thanks for your input
jagannath
On Fri, Sep 4, 2015 at 6:45 PM, Peter Kroon <p.c.kroon at rug.nl> wrote:
> Hi Jagannath,
>
> AFAIK GT 610s are rather slow. What you could try is using both the CPU
> and the GPU for non-bonded interactions (-nb gpu_cpu).
>
> Peter
>
> On 04/09/15 15:01, jagannath mondal wrote:
> > Dear Gromacs Users
> >
> > I am trying to run the GPU version of GROMACS 5.0.6 on a workstation with
> > a hexa-core processor that can run 12 threads with hyper-threading. The
> > workstation has 2 GeForce GT 610 GPUs. I am finding that the simulation
> > with -nb gpu is much slower than with -nb cpu (i.e., with the GPUs
> > turned off).
> >
> > I installed CUDA 7.0 and used it to build the GPU version of GROMACS
> > 5.0.6 as follows:
> >
> > cmake ../ -DGMX_BUILD_OWN_FFTW=ON \
> >   -DCMAKE_INSTALL_PREFIX=/home/jmondal/UTIL/GROMACS_5.0.6_gpu/ \
> >   -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_GPU=ON \
> >   -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/
> >
> >
> > However, the performance with the GPUs is very strange. If I run mdrun
> > with the following command:
> > 1) gmx mdrun -s topol. -nb gpu -v &>log_run
> >
> > and then repeat the same run with GPU usage turned off
> >
> > 2) gmx mdrun -s topol -nb cpu -v >& log_run
> >
> > then with the GPUs the performance drops by about a factor of 3. Using
> > both GPUs together with the CPU, the performance is 1.620 ns/day; using
> > only the CPU, it is 4.6 ns/day. Using the GPUs is slowing the run down
> > considerably.
> >
> > When using the -nb gpu option, the GROMACS md.log correctly detects the
> > GPUs and the CPU as follows:
> >
> > Using 2 MPI threads
> > Using 6 OpenMP threads per tMPI thread
> >
> > Detecting CPU SIMD instructions.
> > Present hardware specification:
> > Vendor: GenuineIntel
> > Brand: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
> > Family: 6 Model: 63 Stepping: 2
> > Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx
> > msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> > sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > SIMD instructions most likely to fit this hardware: AVX2_256
> > SIMD instructions selected at GROMACS compile time: AVX2_256
> >
> >
> > 2 GPUs detected:
> > #0: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC: no, stat: compatible
> > #1: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC: no, stat: compatible
> >
> > 2 GPUs auto-selected for this run.
> > Mapping of GPUs to the 2 PP ranks in this node: #0, #1
> >
> >
> > However, when I look at the performance summary at the end of the
> > simulation, 'Wait GPU nonlocal' takes a very long time.
> > I also tried a few other options (such as using only 1 GPU with
> > -gpu_id 0) and played with the -ntmpi and -ntomp options, but the GPU
> > performance remains very poor (surprisingly, 3 times slower than the
> > CPU-only simulation).
> >
> > I am struggling to figure out whether it is a hardware issue, a
> > GPU-driver issue, or whether I am not using the optimal options.
> > Any suggestions would be helpful in solving the issue.
> > Jagannath
> >
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> > On 2 MPI ranks, each using 6 OpenMP threads
> >
> >  Computing:          Num   Num      Call    Wall time         Giga-Cycles
> >                      Ranks Threads  Count      (s)         total sum    %
> > -----------------------------------------------------------------------------
> >  Domain decomp.         2    6         63       0.270         11.322   0.2
> >  DD comm. load          2    6         13       0.000          0.002   0.0
> >  Neighbor search        2    6         63       0.311         13.062   0.2
> >  Launch GPU ops.        2    6       5002       0.205          8.614   0.2
> >  Comm. coord.           2    6       2438       0.239         10.016   0.2
> >  Force                  2    6       2501       1.358         57.011   1.0
> >  Wait + Comm. F         2    6       2501       0.404         16.954   0.3
> >  PME mesh               2    6       2501       9.734        408.587   7.3
> >  Wait GPU nonlocal      2    6       2501     117.798       4944.651  88.3
> >  Wait GPU local         2    6       2501       0.005          0.206   0.0
> >  NB X/F buffer ops.     2    6       9878       0.255         10.683   0.2
> >  Write traj.            2    6          4       0.180          7.558   0.1
> >  Update                 2    6       2501       0.807         33.886   0.6
> >  Constraints            2    6       2501       1.216         51.025   0.9
> >  Comm. energies         2    6        126       0.001          0.055   0.0
> >  Rest                                           0.609         25.573   0.5
> > -----------------------------------------------------------------------------
> >  Total                                        133.392       5599.205 100.0