[gmx-users] problem with gpu performance

Justin Lemkul jalemkul at vt.edu
Fri Sep 4 16:38:36 CEST 2015



On 9/4/15 10:35 AM, Peter Kroon wrote:
> Hi Jagannath,
>
> I don't dare comment on these specifics. There are probably some
> (GROMACS-specific) benchmarks out there *somewhere*, quite possibly on
> this list. But maybe someone else on the list knows what you should get :)
>

Quoting Carsten from a few days ago:

http://dx.doi.org/10.1002/jcc.24030
http://dx.doi.org/10.1007/978-3-319-15976-8_1
http://pubman.mpdl.mpg.de/pubman/item/escidoc:2037317/component/escidoc:2037318/2037317.pdf?mode=download



-Justin

> Peter
>
> On 04/09/15 15:58, jagannath mondal wrote:
>> Hi Peter,
>>    Thanks for your response. I also realized that the GT 610 cannot keep
>> up with the faster CPU (Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz). I
>> tried the CPU-GPU combination for the -nb option (-nb gpu_cpu). It
>> improves performance slightly, but not by much, so we are planning to
>> replace the GPU cards.
>> At this point we have two options: either a single 4 GB GTX 970 or two
>> 2 GB GTX 960s. Could you comment on which option would give better
>> performance?
>> Thanks for your input,
>> jagannath
>>
>> On Fri, Sep 4, 2015 at 6:45 PM, Peter Kroon <p.c.kroon at rug.nl> wrote:
>>
>>> Hi Jagannath,
>>>
>>> AFAIK GT 610s are rather slow. What you could try is using both the CPU
>>> and the GPU for non-bonded interactions (-nb gpu_cpu).
>>>
>>> Peter
>>>
>>> On 04/09/15 15:01, jagannath mondal wrote:
>>>> Dear GROMACS users,
>>>>
>>>>    I am trying to run the GPU version of GROMACS 5.0.6 on a workstation
>>>> with a six-core processor that supports 12 hardware threads
>>>> (Hyper-Threading). The workstation has two GeForce GT 610 GPUs. I find
>>>> that the simulation with -nb gpu is much slower than with -nb cpu
>>>> (i.e. with the GPUs turned off).
>>>>
>>>> I installed CUDA 7.0 and used it to build the GPU version of GROMACS
>>>> 5.0.6 as follows:
>>>>
>>>> cmake ../ -DGMX_BUILD_OWN_FFTW=ON \
>>>>     -DCMAKE_INSTALL_PREFIX=/home/jmondal/UTIL/GROMACS_5.0.6_gpu/ \
>>>>     -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_GPU=ON \
>>>>     -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/
>>>>
>>>>
>>>> However, the GPU performance is very strange. I run mdrun with the
>>>> following command:
>>>>
>>>> 1) gmx mdrun -s topol.tpr -nb gpu -v &> log_run
>>>>
>>>> and then repeat the same run with the GPUs turned off:
>>>>
>>>> 2) gmx mdrun -s topol.tpr -nb cpu -v &> log_run
>>>>
>>>> With the GPUs, performance drops about threefold! Using both GPUs along
>>>> with the CPU, the performance is 1.620 ns/day; using only the CPU, it is
>>>> 4.6 ns/day. Using the GPUs is frustratingly slowing the simulation down.
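A minimal sketch of a cleaner comparison, assuming bash, a GROMACS 5.0.x gmx binary on the PATH and an input named topol.tpr. Each -nb mode gets its own short run and its own log file (both commands above write to the same log_run), and the ns/day value is read from the "Performance:" line of that run's md.log. The names ns_per_day and bench_nb_modes are hypothetical helpers, not part of GROMACS.

```shell
# Requires bash. Hypothetical helpers for comparing mdrun -nb modes.

# Print the ns/day value from the "Performance:" line of an md.log file.
ns_per_day() {
    awk '/^Performance:/ { print $2 }' "$1"
}

# Run the same .tpr once per non-bonded mode. Each run writes its own log
# (-g), so later runs do not overwrite earlier results. -nsteps keeps the
# test short and -resethway resets the timers halfway, so start-up cost is
# not included in the reported performance.
bench_nb_modes() {
    local tpr="$1" mode
    for mode in cpu gpu gpu_cpu; do
        gmx mdrun -s "$tpr" -nb "$mode" -g "bench_$mode.log" \
            -nsteps 5000 -resethway -noconfout > "bench_$mode.out" 2>&1
        echo "$mode: $(ns_per_day "bench_$mode.log") ns/day"
    done
}
```

Usage: bench_nb_modes topol.tpr. The step count of 5000 is an arbitrary choice for a quick test. Other output files (trajectory, energies) keep their default names, and mdrun backs them up between runs.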
>>>>
>>>> When I use the -nb gpu option, md.log shows that GROMACS correctly
>>>> detects the GPUs and the CPU:
>>>>
>>>> Using 2 MPI threads
>>>> Using 6 OpenMP threads per tMPI thread
>>>>
>>>> Detecting CPU SIMD instructions.
>>>> Present hardware specification:
>>>> Vendor: GenuineIntel
>>>> Brand:  Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
>>>> Family:  6  Model: 63  Stepping:  2
>>>> Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm mmx
>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
>>>> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>>>> SIMD instructions most likely to fit this hardware: AVX2_256
>>>> SIMD instructions selected at GROMACS compile time: AVX2_256
>>>>
>>>>
>>>> 2 GPUs detected:
>>>>    #0: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC:  no, stat: compatible
>>>>    #1: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC:  no, stat: compatible
>>>> 2 GPUs auto-selected for this run.
>>>> Mapping of GPUs to the 2 PP ranks in this node: #0, #1
>>>>
>>>>
>>>> However, when I look at the performance summary at the end of the
>>>> simulation, 'Wait GPU nonlocal' takes an extremely long time.
>>>> I also tried a few other options (such as using only one GPU with
>>>> -gpu_id 0) and experimented with the -ntmpi and -ntomp options, but the
>>>> GPU performance remains drastically poor (surprisingly, three times
>>>> slower than a CPU-only simulation).
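The rank, thread and GPU combinations mentioned above can be enumerated explicitly. Here is a sketch in bash that prints (but does not run) candidate mdrun command lines for a 12-thread CPU with two GPUs. It relies on -gpu_id in GROMACS 5.0 taking one digit per PP rank on the node. The helper name gpu_id_string and the choice of rank counts are illustrative assumptions, and which layout is fastest on these cards would still have to be measured.

```shell
# Requires bash. Hypothetical helper: build a -gpu_id string that assigns
# nranks PP ranks to ngpus GPUs in contiguous blocks,
# e.g. 4 ranks on 2 GPUs -> "0011", 1 rank on 2 GPUs -> "0" (GPU 0 only).
gpu_id_string() {
    local nranks="$1" ngpus="$2" r ids=""
    for ((r = 0; r < nranks; r++)); do
        ids+=$(( r * ngpus / nranks ))   # integer division picks the GPU index
    done
    echo "$ids"
}

# Print layouts that keep ntmpi * ntomp = 12 hardware threads on 2 GPUs.
for ntmpi in 1 2 4 6; do
    ntomp=$(( 12 / ntmpi ))
    echo "gmx mdrun -s topol.tpr -nb gpu -ntmpi $ntmpi -ntomp $ntomp -gpu_id $(gpu_id_string "$ntmpi" 2)"
done
```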
>>>>
>>>> I am struggling to figure out whether this is a hardware issue, a GPU
>>>> driver issue, or whether I am simply not using the optimal options.
>>>> Your suggestions would be very helpful in solving this.
>>>> Jagannath
>>>>
>>>>
>>>>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>>
>>>> On 2 MPI ranks, each using 6 OpenMP threads
>>>>
>>>>   Computing:          Num   Num      Call    Wall time         Giga-Cycles
>>>>                       Ranks Threads  Count      (s)         total sum    %
>>>> -----------------------------------------------------------------------------
>>>>   Domain decomp.         2    6         63       0.270         11.322   0.2
>>>>   DD comm. load          2    6         13       0.000          0.002   0.0
>>>>   Neighbor search        2    6         63       0.311         13.062   0.2
>>>>   Launch GPU ops.        2    6       5002       0.205          8.614   0.2
>>>>   Comm. coord.           2    6       2438       0.239         10.016   0.2
>>>>   Force                  2    6       2501       1.358         57.011   1.0
>>>>   Wait + Comm. F         2    6       2501       0.404         16.954   0.3
>>>>   PME mesh               2    6       2501       9.734        408.587   7.3
>>>>   Wait GPU nonlocal      2    6       2501     117.798       4944.651  88.3
>>>>   Wait GPU local         2    6       2501       0.005          0.206   0.0
>>>>   NB X/F buffer ops.     2    6       9878       0.255         10.683   0.2
>>>>   Write traj.            2    6          4       0.180          7.558   0.1
>>>>   Update                 2    6       2501       0.807         33.886   0.6
>>>>   Constraints            2    6       2501       1.216         51.025   0.9
>>>>   Comm. energies         2    6        126       0.001          0.055   0.0
>>>>   Rest                                           0.609         25.573   0.5
>>>> -----------------------------------------------------------------------------
>>>>   Total                                        133.392       5599.205 100.0
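The decisive number in this table is the 'Wait GPU nonlocal' share: 88.3% of the wall time (117.8 s of 133.4 s) is spent with the CPU waiting for the GPU non-bonded results, so this run is limited by the GT 610s rather than by the CPU. Below is a small bash/awk sketch that pulls this share out of a log so different runs or cards can be compared. It assumes a single cycle accounting table per md.log, with the percentage in the last column as shown above. The function name wait_gpu_percent is hypothetical.

```shell
# Requires bash and awk. Hypothetical helper: sum the "%" column of all
# "Wait GPU ..." rows (nonlocal and local) in an md.log cycle table.
# Rows such as "Wait + Comm. F" are skipped because their second field is "+".
wait_gpu_percent() {
    awk '$1 == "Wait" && $2 == "GPU" { sum += $NF } END { printf "%.1f\n", sum }' "$1"
}
```

Usage: wait_gpu_percent md.log. A value close to 0 means the GPU keeps up with the CPU. A large value, like the one here, means a faster GPU (or moving work back to the CPU) is what would help.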

-- 
==================================================

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==================================================

