[gmx-users] Problems with TI on GPUs

Szilárd Páll pall.szilard at gmail.com
Wed Jul 13 15:18:38 CEST 2016


Hi,

What's strange in your observations is the following:

Without having seen log files (which would be useful!), as far as I
can tell, the only difference between your working and failing runs
is the use of one vs two GPUs. Can you also reproduce the crash when
you manually launch a run that uses both GPUs (with various configs; I
assume the default one is 4 ranks)?
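
As a rough sketch of what such manual launches could look like: the
script below only prints candidate command lines, it does not run them.
The rank counts, the md_r* output names and the two helper functions are
illustrative assumptions on my part, and it presumes a thread-MPI build
where -gpu_id takes one digit per rank, as in your own commands.

```shell
#!/bin/sh
# Sketch only: prints candidate mdrun command lines, it does not run them.
# Assumes a thread-MPI build and the per-rank -gpu_id digit string used by
# the original commands; gpu_id_string and two_gpu_cmd are made-up helpers.

# gpu_id_string RANKS GPUS: one digit per rank, e.g. "0011" for 4 ranks, 2 GPUs
gpu_id_string() {
    gis_i=0
    gis_out=""
    while [ "$gis_i" -lt "$1" ]; do
        gis_out="$gis_out$(( gis_i * $2 / $1 ))"
        gis_i=$(( gis_i + 1 ))
    done
    printf '%s\n' "$gis_out"
}

# two_gpu_cmd RANKS CORES: command line splitting CORES over RANKS on 2 GPUs.
# A separate -deffnm per configuration keeps each run's log file apart.
two_gpu_cmd() {
    printf 'gmx mdrun -s md.tpr -pin on -ntmpi %d -ntomp %d -gpu_id %s -deffnm md_r%d\n' \
        "$1" "$(( $2 / $1 ))" "$(gpu_id_string "$1" 2)" "$1"
}

# Even rank counts that divide the 20 cores of this node.
for ranks in 2 4 10; do
    two_gpu_cmd "$ranks" 20
done
```

If one of these configurations crashes as well, the corresponding log
file would be the one to send.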

I assume these GPUs are quite new; have you burn-in tested them?

Cheers,
--
Szilárd


On Tue, Jul 12, 2016 at 7:18 PM, Yannic Alber
<yannic.alber at tu-dortmund.de> wrote:
> Hi,
>
> In short, this works for every combination, but with the big disadvantage
> of being very slow and not using our precious little GPUs.
>
> Thanks for your fast reply, Mark.
>
>> Hi,
>>
>> What happens in your failing cases (or even all of them) when adding -nb
>> cpu, to force the run off the GPU?
>>
>> Mark
>>
>> On Tue, 12 Jul 2016 19:02 Yannic Alber <yannic.alber at tu-dortmund.de>
>> wrote:
>>
>>> Dear all,
>>>
>>> we are struggling to get a TI run working on our computer. The
>>> specifications are listed below. As you can see, it's a two-socket
>>> machine with two graphics cards. Therefore, the plan is to run two
>>> simulations in parallel, but we can't get even a single one to run.
>>>
>>> Running on 1 node with total 20 cores, 20 logical cores, 2 compatible GPUs
>>> Hardware detected:
>>>   CPU info:
>>>     Vendor: GenuineIntel
>>>     Brand:  Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
>>>     SIMD instructions most likely to fit this hardware: AVX2_256
>>>     SIMD instructions selected at GROMACS compile time: AVX2_256
>>>   GPU info:
>>>     Number of GPUs detected: 2
>>>     #0: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat: compatible
>>>     #1: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat: compatible
>>>
>>> The simulation system in question is a protein-ligand-complex in
>>> TIP3P-water and amber ff99SB as force field.
>>>
>>> Now let's get into the messy details. We tried different combinations
>>> of mdrun command-line arguments, for example:
>>>
>>> gmx mdrun -s md.tpr -pin on -ntomp 2 -ntmpi 5 -gpu_id 00000 -deffnm md
>>> (does not work)
>>> gmx mdrun -s md.tpr -pin on -ntomp 5 -ntmpi 2 -gpu_id 00 -deffnm md
>>> (does not work)
>>> gmx mdrun -s md.tpr -pin on -ntomp 10 -ntmpi 1 -gpu_id 0 -deffnm md
>>> (does not work)
>>> gmx mdrun -s md.tpr -deffnm md
>>> (works; uses the complete compute node, including both GPUs)
>>>
>>> The error that GROMACS gives us is rather puzzling (explanation
>>> follows further down). Here is a short excerpt:
>>>
>>> Step 191, time 0.382 (ps)  LINCS WARNING
>>> relative constraint deviation after LINCS:
>>> rms 0.000002, max 0.000010 (between atoms 3421 and 3424)
>>> bonds that rotated more than 30 degrees:
>>>  atom 1 atom 2  angle  previous, current, constraint length
>>>    3702   3703   31.0    0.1090   0.1090      0.1090
>>> Wrote pdb files with previous and current coordinates
>>>
>>> These errors vary, but all refer to a misplacement or unusual rotations.
>>> GROMACS states that this is because our system is "unstable". However,
>>> this explanation can be excluded, because the starting configuration of
>>> the simulations in question already ran for 20 ns in GROMACS on a CPU
>>> cluster.
>>>
>>> We also tested different cmake invocations. An example is shown here:
>>>
>>> cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_GPU=on
>>> -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON
>>> -DREGRESSIONTEST_DOWNLOAD=ON
>>>
>>> As for compilers, we tried gcc (v.4.8.5) and Intel (v.15.0.1).
>>>
>>> I would really appreciate your help and thank you very much in advance.
>>>
>>>
>>> Sincerely,
>>> Yannic
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Gromacs Users mailing list
>>>
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>> posting!
>>>
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send a mail to gmx-users-request at gromacs.org.
>>>
>>
>
