[gmx-developers] mdrun 4.6.7 with GPU sharing between thread-MPI ranks yields crashes
Carsten Kutzner
ckutzne at gwdg.de
Thu Dec 11 18:09:32 CET 2014
On 11 Dec 2014, at 17:38, Berk Hess <hess at kth.se> wrote:
> Hi,
>
> We are also having some as yet unexplained issues that only seem to show up with GPU sharing. We have done a lot of checking, so a bug in Gromacs seems unlikely. Nvidia says GPU sharing is officially not supported because of some issues, so this could be one of those.
Ah, maybe that is the case.
> PS Why are you not using 5.0? I don’t recall that anything related to sharing has changed, but in such cases I would try the newest version.
I will try that in the hope of narrowing it down. These runs were for a project started with 4.6,
and up to now there was no pressing argument for mixing versions.
Thanks!
Carsten
>
> Cheers,
>
> Berk
>
> On 12/11/2014 05:32 PM, Carsten Kutzner wrote:
>> Hi,
>>
>> we are seeing a weird problem here with 4.6.7 on GPU nodes.
>> A 146k atom system that already ran happily on a lot of different
>> nodes (with and without GPU) now often crashes on GPU nodes
>> with the error message:
>>
>> x particles communicated to PME node y are more than 2/3 times the cut-off … dimension x
>>
>> DD is 8 x 1 x 1 in all cases; mdrun is started with the somewhat unusual
>> (but best-performing) options
>>
>> -ntmpi 8 -ntomp 5 -gpu_id 00001111 -dlb no
>>
>> on nodes with 2x GTX 780Ti and 40 logical cores. Of 20 such runs,
>> approximately 14 die within the first 100k time steps with a variation
>> of the above error message.
>>
>> Our solution for now is to run it with
>>
>> -ntmpi 2 -ntomp 20 -gpu_id 01 -dlb no
>>
>> (no crashes up to now), but at a large performance penalty.
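
To make the difference between the two launch lines above explicit, here is a minimal Python sketch of how the -gpu_id string assigns ranks to GPUs. It assumes that every thread-MPI rank is a PP rank (no separate PME ranks), so that each digit of -gpu_id names the GPU used by the rank at that position. The helper summarize_launch is hypothetical and not part of Gromacs.

```python
from collections import Counter


def summarize_launch(ntmpi, ntomp, gpu_id):
    """Summarize an mdrun launch line (hypothetical helper, not a Gromacs API).

    Assumes all ntmpi ranks are PP ranks, so gpu_id needs one digit per rank.
    """
    gpus = [int(digit) for digit in gpu_id]  # GPU index for rank 0, 1, 2, ...
    if len(gpus) != ntmpi:
        raise ValueError("gpu_id must contain one digit per rank")
    ranks_per_gpu = Counter(gpus)
    return {
        "total_threads": ntmpi * ntomp,
        "ranks_per_gpu": dict(sorted(ranks_per_gpu.items())),
        "gpu_shared": any(count > 1 for count in ranks_per_gpu.values()),
    }


# The two configurations from the report above.
crashing = summarize_launch(8, 5, "00001111")
workaround = summarize_launch(2, 20, "01")
print(crashing)
print(workaround)
```

Under these assumptions both lines occupy all 40 logical cores; the first puts four ranks on each GPU, while the second gives each GPU exactly one rank and therefore avoids sharing.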
>>
>> Comments on how to debug this further are welcome.
>>
>> Thanks!
>> Carsten
>>
>>
>>
>> --
>> Dr. Carsten Kutzner
>> Max Planck Institute for Biophysical Chemistry
>> Theoretical and Computational Biophysics
>> Am Fassberg 11, 37077 Goettingen, Germany
>> Tel. +49-551-2012313, Fax: +49-551-2012302
>> http://www.mpibpc.mpg.de/grubmueller/kutzner
>> http://www.mpibpc.mpg.de/grubmueller/sppexa
>>