[gmx-developers] mdrun 4.6.7 with GPU sharing between thread-MPI ranks yields crashes

Carsten Kutzner ckutzne at gwdg.de
Thu Dec 11 18:09:32 CET 2014


On 11 Dec 2014, at 17:38, Berk Hess <hess at kth.se> wrote:

> Hi,
> 
> We are also seeing some as yet unexplained issues that only seem to show up with GPU sharing. We have done a lot of checking, so a bug in Gromacs seems unlikely. Nvidia says GPU sharing is officially not supported because of some issues, so this could be one of those.
Ah, maybe that is the case.

> PS Why are you not using 5.0? I don’t recall that anything related to sharing has changed, but in such cases I would try the newest version.
Will try that in the hope of narrowing it down. These runs were for a project started with 4.6,
and up to now there was no pressing argument for mixing versions.

Thanks!
  Carsten


> 
> Cheers,
> 
> Berk
> 
> On 12/11/2014 05:32 PM, Carsten Kutzner wrote:
>> Hi,
>> 
>> we are seeing a weird problem here with 4.6.7 on GPU nodes.
>> A 146k atom system that already ran happily on a lot of different
>> nodes (with and without GPU) now often crashes on GPU nodes
>> with the error message:
>> 
>> x particles communicated to PME node y are more than 2/3 times the cut-off … dimension x
>> 
>> DD is 8 x 1 x 1 in all cases, mdrun is started with the somewhat unusual
>> (but best performing) options
>> 
>> -ntmpi 8 -ntomp 5 -gpu_id 00001111 -dlb no
>> 
>> on nodes with 2x GTX 780Ti and 40 logical cores. Of 20 such runs,
>> approximately 14 die within the first 100k time steps with a variant of the above
>> error message.
>> 
>> Our solution for now is to run it with
>> 
>> -ntmpi 2 -ntomp 20 -gpu_id 01 -dlb no
>> 
>> (no crashes so far), though at a large performance penalty.
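
To make the difference between the two launch configurations explicit, here is a minimal Python sketch that decodes a -gpu_id string into a rank-to-GPU layout. It assumes that each digit of the string assigns a GPU to one PP rank in order, and that all thread-MPI ranks are PP ranks (no separate PME ranks), which matches the 8 x 1 x 1 DD reported above. The function names are hypothetical helpers and not part of Gromacs.

```python
from typing import Dict, List


def map_ranks_to_gpus(gpu_id: str) -> Dict[int, List[int]]:
    """Group rank indices by the GPU id given for them in a -gpu_id string.

    Assumption: digit i of the string is the GPU used by PP rank i.
    """
    mapping: Dict[int, List[int]] = {}
    for rank, digit in enumerate(gpu_id):
        mapping.setdefault(int(digit), []).append(rank)
    return mapping


def describe_layout(ntmpi: int, ntomp: int, gpu_id: str) -> dict:
    """Summarise total thread count and how many ranks share each GPU."""
    # Assumption: no separate PME ranks, so every rank needs one digit.
    if len(gpu_id) != ntmpi:
        raise ValueError("expected one GPU id digit per thread-MPI rank")
    gpus = map_ranks_to_gpus(gpu_id)
    return {
        "threads": ntmpi * ntomp,
        "ranks_per_gpu": {gpu: len(ranks) for gpu, ranks in gpus.items()},
    }


# The two configurations from the report above.
crashing = describe_layout(ntmpi=8, ntomp=5, gpu_id="00001111")
stable = describe_layout(ntmpi=2, ntomp=20, gpu_id="01")

print("crashing:", crashing)
print("stable:  ", stable)
```

Under these assumptions both configurations use all 40 logical cores, but the first places four ranks on each GPU while the second places exactly one rank on each GPU, so only the first exercises GPU sharing between thread-MPI ranks.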
>> 
>> Comments on how to debug this further are welcome.
>> 
>> Thanks!
>>   Carsten
>> 
>> 
>> 
>> --
>> Dr. Carsten Kutzner
>> Max Planck Institute for Biophysical Chemistry
>> Theoretical and Computational Biophysics
>> Am Fassberg 11, 37077 Goettingen, Germany
>> Tel. +49-551-2012313, Fax: +49-551-2012302
>> http://www.mpibpc.mpg.de/grubmueller/kutzner
>> http://www.mpibpc.mpg.de/grubmueller/sppexa
>> 
> 




