[gmx-developers] mdrun 4.6.7 with GPU sharing between thread-MPI ranks yields crashes
pall.szilard at gmail.com
Thu Dec 11 18:36:34 CET 2014
On Thu, Dec 11, 2014 at 5:38 PM, Berk Hess <hess at kth.se> wrote:
> We are also seeing some as yet unexplained issues that only seem to
> show up with GPU sharing. We have done a lot of checking, so a bug in
> Gromacs seems unlikely. Nvidia says this is officially not supported because
> of some issues, so this could be one of those.
What is not supported? To the best of my knowledge, everything we do is
officially supported. Moreover, even the Tesla-only features related to
what was Hyper-Q and is now CUDA MPS work, because GPU contexts are by
definition shared between pthreads (= tMPI ranks).
> PS: Why are you not using 5.0? I don't recall that anything related to
> sharing has changed, but in such cases I would try the newest version.
> On 12/11/2014 05:32 PM, Carsten Kutzner wrote:
>> We are seeing a weird problem here with 4.6.7 on GPU nodes.
>> A 146k atom system that already ran happily on a lot of different
>> nodes (with and without GPU) now often crashes on GPU nodes
>> with the error message:
>> x particles communicated to PME node y are more than 2/3 times the cut-off
>> … dimension x
>> DD is 8 x 1 x 1 in all cases; mdrun is started with the somewhat unusual
>> (but best performing) options
>> -ntmpi 8 -ntomp 5 -gpu_id 00001111 -dlb no
>> on nodes with 2x GTX 780Ti and 40 logical cores. Of 20 such runs,
>> approximately 14 die within the first 100k time steps with a variation
>> of the above error message.
>> Our workaround for now is to run it with
>> -ntmpi 2 -ntomp 20 -gpu_id 01 -dlb no
>> (no crashes so far), though at a large performance penalty.
>> Comments on how to debug this further are welcome.
>> Dr. Carsten Kutzner
>> Max Planck Institute for Biophysical Chemistry
>> Theoretical and Computational Biophysics
>> Am Fassberg 11, 37077 Goettingen, Germany
>> Tel. +49-551-2012313, Fax: +49-551-2012302
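
For reference, the two launch configurations quoted above can be compared with a small Python sketch. It is only an illustration, not part of mdrun: the helper name describe_launch is hypothetical, and it assumes that every thread-MPI rank is a PP rank that uses a GPU and that each digit of -gpu_id assigns a GPU to the ranks in order, which is how the 4.6 option is understood here.

```python
from collections import Counter


def describe_launch(ntmpi, ntomp, gpu_id):
    """Summarize an mdrun launch (hypothetical helper, for illustration only).

    Assumption: one digit of gpu_id per thread-MPI rank, in rank order,
    and every rank is a PP rank that uses a GPU.
    """
    if len(gpu_id) != ntmpi:
        raise ValueError("expected one GPU id digit per thread-MPI rank")
    # Count how many ranks are mapped onto each GPU id.
    ranks_per_gpu = Counter(int(ch) for ch in gpu_id)
    return {
        "total_threads": ntmpi * ntomp,
        "ranks_per_gpu": dict(sorted(ranks_per_gpu.items())),
        "gpu_shared": any(n > 1 for n in ranks_per_gpu.values()),
    }


# The crashing setup: 8 ranks x 5 OpenMP threads, 4 ranks per GPU.
crashing = describe_launch(8, 5, "00001111")
# The workaround: 2 ranks x 20 OpenMP threads, 1 rank per GPU.
workaround = describe_launch(2, 20, "01")
print(crashing)
print(workaround)
```

Under these assumptions both setups occupy all 40 logical cores; the difference is that the first shares each GPU among four ranks, while the second gives each rank its own GPU.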