[gmx-developers] cudaStreamSynchronize failed in cu_blockwait_nb

Shirts, Michael (mrs5pt) mrs5pt at eservices.virginia.edu
Mon Oct 22 17:47:06 CEST 2012


> This was just supposed to be a fast test system; then I must have forgotten to
> switch back to PME, triggering the fatal error. We do not use plain cutoff for
> serious things :)

But it is good practice for things to fail gracefully, even with silly
parameter choices, as that does help find OTHER bugs.

Best,
~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821


> From: Carsten Kutzner <ckutzne at gwdg.de>
> Reply-To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
> Date: Mon, 22 Oct 2012 17:35:03 +0200
> To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
> Subject: Re: [gmx-developers] cudaStreamSynchronize failed in cu_blockwait_nb
> 
> On Oct 22, 2012, at 5:25 PM, Berk Hess <hess at kth.se> wrote:
> 
>> Just curious, why are you running plain cut-off?
> This was just supposed to be a fast test system; then I must have forgotten to
> switch back to PME, triggering the fatal error. We do not use plain cutoff for
> serious things :)
> 
> Carsten
> 
>> (I didn't even make CPU kernels for that; the RF kernels are then used.)
>> 
>> Cheers,
>> 
>> Berk
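Berk's remark that plain cut-off falls back to the reaction-field (RF) kernels can be checked with the standard RF pair potential V(r) = f*qi*qj/eps_r * (1/r + k_rf*r^2 - c_rf), with k_rf = (eps_rf - eps_r) / ((2*eps_rf + eps_r) * r_c^3) and c_rf = 1/r_c + k_rf*r_c^2. Setting eps_rf = eps_r makes k_rf = 0, so the RF kernel then evaluates a plain Coulomb cut-off shifted to zero at r_c. The sketch below assumes the fallback is set up this way (eps_rf = 1); the thread does not say so explicitly, and the function names are hypothetical, not GROMACS code.

```c
#include <assert.h>

/* Reaction-field constants for the standard RF form
 *   V(r) = f*qi*qj/eps_r * (1/r + k_rf*r^2 - c_rf)   for r < r_c.
 * Hypothetical helper names; not taken from the GROMACS sources. */
typedef struct {
    double k_rf; /* coefficient of the quadratic correction term */
    double c_rf; /* constant shift that makes V(r_c) = 0 */
} rf_consts;

static rf_consts rf_setup(double eps_r, double eps_rf, double rc)
{
    rf_consts c;
    c.k_rf = (eps_rf - eps_r) / ((2.0 * eps_rf + eps_r) * rc * rc * rc);
    c.c_rf = 1.0 / rc + c.k_rf * rc * rc;
    return c;
}

/* Pair energy inside the cut-off; zero at or beyond it. */
static double rf_pair_energy(rf_consts c, double f, double qi, double qj,
                             double eps_r, double r, double rc)
{
    if (r >= rc) {
        return 0.0;
    }
    return f * qi * qj / eps_r * (1.0 / r + c.k_rf * r * r - c.c_rf);
}
```

With eps_rf = eps_r = 1 the quadratic term vanishes and only the 1/r - 1/r_c shift remains, which is why a single RF kernel can serve both settings.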
>> 
>> On 10/22/2012 05:23 PM, Carsten Kutzner wrote:
>>> Hi Szilárd,
>>> 
>>> thanks a lot for fixing it!
>>> 
>>> Carsten
>>> 
>>> 
>>> On Oct 22, 2012, at 5:20 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> The CUDA plain cut-off kernel's pointer was incorrectly assigned (stupid
>>>> copy-paste bug). Just pushed a bugfix: https://gerrit.gromacs.org/#/c/1553/
>>>> 
>>>> Cheers,
>>>> --
>>>> Szilárd
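The bug Szilárd describes is a kernel pointer assigned to the wrong slot. One way to make that class of copy-paste error fail loudly at startup, rather than as an "unspecified launch failure" later, is sketched below in plain C. Everything here is hypothetical (enum, table, and stand-in kernels are not GROMACS code): the table uses C99 designated initializers so each entry is tied to an enum name, and a startup check verifies every slot. The stand-in kernels return the type they implement purely so the check has something to compare; a real GPU kernel would need some other tag.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical electrostatics types; not the GROMACS enum. */
typedef enum { ELEC_CUTOFF, ELEC_RF, ELEC_EWALD, ELEC_NR } elec_type;

/* Stand-ins for kernels: each returns the type it implements. */
typedef elec_type (*kernel_fn)(void);
static elec_type kernel_cutoff(void) { return ELEC_CUTOFF; }
static elec_type kernel_rf(void)     { return ELEC_RF; }
static elec_type kernel_ewald(void)  { return ELEC_EWALD; }

/* Designated initializers bind each entry to an enum name. A duplicated
 * or mis-edited line either leaves some slot NULL or puts the wrong
 * function in it; check_kernel_table() catches both cases. */
static const kernel_fn kernel_table[ELEC_NR] = {
    [ELEC_CUTOFF] = kernel_cutoff,
    [ELEC_RF]     = kernel_rf,
    [ELEC_EWALD]  = kernel_ewald,
};

/* Returns 0 if every slot is set and matches its type; otherwise prints
 * a message and returns -1 so the caller can stop with a clear fatal
 * error instead of launching the wrong kernel. */
static int check_kernel_table(const kernel_fn *table)
{
    for (int t = 0; t < ELEC_NR; t++) {
        if (table[t] == NULL || table[t]() != (elec_type)t) {
            fprintf(stderr, "kernel table: bad entry for type %d\n", t);
            return -1;
        }
    }
    return 0;
}
```

Running such a check once during setup is cheap and fits Michael's point that even odd parameter choices should fail gracefully.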
>>>> 
>>>> 
>>>> On Fri, Oct 19, 2012 at 3:20 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>>>> Hi,
>>>> 
>>>> That sounds like a nasty bug that I have not seen for quite a while. This
>>>> generally happens when some serious memory corruption puts the GPU in a
>>>> "bad state". In the future, you could try to reset the GPU by reloading
>>>> the driver, but if that does not help, you will have to reboot.
>>>> 
>>>> I was able to reproduce the bug, and in fact on our development machine the
>>>> NVIDIA driver seems to get into a messed-up state in which mdrun will hang
>>>> no matter whether I launch it on the GTX 580 or 680. Reloading the driver
>>>> seems to fix this issue.
>>>> 
>>>> Thanks for the report; I'll look into this bug and give you an
>>>> update!
>>>> 
>>>> Cheers,
>>>> --
>>>> Szilárd
>>>> 
>>>> 
>>>> 
>>>> On Fri, Oct 19, 2012 at 12:01 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>>> Hi,
>>>> 
>>>> we updated to the newest driver, but later I found that this crash is caused
>>>> by a .tpr file with Coulomb-type=cutoff instead of PME:
>>>> 
>>>> - I start with a PME .tpr file that runs with the recent 4.6 on both a
>>>>   GTX580 and a GTX680, and even when using both.
>>>> - I change to the cutoff setting (no other changes!); this .tpr still runs
>>>>   on the 580, but on the 680 it produces the fatal error:
>>>>   "cudaStreamSynchronize failed in cu_blockwait_nb: unspecified launch failure"
>>>>   Moreover, after that, any other mdrun using any GPU on that node will read
>>>>   in the (previously working) PME .tpr file and then hang. After rebooting, I
>>>>   can run the PME .tpr file again.
>>>> 
>>>> Carsten
>>>> 
>>>> 
>>>> 
>>>> On Oct 17, 2012, at 3:10 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Your driver might be simply too old for a GTX680. You'll need at least a
>>>>> very late 295.xx driver and preferably the 304.54 (or later).
>>>>> 
>>>>> Cheers,
>>>>> --
>>>>> Szilárd
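Szilárd's advice boils down to a minimum-version check (304.54 or later preferred). A small sketch of such a check follows; it assumes the driver version is available as a "MAJOR.MINOR" string however it was obtained, and the helper names are hypothetical rather than part of GROMACS or the NVIDIA tools.

```c
#include <assert.h>
#include <stdio.h>

/* Parse an NVIDIA-style "MAJOR.MINOR" driver version such as "304.54".
 * Returns 0 on success, -1 if the string does not start with that form. */
static int parse_driver_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Returns 1 if version s is at least req_major.req_minor, 0 if it is
 * older, and -1 if s cannot be parsed. Minor numbers are compared as
 * integers, so "304.54" is newer than "304.6". */
static int driver_at_least(const char *s, int req_major, int req_minor)
{
    int major, minor;
    if (parse_driver_version(s, &major, &minor) != 0) {
        return -1;
    }
    if (major != req_major) {
        return major > req_major;
    }
    return minor >= req_minor;
}
```

For example, `driver_at_least(version, 304, 54)` returning 0 could trigger a warning before any GPU kernel is launched.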
>>>>> 
>>>>> 
>>>>> On Wed, Oct 17, 2012 at 2:10 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>>>> BTW this executable works on a GTX580, but shows the fatal error
>>>>> on a GTX680 - both mounted in the same workstation.
>>>>> 
>>>>> Carsten
>>>>> 
>>>>> 
>>>>> On Oct 17, 2012, at 12:05 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> what am I doing wrong if I get this error code:
>>>>>> 
>>>>>> -------------------------------------------------------
>>>>>> Program mdrun_threads, VERSION 4.6-dev-20121016-4af4561
>>>>>> Source code file:
>>>>>> /home/ckutzne/installations/git-gromacs-4-6-department/src/mdlib/nbnxn_cuda/nbnxn_cuda.cu, line: 558
>>>>>> 
>>>>>> Fatal error:
>>>>>> cudaStreamSynchronize failed in cu_blockwait_nb: unspecified launch failure
>>>>>> 
>>>>>> For more information and tips for troubleshooting, please check the GROMACS
>>>>>> website at http://www.gromacs.org/Documentation/Errors
>>>>>> -------------------------------------------------------
>>>>>> 
>>>>>> Thanks,
>>>>>> Carsten
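CUDA kernel launches are asynchronous, so a fault that occurs while a kernel runs is normally reported by a later synchronizing call. That is why the error above names cudaStreamSynchronize inside cu_blockwait_nb rather than the kernel itself. The plain-C mock below only illustrates that timing; it does not use the CUDA runtime, and all names in it (status codes, mock_stream, wait_for_nonbonded) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes; a mock, not the CUDA runtime API. */
typedef enum { STATUS_OK = 0, STATUS_LAUNCH_FAILURE } status_t;

static const char *status_string(status_t s)
{
    return s == STATUS_OK ? "no error" : "unspecified launch failure";
}

/* A mock stream: launching only queues work, so a fault during the
 * "kernel" is recorded and surfaces at the next synchronization. */
typedef struct { status_t pending; } mock_stream;

static status_t mock_launch(mock_stream *st, int kernel_faults)
{
    if (kernel_faults) {
        st->pending = STATUS_LAUNCH_FAILURE;
    }
    return STATUS_OK; /* the launch call itself still reports success */
}

/* The error is not cleared, loosely mirroring how CUDA treats a launch
 * failure as sticky for the rest of the process. */
static status_t mock_synchronize(const mock_stream *st)
{
    return st->pending;
}

/* One simulated step: launch, then wait. Returns NULL on success or a
 * message naming the waiting function, in the shape of the error above. */
static const char *simulate_step(int kernel_faults)
{
    static char msg[128];
    mock_stream st = { STATUS_OK };
    if (mock_launch(&st, kernel_faults) != STATUS_OK) {
        return "launch failed";
    }
    status_t s = mock_synchronize(&st);
    if (s != STATUS_OK) {
        snprintf(msg, sizeof msg, "synchronize failed in %s: %s",
                 "wait_for_nonbonded", status_string(s));
        return msg;
    }
    return NULL;
}
```

Because the launch call reports success, checking only launch return values would miss this kind of failure; the synchronization result has to be checked too.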
>>>>>> --
>>>>>> gmx-developers mailing list
>>>>>> gmx-developers at gromacs.org
>>>>>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>>>>> Please don't post (un)subscribe requests to the list. Use the www
>>>>>> interface or send it to gmx-developers-request at gromacs.org.
>>>>> 
>>>>> --
>>>>> Dr. Carsten Kutzner
>>>>> Max Planck Institute for Biophysical Chemistry
>>>>> Theoretical and Computational Biophysics
>>>>> Am Fassberg 11, 37077 Goettingen, Germany
>>>>> Tel. +49-551-2012313, Fax: +49-551-2012302
>>>>> http://www.mpibpc.mpg.de/grubmueller/kutzner