[gmx-users] GPU job often stopped

Szilárd Páll szilard.pall at cbr.su.se
Mon Apr 29 15:47:21 CEST 2013


In that case, although it isn't very likely, the crash could be caused by
an implementation detail that is meant to avoid a performance loss
triggered by an issue in the NVIDIA drivers.

Try running with the GMX_CUDA_STREAMSYNC environment variable set.
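
A minimal sketch of what that could look like in a bash-like shell. The
value 1 is an assumption (as far as I know, what matters is that the
variable is set, not what it is set to), and the mdrun command line is
hypothetical, reconstructed from the log quoted below (heavy.tpr, 4
thread-MPI ranks, 8 OpenMP threads each, GPUs #0, #2, #3, #4); adjust it
to your actual run.

```shell
# Set the variable for this shell session and everything started from it.
# Assumption: any value works, since the variable only needs to be set.
export GMX_CUDA_STREAMSYNC=1

# Hypothetical rerun of the job from this thread; the options mirror the
# quoted log output and are not the poster's actual command line.
if command -v mdrun >/dev/null 2>&1; then
    mdrun -deffnm heavy -ntmpi 4 -ntomp 8 -gpu_id 0234
else
    echo "mdrun not found in PATH; GMX_CUDA_STREAMSYNC is set for later runs"
fi
```

If the crashes stop with the variable set, that would point towards the
synchronization workaround rather than the hardware.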

Btw, were there any other processes using the GPU while mdrun was running?

Cheers,
--
Szilárd


On Mon, Apr 29, 2013 at 3:32 PM, Albert <mailmd2011 at gmail.com> wrote:
> On 04/29/2013 03:31 PM, Szilárd Páll wrote:
>>
>> The segv indicates that mdrun crashed and not that the machine was
>> restarted. The GPU detection output (both on stderr and log) should
>> show whether ECC is "on" (and so does the nvidia-smi tool).
>>
>> Cheers,
>> --
>> Szilárd
>
>
> Yes, it was on:
>
>
> Reading file heavy.tpr, VERSION 4.6.1 (single precision)
> Using 4 MPI threads
> Using 8 OpenMP threads per tMPI thread
>
> 5 GPUs detected:
>   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>   #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC:  no, stat: compatible
>   #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>   #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>   #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> 4 GPUs user-selected for this run: #0, #2, #3, #4
>
>


