[gmx-users] Gromacs GPU got hang
M Teguh Satria
mteguhsat at gmail.com
Thu Oct 1 02:20:57 CEST 2015
Hi Stéphane,
Thanks for your reply.
Actually, everything is fine when we run shorter GROMACS GPU jobs. We only get
this problem with longer GROMACS GPU jobs (those that need 20+ hours to run).
I recorded the nvidia-smi output every 10 minutes. Judging from these records,
I doubt that temperature was the cause: the GPU was at 41C just before the
utilization dropped and at 34C just after.
Before the drop:
Tue Sep 29 11:59:59 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K40m          Off  | 0000:82:00.0     Off |                    0 |
| N/A   41C    P0   110W / 235W |    139MiB / 11519MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0     17500    C   mdrun_mpi                                       82MiB |
+-----------------------------------------------------------------------------+
After the drop to 0% GPU utilization:
Tue Sep 29 12:09:59 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K40m          Off  | 0000:82:00.0     Off |                    0 |
| N/A   34C    P0    62W / 235W |    139MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0     17500    C   mdrun_mpi                                       82MiB |
+-----------------------------------------------------------------------------+
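Instead of saving full nvidia-smi screens, the same quantities can be logged as one CSV line per sample and scanned for a stall automatically. This is a minimal sketch in Python, assuming the installed driver supports `nvidia-smi --query-gpu` with `--format=csv,noheader,nounits` (not checked against driver 346.46). The helper names (`query_gpu`, `parse_sample`, `find_stall`) and the three-sample threshold are hypothetical choices for illustration. Note that 0% utilization alone does not distinguish a hang from a finished job; in the snapshots above a hang is suggested because mdrun_mpi (PID 17500) still holds the GPU.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Fields requested from nvidia-smi (assumed to be supported by the driver).
QUERY_FIELDS = "timestamp,utilization.gpu,temperature.gpu,power.draw"


@dataclass
class GpuSample:
    timestamp: str
    util_pct: int    # utilization.gpu, percent
    temp_c: int      # temperature.gpu, degrees C
    power_w: float   # power.draw, watts


def query_gpu(gpu_index: int = 0) -> str:
    """Run nvidia-smi once and return one CSV line (no header, no units)."""
    cmd = [
        "nvidia-smi",
        "-i", str(gpu_index),
        "--query-gpu=" + QUERY_FIELDS,
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def parse_sample(line: str) -> GpuSample:
    """Parse one line: timestamp, utilization (%), temperature (C), power (W).

    Assumes every field is reported as a number (not '[Not Supported]').
    """
    ts, util, temp, power = [field.strip() for field in line.split(",")]
    return GpuSample(ts, int(util), int(temp), float(power))


def find_stall(samples: List[GpuSample], min_zero_samples: int = 3) -> Optional[str]:
    """Return the timestamp where utilization fell to 0% and stayed there for
    at least `min_zero_samples` consecutive samples, or None if it never did."""
    run_start: Optional[str] = None
    run_length = 0
    for s in samples:
        if s.util_pct == 0:
            if run_length == 0:
                run_start = s.timestamp  # first sample of a zero-utilization run
            run_length += 1
            if run_length >= min_zero_samples:
                return run_start
        else:
            # Any non-zero reading ends the current run.
            run_length = 0
            run_start = None
    return None
```

Calling `query_gpu()` every 10 minutes (for example from cron) and appending each line to a log file gives records that `parse_sample` and `find_stall` can scan after the run, without reading through the full screens by hand.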
On Wed, Sep 30, 2015 at 5:43 PM, Téletchéa Stéphane <stephane.teletchea at univ-nantes.fr> wrote:
> On 29/09/2015 23:40, M Teguh Satria wrote:
>
>> Is anyone else experiencing a similar problem? Is there any way to
>> troubleshoot or debug this to find the cause? I didn't get any warning
>> or error message.
>>
>
> Hello,
>
> This can be a driver issue (or a hardware one: think of temperature, dust,
> ...), and it happens to me from time to time.
>
> The only solution I found was to reset the GPU (see the nvidia-smi
> options). If this is not sufficient, you will have to reboot using a cold
> boot: turn the computer off for more than 30 s, then boot it again.
>
> If this happens too often, you may have a defective card; contact your
> vendor in that case.
>
> Best,
>
> Stéphane Téletchéa
>
> --
> Assistant Professor, UFIP, UMR 6286 CNRS, Team Protein Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tel: +33 251 125 636 / Fax: +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org
>
>
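The GPU reset suggested in the quoted reply can be wrapped in a small script. This is a minimal sketch in Python, assuming the installed nvidia-smi provides `--query-compute-apps` and `--gpu-reset`; `compute_pids`, `parse_pids`, `reset_command` and `reset_gpu` are hypothetical helper names. The nvidia-smi documentation states that a reset needs root and that no process may be using the device, so a hung mdrun_mpi (PID 17500 in the snapshots) has to be killed first. To stay conservative, the sketch refuses to reset while any compute process is listed on any GPU of the node.

```python
import subprocess
from typing import List


def parse_pids(csv_text: str) -> List[int]:
    """Parse nvidia-smi CSV output that has one PID per line (no header)."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def compute_pids() -> List[int]:
    """List PIDs of compute processes on all GPUs of this node."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_pids(result.stdout)


def reset_command(gpu_index: int, pids: List[int]) -> List[str]:
    """Build the reset command, refusing while any process still uses a GPU."""
    if pids:
        raise RuntimeError(
            f"GPU processes still running (PIDs {pids}); stop them before resetting GPU {gpu_index}"
        )
    return ["nvidia-smi", "--gpu-reset", "-i", str(gpu_index)]


def reset_gpu(gpu_index: int = 0, dry_run: bool = True) -> List[str]:
    """Return the reset command; only execute it when dry_run is False."""
    cmd = reset_command(gpu_index, compute_pids())
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root privileges
    return cmd
```

Keeping `dry_run=True` as the default means the script only prints what it would do until the stuck job has been cleaned up; if the reset still fails, the cold boot described above remains the fallback.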
--
-----------------------------------------------------------------------------------
Regards,
*Teguh* <http://www.linkedin.com/in/mteguhsatria>