[gmx-users] Fw: cudaFuncGetAttributes failed: out of memory
bonjour899
bonjour899 at 126.com
Sun Feb 23 07:49:15 CET 2020
I think I've temporarily solved this problem. Only when I use CUDA_VISIBLE_DEVICES to hide the GPUs whose memory is almost fully occupied can I run GROMACS smoothly (using -gpu_id alone does not help). I think there may be a bug in GROMACS's GPU handling in a multi-GPU environment: it seems that as long as one of the visible GPUs is fully occupied, GROMACS cannot start on any GPU and fails with "cudaFuncGetAttributes failed: out of memory".
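For reference, here is a minimal sketch of the workaround (device numbering follows the nvidia-smi output quoted below; note that after masking, CUDA renumbers the visible devices from 0, so -gpu_id must use the new indices):

# Expose only the idle Teslas (physical devices 1 and 3); they become
# visible devices 0 and 1, so -gpu_id 1 here selects physical GPU 3.
CUDA_VISIBLE_DEVICES=1,3 gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu -gpu_id 1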
Best regards,
W
-------- Forwarding messages --------
From: "bonjour899" <bonjour899 at 126.com>
Date: 2020-02-23 11:32:53
To: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: [gmx-users] cudaFuncGetAttributes failed: out of memory
I also tried restricting to a different GPU using -gpu_id, but still got the same error. I've also posted my question at https://devtalk.nvidia.com/default/topic/1072038/cuda-programming-and-performance/cudafuncgetattributes-failed-out-of-memory/
Following is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   35C    P0    34W / 250W |  16008MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   35C    P0    28W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    33W / 250W |  16063MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   36C    P0    29W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Quadro P4000        On   | 00000000:0B:00.0 Off |                  N/A |
| 46%   27C    P8     8W / 105W |     12MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20497      C   /usr/bin/python3                            5861MiB |
|    0     24503      C   /usr/bin/python3                           10137MiB |
|    2     23162      C   /home/appuser/Miniconda3/bin/python        16049MiB |
+-----------------------------------------------------------------------------+
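(For a quicker overview than the full table, the memory columns alone can be queried via nvidia-smi's CSV query interface; something like this should work:)

# Per-GPU memory usage in CSV form, to spot the nearly full devices
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv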
-------- Forwarding messages --------
From: "bonjour899" <bonjour899 at 126.com>
Date: 2020-02-20 10:30:36
To: "gromacs.org_gmx-users at maillist.sys.kth.se" <gromacs.org_gmx-users at maillist.sys.kth.se>
Subject: cudaFuncGetAttributes failed: out of memory
Hello,
I have encountered a weird problem. I've been using GROMACS with GPUs on a server and it has always performed well. However, when I reran a job today, I suddenly got this error:
Command line:
gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu -gpu_id 3
Back Off! I just backed up pull.log to ./#pull.log.1#
-------------------------------------------------------
Program: gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 100)
Fatal error:
cudaFuncGetAttributes failed: out of memory
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
The GPU I requested appears to be completely unoccupied, and I can still run other GPU applications, but I can no longer run GROMACS mdrun at all, not even for an energy minimization.
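(As a sketch of the isolation test that eventually resolved this, per the top of this thread, and assuming the idle device is GPU 3 as in the nvidia-smi output above:)

# Hide everything except physical GPU 3; if mdrun then starts, device
# detection is failing on the full GPUs, not on the one selected here.
CUDA_VISIBLE_DEVICES=3 gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu -gpu_id 0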