[gmx-users] Fw: cudaFuncGetAttributes failed: out of memory

bonjour899 bonjour899 at 126.com
Sun Feb 23 07:49:15 CET 2020


I think I've temporarily solved this problem. GROMACS only runs smoothly when I use CUDA_VISIBLE_DEVICES to hide the GPUs whose memory is almost fully occupied; restricting the run with -gpu_id alone does not help. I suspect there is a bug in GROMACS's GPU handling in a multi-GPU environment: as soon as any one of the visible GPUs has its memory fully occupied, GROMACS cannot start on any GPU and fails with "cudaFuncGetAttributes failed: out of memory".
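
For reference, this is roughly how I launch it now (a minimal sketch; the device indices match the nvidia-smi output further down, where GPUs 0 and 2 are nearly full and GPUs 1 and 3 are idle, and will differ on other machines):

  # Hide the nearly-full GPUs so GROMACS only ever sees the idle ones
  export CUDA_VISIBLE_DEVICES=1,3
  # The remaining devices are renumbered from 0, so -gpu_id must use the
  # new numbering: the old device 3 is now device 1
  gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu -gpu_id 1

Note that CUDA_VISIBLE_DEVICES renumbers the visible devices starting from 0, so any -gpu_id argument has to use the new indices.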



Best regards,
W




-------- Forwarding messages --------
From: "bonjour899" <bonjour899 at 126.com>
Date: 2020-02-23 11:32:53
To:  gromacs.org_gmx-users at maillist.sys.kth.se
Subject: [gmx-users] cudaFuncGetAttributes failed: out of memory
I also tried restricting the run to different GPUs with -gpu_id, but I still get the same error. I've also posted my question at https://devtalk.nvidia.com/default/topic/1072038/cuda-programming-and-performance/cudafuncgetattributes-failed-out-of-memory/
Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   35C    P0    34W / 250W |  16008MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   35C    P0    28W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    33W / 250W |  16063MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   36C    P0    29W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Quadro P4000        On   | 00000000:0B:00.0 Off |                  N/A |
| 46%   27C    P8     8W / 105W |     12MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     20497      C   /usr/bin/python3                            5861MiB |
|    0     24503      C   /usr/bin/python3                           10137MiB |
|    2     23162      C   /home/appuser/Miniconda3/bin/python        16049MiB |
+-----------------------------------------------------------------------------+
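
For a quicker overview of which devices are free, the query form of nvidia-smi prints one line per GPU (a sketch using standard query fields):

  nvidia-smi --query-gpu=index,memory.used,memory.free --format=csv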

-------- Forwarding messages --------
From: "bonjour899" <bonjour899 at 126.com>
Date: 2020-02-20 10:30:36
To: "gromacs.org_gmx-users at maillist.sys.kth.se" <gromacs.org_gmx-users at maillist.sys.kth.se>
Subject: cudaFuncGetAttributes failed: out of memory

Hello,


I have run into a weird problem. I've been running GROMACS with GPUs on a server and it has always performed well. However, when I reran a job today, I suddenly got this error:



Command line:
  gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu -gpu_id 3

Back Off! I just backed up pull.log to ./#pull.log.1#

-------------------------------------------------------
Program:     gmx mdrun, version 2019.4
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 100)

Fatal error:
cudaFuncGetAttributes failed: out of memory

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------




It seems the requested GPU is essentially unoccupied, and I can run other GPU applications on it, but I cannot run GROMACS mdrun anymore, not even an energy minimization.
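
If it helps to narrow this down, one check would be to expose a single GPU at a time and rerun (a sketch; the device numbers are taken from the nvidia-smi output in my later mail above):

  # only the nearly-full GPU visible: expected to fail with "out of memory"
  CUDA_VISIBLE_DEVICES=0 gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu
  # only an idle GPU visible: expected to run
  CUDA_VISIBLE_DEVICES=3 gmx mdrun -deffnm pull -ntmpi 1 -nb gpu -pme gpu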