[gmx-users] slurm, gres:gpu, only 1 GPU out of 4 is detected

Tamas Hegedus tamas at hegelab.org
Wed Nov 13 19:21:30 CET 2019


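I had the misconception that I had to pass the value of CUDA_VISIBLE_DEVICES 
set by slurm to -gpu_id.
However, slurm exposes only the allocated GPU to gromacs, and gmx numbers the 
GPUs it can see starting from 0, so the ID in CUDA_VISIBLE_DEVICES does not 
apply inside the job.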
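
To illustrate the renumbering, here is a minimal bash sketch. It assumes 
slurm sets CUDA_VISIBLE_DEVICES to a comma-separated list of physical GPU 
indices; local_gpu_ids is a hypothetical helper written for this example, 
not part of slurm or gromacs. It only prints the job-local indices that the 
visible devices would get.

```shell
#!/bin/bash
# Sketch only: local_gpu_ids is a hypothetical helper, not a slurm or gmx tool.
# It takes a comma-separated list of physical GPU indices (the assumed format
# of CUDA_VISIBLE_DEVICES) and prints the job-local indices 0..n-1 under which
# those devices are enumerated inside the job.
local_gpu_ids() {
    local visible="${1:-}"
    local ids="" i=0 entry
    local IFS=','
    for entry in $visible; do
        ids="${ids:+$ids,}$i"
        i=$((i + 1))
    done
    echo "$ids"
}

# In the failing second job CUDA_VISIBLE_DEVICES was 1, yet the only
# device gmx detected was #0.
ids=$(local_gpu_ids "${CUDA_VISIBLE_DEVICES:-}")
echo "physical: ${CUDA_VISIBLE_DEVICES:-unset} -> job-local: ${ids:-none}"
```

For a job submitted with --gres=gpu:1 the job-local list is just 0, so in 
the job script either -gpu_id 0 or no -gpu_id at all should match what gmx 
detects, instead of -gpu_id $CUDA_VISIBLE_DEVICES.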

On 11/13/19 4:55 PM, Tamas Hegedus wrote:
> Hi,
>
> I run gmx 2019 on GPUs.
> There are 4 GPUs in my GPU hosts.
> I have slurm configured with gres=gpu.
>
> 1. If I submit a job with --gres=gpu:1, then GPU #0 is identified and 
> used (-gpu_id $CUDA_VISIBLE_DEVICES).
> 2. If I submit a second job, it fails: $CUDA_VISIBLE_DEVICES is 1 and 
> is passed to -gpu_id, but gmx identifies only GPU #0 as a compatible GPU.
> From the output:
>
> gmx mdrun -v -pin on -deffnm equi_nvt -nt 8 -gpu_id 1 -nb gpu -pme gpu -npme 1 -ntmpi 4
>
>   GPU info:
>     Number of GPUs detected: 1
>     #0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
>
> Fatal error:
> You limited the set of compatible GPUs to a set that included ID #1, but that
> ID is not for a compatible GPU. List only compatible GPUs.
>
> 3. If I log in to that node and run the mdrun command printed in the 
> output of the previous step, it selects the right GPU and runs as 
> expected.
>
> $CUDA_DEVICE_ORDER is set to PCI_BUS_ID
>
> I cannot decide whether this is a slurm config error or a problem with 
> gromacs, as $CUDA_VISIBLE_DEVICES is set correctly by slurm and I 
> expect gromacs to detect all 4 GPUs.
>
> Thanks for your help and suggestions,
> Tamas
>

-- 
Tamas Hegedus, PhD
Senior Research Fellow
Department of Biophysics and Radiation Biology
Semmelweis University     | phone: (36) 1-459 1500/60233
Tuzolto utca 37-47        | mailto:tamas at hegelab.org
Budapest, 1094, Hungary   | http://www.hegelab.org


