[gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

Tue Jul 23 01:34:30 CEST 2019

Can anyone give me an idea of what might be happening, or how I can solve it?
Best regards,
Carlos

——————
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarro87 at gmail.com or cnavarro at utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarro87 at gmail.com)
wrote:

Dear gmx-users,
I’m currently working on a server where each node has 40 physical cores
(40 threads) and 4 NVIDIA V100 GPUs.
When I launch a single job (1 simulation using a single GPU) I get a
performance of about 35 ns/day for a system of about 300k atoms. Monitoring
the GPU during the simulation, I see that its utilization is about 80%.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the performance
drops to ~25 ns/day each, and the GPU utilization also drops during the
simulation to about 30-40% (sometimes to less than 5%).
Clearly there is a communication problem between the GPUs and the CPU
during the simulations, but I don’t know how to solve it.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30 -ntomp 20 &>log &

Regarding pinoffset, I first tried using 20 cores for each job and then
also tried 8 cores (pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3, and pinoffset 12 for job 4), but in the end the
problem persists.
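
For reference, the arithmetic behind such an offset scheme can be sketched as
follows. This is only a minimal illustration, not a tested fix: it assumes one
hardware thread per core, a pin stride of 1, and that the cores are split
evenly between the jobs. The names TOTAL_CORES, NJOBS and pin_layout are
hypothetical and do not come from GROMACS or SLURM.

```shell
#!/bin/bash
# Hypothetical helper: print a -pinoffset/-ntomp pair for each job so that
# the pinned core ranges are equally sized and do not overlap.
# Assumes one hardware thread per core and a pin stride of 1.
TOTAL_CORES=40   # cores per node, as described above
NJOBS=4          # one job per GPU

pin_layout() {
    local total=$1 njobs=$2
    local per_job=$(( total / njobs ))   # threads available to each job
    local i
    for (( i = 0; i < njobs; i++ )); do
        # Job i starts right after the last core used by job i-1.
        echo "job $(( i + 1 )): -pinoffset $(( i * per_job )) -ntomp ${per_job}"
    done
}

pin_layout "$TOTAL_CORES" "$NJOBS"
```

With 40 cores and 4 jobs this gives 10 threads per job at offsets 0, 10, 20
and 30. Each job's core range ends where the next one begins only when the
value passed to -ntomp matches the spacing between offsets.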

On this machine I’m currently not able to use more than 1 GPU per job, so
this is my only way to make full use of the whole node.
If you need more information please just let me know.
Best regards.
Carlos
