[gmx-users] Running on multiple GPUs

Hollingsworth, Bobby louishollingsworth at g.harvard.edu
Fri Apr 20 20:53:35 CEST 2018


Hi,

Your choice of -np $SLURM_NPROCS gives each processor you requested its
own MPI rank with a single OpenMP thread (-ntomp 1), which is inefficient.
It is best to tune by benchmarking a series of launch configurations
manually, since the best one can give you severalfold better performance.
With 32 processors and 2 GPUs there are many possible permutations, but
you'll probably get optimal performance with 4-8 ranks.

Consider the following options for benchmarking:

mpirun -np 4 gmx_mpi mdrun -ntomp 8 -pme gpu -npme 1 -nb gpu \
    -gputasks 0011 -deffnm umbrella0 -nsteps 50000 -resetstep 25000

This will launch 4 ranks, 1 of which is a dedicated PME rank mapped to
GPU 1 (the last digit of -gputasks). A variation of this launch
configuration gets me roughly 3x the performance of running PME on the
CPU. The -nsteps 50000 -resetstep 25000 pair resets the timing counters
halfway through a short run, so the reported performance excludes startup
and initial load balancing.
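
As a quick check, the measured throughput ends up on the "Performance:"
line near the end of the log once the run finishes (assuming -deffnm
umbrella0, so the log is umbrella0.log):

grep -B1 "^Performance:" umbrella0.log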

Others to consider:
mpirun -np 8 gmx_mpi mdrun -ntomp 4 -pme gpu -npme 1 -nb gpu \
    -gputasks 00001111 -deffnm umbrella0 -nsteps 50000 -resetstep 25000

mpirun -np 4 gmx_mpi mdrun -ntomp 8 -pme cpu -nb gpu \
    -gputasks 0011 -deffnm umbrella0 -nsteps 50000 -resetstep 25000
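
If you want to sweep a couple of configurations in one job, a rough sketch
along these lines could work (assuming the tpr from your run,
umbrella0.tpr, and that gmx_mpi is on PATH via GMXRC as in your script;
the rank/thread/GPU combinations below are only examples to adjust):

# benchmark a few (ranks, threads, gputasks) combinations in turn
for cfg in "4 8 0011" "8 4 00001111"; do
    read np ntomp gputasks <<< "$cfg"
    mpirun -np $np gmx_mpi mdrun -s umbrella0.tpr -ntomp $ntomp \
        -pme gpu -npme 1 -nb gpu -gputasks $gputasks \
        -nsteps 50000 -resetstep 25000 -g bench_${np}x${ntomp}.log
done
# then compare the Performance: lines across the bench_*.log files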

Best,
Bobby

On Fri, Apr 20, 2018 at 2:11 PM, Searle Duay <searle.duay at uconn.edu> wrote:

> Hello,
>
> I am trying to run a simulation using Gromacs 2018 on 2 GPUs of PSC
> Bridges. I submitted the following SLURM bash script:
>
> #!/bin/bash
>
>
> #SBATCH -J p100_1n_2g
> #SBATCH -o %j.out
> #SBATCH -N 1
> #SBATCH -n 32
> #SBATCH --ntasks-per-node=32
> #SBATCH -p GPU
> #SBATCH --gres=gpu:p100:2
> #SBATCH -t 01:00:00
> #SBATCH --mail-type=BEGIN,END,FAIL
> #SBATCH --mail-user=searle.duay at uconn.edu
>
> set echo
> set -x
>
> source /opt/packages/gromacs-GPU-2018/bin/GMXRC
> module load intel/18.0.0.128 gcc/4.8.4 cuda/9.0 icc/16.0.3 mpi/intel_mpi
>
> echo SLURM_NPROCS= $SLURM_NPROCS
>
> cd $SCRATCH/prot_umbrella/gromacs/conv
>
> gmx_mpi mdrun -deffnm umbrella0 -pf pullf-umbrella0.xvg -px
> pullx-umbrella0.xvg -v
>
> exit
>
> It was running, but I noticed that it only uses one GPU on a node that
> has 2 GPUs. I tried changing the command to:
>
> mpirun -np $SLURM_NPROCS  gmx_mpi mdrun -v -deffnm umbrella0 ...
>
> But it says that:
>
> Fatal error:
> Your choice of number of MPI ranks and amount of resources results in
> using 1 OpenMP threads per rank, which is most likely inefficient. The
> optimum is usually between 2 and 6 threads per rank. If you want to run
> with this setup, specify the -ntomp option. But we suggest to change the
> number of MPI ranks.
>
> I am wondering what the right command is to use both GPUs available on
> the node, or whether GROMACS automatically decides how many GPUs it
> will use.
>
> Thank you!
>
> --
> Searle Aichelle S. Duay
> Ph.D. Student
> Chemistry Department, University of Connecticut
> searle.duay at uconn.edu
>



-- 
Louis "Bobby" Hollingsworth
Ph.D. Student, Biological and Biomedical Sciences, Harvard University
B.S. Chemical Engineering, B.S. Biochemistry, B.A. Chemistry, Virginia Tech
Honors College '17
<http://www.linkedin.com/pub/louis-hollingsworth/77/aaa/a47>

