[gmx-users] Can only find GPUs on first node

Thu Feb 12 20:46:39 CET 2015

I'm having difficulty using GPUs across multiple nodes. I'm running GROMACS 5.0.4 with OpenMPI, and each node has 20 CPU cores and 2 GPUs. When I run across multiple nodes (2 in this case), GROMACS only detects the CPU cores and GPUs of the first node.

Running GROMACS on 1 node with OpenMPI and both GPUs works fine. Running on multiple nodes with OpenMPI and -nb cpu also works fine, and GROMACS uses all CPU cores in that case. The problem only appears when I run with GPUs across multiple nodes.

At the top of my PBS log file I see that two nodes are allocated for the job:

PBS has allocated the following nodes:

qb140
qb144

A total of 40 processors on 2 nodes allocated
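
The allocation summary does not show how many slots each host contributes to $PBS_NODEFILE, which is what mpirun reads through -hostfile. A minimal sketch for checking that, assuming the node file lists one hostname per allocated slot (a common PBS/Torque layout, not confirmed by the log above). The helper name slots_per_host and the 20-lines-per-host stand-in file are illustrative assumptions:

```shell
#!/bin/sh
# Summarize a PBS-style node file.
# Assumed layout: one hostname per line, repeated once per allocated slot.
# Prints "<host> <slot count>" for each distinct host.
slots_per_host() {
    # $1: path to the node file (inside a job: "$PBS_NODEFILE")
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Stand-in node file, since $PBS_NODEFILE only exists inside a running job.
# 20 lines per host is an assumption based on 40 processors on 2 nodes.
demo_nodefile=$(mktemp)
for host in qb140 qb144; do
    i=0
    while [ "$i" -lt 20 ]; do
        echo "$host"
        i=$((i + 1))
    done
done > "$demo_nodefile"

slots_per_host "$demo_nodefile"
rm -f "$demo_nodefile"
```

Inside the job itself the call would simply be slots_per_host "$PBS_NODEFILE".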

However, GROMACS gives the following warning and error, indicating that it has found only 20 CPU cores and 2 GPUs:

Using 4 MPI processes
Using 10 OpenMP threads per MPI process

WARNING: Oversubscribing the available 20 logical CPU cores with 40 threads.
         This will cause considerable performance loss!

2 GPUs detected on host qb140:
  #0: NVIDIA Tesla K20Xm, compute cap.: 3.5, ECC: yes, stat: compatible
  #1: NVIDIA Tesla K20Xm, compute cap.: 3.5, ECC: yes, stat: compatible

2 GPUs user-selected for this run.
Mapping of GPUs to the 4 PP ranks in this node: #0, #1

-------------------------------------------------------
Program mdrun, VERSION 5.0.4
Source code file: /home/wes/gromacs-5.0.4/src/gromacs/gmxlib/gmx_detect_hardware.c, line: 359

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun was started with 4 PP MPI processes per node, but you provided 2 GPUs.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
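
The numbers in this output can be checked with a little arithmetic. The sketch below mirrors the two conditions the log complains about (threads versus logical cores, and PP ranks versus GPUs on one node). It is not GROMACS's own check, the function name check_node_layout is illustrative, and it ignores the case where several ranks share one GPU:

```shell
#!/bin/sh
# Compare a per-node launch layout against the hardware reported for that node.
# Arguments are plain integers: ranks on the node, OpenMP threads per rank,
# logical CPU cores on the node, GPUs on the node.
check_node_layout() {
    ranks=$1
    threads_per_rank=$2
    cores=$3
    gpus=$4
    total_threads=$((ranks * threads_per_rank))

    if [ "$total_threads" -gt "$cores" ]; then
        echo "oversubscribed: $total_threads threads on $cores cores"
    else
        echo "threads ok: $total_threads threads on $cores cores"
    fi

    if [ "$ranks" -ne "$gpus" ]; then
        echo "rank/GPU mismatch: $ranks PP ranks, $gpus GPUs"
    else
        echo "GPUs ok: $ranks PP ranks, $gpus GPUs"
    fi
}

# Values from the log: 4 ranks with 10 threads each on a 20-core, 2-GPU host.
check_node_layout 4 10 20 2

# The intended layout: 2 ranks per node.
check_node_layout 2 10 20 2
```

With the logged values both checks fail, which is consistent with all 4 ranks having been started on qb140 rather than 2 on each node.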

Here is the run command I have been using, with which I am trying to get 2 PP ranks per node (on 2 nodes in this case):

mdrun_command=$(which mdrun)
mpirun_command=$(which mpirun)

$mpirun_command -np 4 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -deffnm eqlA$i
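
One way to see where mpirun actually places the ranks for this command is to launch hostname with the same -np and -hostfile values and count the results per host. This is a sketch only: rank_placement is an illustrative helper name, and it falls back to plain mpirun when $mpirun_command is not set:

```shell
#!/bin/sh
# Launch `hostname` once per rank with the same rank count and hostfile as the
# real run, then count how many ranks reported each host.
rank_placement() {
    # $1: number of ranks, $2: hostfile path
    "${mpirun_command:-mpirun}" -np "$1" -hostfile "$2" hostname | sort | uniq -c
}

# Inside the PBS job this would be called as:
#   rank_placement 4 "$PBS_NODEFILE"
```

If all 4 ranks report the same host, the ranks are filling the slots of the first node before any are placed on the second.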

I've tried to follow this article on running with GPUs: http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Heterogenous_parallelization.3a_using_GPUs

Again, running with one node works fine:

$mpirun_command -np 2 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -deffnm eqlA$

Running across multiple nodes with GPUs disabled also works fine:

$mpirun_command -np 40 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -nb cpu -deffnm eqlA$

Thanks again for any advice or direction you can give on this.

James "Wes" Barnett

Ph.D. Candidate

Chemical and Biomolecular Engineering

Tulane University

Boggs Center for Energy and Biotechnology, Room 341-B