[gmx-users] Can only find GPUs on first node
Barnett, James W
jbarnet4 at tulane.edu
Thu Feb 12 20:46:39 CET 2015
I'm having difficulty using GPUs across multiple nodes. I'm using OpenMPI to run GROMACS (5.0.4) across multiple nodes, where each node has 20 CPU cores and 2 GPUs. When I try to run GROMACS across multiple nodes (2 in this case), it only detects the CPU cores and GPUs of the first node.
Running GROMACS on 1 node with OpenMPI and utilizing the 2 GPUs works fine. Additionally, running GROMACS on multiple nodes with OpenMPI and setting -nb to cpu also works fine; GROMACS utilizes all CPU cores in that case. The problem only appears when running with GPUs across multiple nodes.
At the top of my PBS log file I see that two nodes are allocated for it:
PBS has allocated the following nodes:
A total of 40 processors on 2 nodes allocated
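In case it's relevant, here is how I check what actually ended up in the node file (this assumes the standard PBS layout where each host appears once per allocated core):
sort $PBS_NODEFILE | uniq -c
If both nodes are really in the file, this should show 20 entries for each of the two hosts.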
However, GROMACS gives the following warning and error, indicating it has only found 20 CPU cores and 2 GPUs:
Using 4 MPI processes
Using 10 OpenMP threads per MPI process
WARNING: Oversubscribing the available 20 logical CPU cores with 40 threads.
This will cause considerable performance loss!
2 GPUs detected on host qb140:
#0: NVIDIA Tesla K20Xm, compute cap.: 3.5, ECC: yes, stat: compatible
#1: NVIDIA Tesla K20Xm, compute cap.: 3.5, ECC: yes, stat: compatible
2 GPUs user-selected for this run.
Mapping of GPUs to the 4 PP ranks in this node: #0, #1
Program mdrun, VERSION 5.0.4
Source code file: /home/wes/gromacs-5.0.4/src/gromacs/gmxlib/gmx_detect_hardware.c, line: 359
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun was started with 4 PP MPI processes per node, but you provided 2 GPUs.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
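If I'm reading the error correctly, mdrun sees 4 PP ranks on qb140 but only 2 GPU ids for that node. My understanding (assuming GROMACS 5.0's -gpu_id really accepts repeated ids, as the acceleration page suggests) is that two ranks could share each GPU with something like:
mdrun -gpu_id 0011 -deffnm eqlA$i
but that only treats the symptom; what I actually want is 2 PP ranks on each of the two nodes.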
Here is the run command I have been using, where I try to request 2 PP ranks per node (2 nodes in this case):
$mpirun_command -np 4 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -deffnm eqlA$i
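One thing I'm wondering is whether mpirun is actually spreading the 4 ranks over both nodes: since $PBS_NODEFILE lists each host once per core, I suspect -np 4 simply fills the first 4 slots on the first node. A sketch of what I plan to try, assuming OpenMPI's -npernode option behaves as its man page describes:
$mpirun_command -np 4 -npernode 2 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -deffnm eqlA$i
That should give each node exactly 2 PP ranks to match its 2 GPUs.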
I've tried to follow this article on running with GPUs: http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Heterogenous_parallelization.3a_using_GPUs
Again, running on one node works fine:
$mpirun_command -np 2 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -deffnm eqlA$i
Running across multiple nodes while specifying not to use GPUs also works fine:
$mpirun_command -np 40 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -nb cpu -deffnm eqlA$i
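For completeness: in the CPU-only case above I use 40 single-threaded ranks, while with GPUs I believe I want 2 ranks per node with 10 OpenMP threads each. If the -npernode placement works, I could also make the thread count explicit with mdrun's -ntomp flag (an assumption on my part that pinning it this way is the right combination):
$mpirun_command -np 4 -npernode 2 -x LD_LIBRARY_PATH -v -hostfile $PBS_NODEFILE $mdrun_command -ntomp 10 -deffnm eqlA$i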
Thanks again for any advice or direction you can give on this.
James "Wes" Barnett
Chemical and Biomolecular Engineering
Boggs Center for Energy and Biotechnology, Room 341-B