[gmx-users] MPI GPU job failed

Albert mailmd2011 at gmail.com
Thu Aug 11 15:37:27 CEST 2016


Here is what I got for the first command:
mpirun -np 2 gmx_mpi mdrun -v -s 62.tpr -gpu_id 0

It seems that it still used 1 GPU instead of 2. I don't understand why.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs
Hardware detected on host cudaB (the node of MPI rank 0):
   CPU info:
     Vendor: GenuineIntel
     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
     SIMD instructions most likely to fit this hardware: AVX_256
     SIMD instructions selected at GROMACS compile time: AVX_256
   GPU info:
     Number of GPUs detected: 2
     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible

Reading file 62.tpr, VERSION 5.1.3 (single precision)
Reading file 62.tpr, VERSION 5.1.3 (single precision)
Using 1 MPI process
Using 20 OpenMP threads

1 GPU user-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Using 1 MPI process
Using 20 OpenMP threads

1 GPU user-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
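What puzzles me most is that each rank prints "Using 1 MPI process", as if mpirun started two independent single-rank runs instead of one two-rank run. As a cross-check on this single node, I think the thread-MPI build should be able to drive both GPUs without mpirun at all -- just a sketch, and it assumes the non-MPI binary is installed as plain "gmx":

gmx mdrun -ntmpi 2 -ntomp 10 -v -s 62.tpr -gpu_id 01    # assumes a thread-MPI build named "gmx" is available

Here -ntmpi 2 starts two thread-MPI PP ranks, and -gpu_id 01 gives GPU 0 to the first rank and GPU 1 to the second.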



Here is what I got for the second command:

mpirun -np 2 gmx_mpi mdrun -ntomp 10 -v -s 62.tpr -gpu_id 01

It still failed.

-------------------------------------------------------
Program gmx mdrun, VERSION 5.1.3
Source code file: /home/albert/Downloads/gromacs/gromacs-5.1.3/src/gromacs/gmxlib/gmx_detect_hardware.cpp, line: 458

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes 
and GPUs per node.
gmx_mpi was started with 1 PP MPI process per node, but you provided 2 GPUs.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Halting program gmx mdrun
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Using 1 MPI process
Using 10 OpenMP threads

2 GPUs user-selected for this run.
Mapping of GPU IDs to the 1 PP rank in this node: 0,1


-------------------------------------------------------
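Since the error insists that only 1 PP MPI process per node was started, I suspect the mpirun in my PATH and the MPI library that gmx_mpi was built against may not match. A few sanity checks I plan to run (just a guess at the cause):

gmx_mpi --version | grep -i "MPI library"    # should report a real MPI library, not thread_mpi
which mpirun                                 # see which Open MPI installation this resolves to
mpirun -np 2 hostname                        # should print the host name twice

If mpirun does launch two processes but each gmx_mpi still reports "Using 1 MPI process", that would point to a build/runtime MPI mismatch.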



On 08/11/2016 03:33 PM, Justin Lemkul wrote:
> So you're trying to run on two nodes, each of which has one GPU?  I 
> haven't done such a run, but perhaps mpirun -np 2 gmx_mpi mdrun -v -s 
> 62.tpr -gpu_id 0 would do the trick, by finding the first GPU on each 
> node?
>
> -Justin 
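Just to clarify the setup: per the hardware detection above, it is actually one node with two GPUs, not two nodes with one GPU each, so -gpu_id 01 with two ranks (my second command) should be the right shape here. For the record, if it really were two nodes, I gather the launch would also need a host list so that one rank lands on each node -- a sketch, where "hosts" is a hypothetical file listing the two node names:

mpirun -np 2 -hostfile hosts -npernode 1 gmx_mpi mdrun -v -s 62.tpr -gpu_id 0    # "hosts" is hypothetical; one rank per node, each using its local GPU 0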


