[gmx-users] MPI GPU job failed

Justin Lemkul jalemkul at vt.edu
Thu Aug 11 15:39:39 CEST 2016



On 8/11/16 9:37 AM, Albert wrote:
> Here is what I got for the command:
> mpirun -np 2 gmx_mpi mdrun -v -s 62.tpr -gpu_id 0
>
> It seems that it still used 1 GPU instead of 2, and I don't understand why.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs

Then this is inconsistent with my first question in the last reply: you have
two GPUs on a single physical node.  For that, you should not need an external
mpirun.

gmx mdrun -ntmpi 2 -v -s 62.tpr -gpu_id 01
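
If you also want to set the OpenMP thread count explicitly (just a sketch of
how the 5.1 thread-MPI build behaves; I haven't tried it on your exact setup),
something like:

gmx mdrun -ntmpi 2 -ntomp 10 -v -s 62.tpr -gpu_id 01

starts two thread-MPI PP ranks within a single process, and -gpu_id 01 maps
rank 0 to GPU 0 and rank 1 to GPU 1, with 10 OpenMP threads per rank, so no
external mpirun is involved at all.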

-Justin

> Hardware detected on host cudaB (the node of MPI rank 0):
>   CPU info:
>     Vendor: GenuineIntel
>     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
>     SIMD instructions most likely to fit this hardware: AVX_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
>   GPU info:
>     Number of GPUs detected: 2
>     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
>     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC:  no, stat: compatible
>
> Reading file 62.tpr, VERSION 5.1.3 (single precision)
> Reading file 62.tpr, VERSION 5.1.3 (single precision)
> Using 1 MPI process
> Using 20 OpenMP threads
>
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 0
>
> Using 1 MPI process
> Using 20 OpenMP threads
>
> 1 GPU user-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 0
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
>
> Here is what I got for the command:
>
> mpirun -np 2 gmx_mpi mdrun -ntomp 10 -v -s 62.tpr -gpu_id 01
>
>
> It still failed...
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.3
> Source code file:
> /home/albert/Downloads/gromacs/gromacs-5.1.3/src/gromacs/gmxlib/gmx_detect_hardware.cpp,
> line: 458
>
> Fatal error:
> Incorrect launch configuration: mismatching number of PP MPI processes and GPUs
> per node.
> gmx_mpi was started with 1 PP MPI process per node, but you provided 2 GPUs.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting program gmx mdrun
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Using 1 MPI process
> Using 10 OpenMP threads
>
> 2 GPUs user-selected for this run.
> Mapping of GPU IDs to the 1 PP rank in this node: 0,1
>
>
> -------------------------------------------------------
>
>
>
> On 08/11/2016 03:33 PM, Justin Lemkul wrote:
>> So you're trying to run on two nodes, each of which has one GPU?  I haven't
>> done such a run, but perhaps mpirun -np 2 gmx_mpi mdrun -v -s 62.tpr -gpu_id 0
>> would do the trick, by finding the first GPU on each node?
>>
>> -Justin
>

-- 
==================================================

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==================================================

