[gmx-developers] Running Gromacs on GPUs on multiple machines

Thu May 29 13:17:28 CEST 2014

2014-05-29 11:17 GMT+02:00 Mark Abraham <mark.j.abraham at gmail.com>:
> That's not going to do what you probably think it would do. See (from mdrun
> -h) "The argument of -gpu_id is a string of digits (without delimiter)
> representing device id-s of the GPUs to be used. For example, "02" specifies
> using GPUs 0 and 2 in the first and second
> PP ranks per compute node respectively." The "per compute node" is critical
> here. Your -gpu_id requires that there be five PP ranks *per node* to
> address. That your MPI hostfile made this possible is a separate question
> for you to consider. You sound like you want to use
>
> mpirun -np 5 -hostfile ... mdrun_mpi -v -deffnm ... -gpu_id 0
>

I see. My hostfile contains hostnames of 5 machines, that is all.
Eventually I want to run 6 OpenMP threads per MPI process, but the
OpenMP part looks easy. So for now I am just trying to get 5 MPI
processes running, one per machine, each assigned to its own GPU.
Running it as you suggest gets me:

Using 5 MPI processes
Using 1 OpenMP thread per MPI process

1 GPU detected on host akston:
  #0: NVIDIA GeForce GTX 660, compute cap.: 3.0, ECC:  no, stat: compatible

1 GPU user-selected for this run.
Mapping of GPU ID to the 5 PP ranks in this node: 0

-------------------------------------------------------
Program gmx mdrun, VERSION 5.1-dev-20140527-eb2cc07
Source code file:
/home/vedranm/workspace/gromacs/src/gromacs/gmxlib/gmx_detect_hardware.cpp,
line: 404

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes
and GPUs per node.
mdrun_mpi was started with 5 PP MPI processes per node, but you provided 1 GPU.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
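
To make the "per compute node" rule from the mdrun help text concrete, here is
a rough Python sketch of how I read it. It is a simplified model and not
GROMACS code: the function name map_gpus_to_ranks and its arguments are
hypothetical, and it assumes that each digit of -gpu_id is assigned in order
to one PP rank on the node and that the two counts have to match.

```python
def map_gpus_to_ranks(gpu_id, pp_ranks_on_node):
    """Map node-local PP rank index -> GPU device id for one node.

    Hypothetical helper for illustration only; it mimics the rule quoted
    from `mdrun -h`, not the actual GROMACS implementation.
    """
    # -gpu_id is a string of digits without delimiters, e.g. "02"
    if not gpu_id or any(c not in "0123456789" for c in gpu_id):
        raise ValueError("gpu_id must be a non-empty string of digits")

    gpus = [int(c) for c in gpu_id]

    # Assumption: one digit per PP rank on this node, as the error suggests
    if len(gpus) != pp_ranks_on_node:
        raise ValueError(
            "started with %d PP ranks per node, but %d GPU id(s) were given"
            % (pp_ranks_on_node, len(gpus))
        )

    # i-th PP rank on the node uses the i-th digit of the string
    return dict(enumerate(gpus))


if __name__ == "__main__":
    # One PP rank on the node, one GPU id: the layout I am aiming for
    print(map_gpus_to_ranks("0", 1))
    # Help-text example: first and second PP ranks use GPUs 0 and 2
    print(map_gpus_to_ranks("02", 2))
    # Five PP ranks on a single node with "-gpu_id 0": the mismatch above
    try:
        map_gpus_to_ranks("0", 5)
    except ValueError as err:
        print("error:", err)
```

Under this model the log above corresponds to the last call: mdrun counted all
5 PP ranks on one node (akston), so a single digit in -gpu_id cannot match.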

Anything more I could try?

>
> We do it this way because it is a much more scalable way of expressing the
> underlying requirement that a PP rank on a node maps to a GPU on a node,
> when there's more than one GPU per node.
> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Using_multi-simulations_and_GPUs
> hints at this, but that page doesn't cover your case. I'll add it.
>

Great, thank you.

>
> Actually you did such an assignment, as covered above. We only report the
> detection from the lowest PP rank on the lowest-ranked node, because we
> haven't bothered to serialize the data to check things are sane. Usually
> such a machine is sufficiently homogeneous that this is not an issue.
>

Makes sense.

>
> Works fine, but the output could be more specific. Documenting all cases
> sanely where someone will find it is hard.
>

Understood.

Regards,
Vedran