[gmx-users] Running job on GPUs

Fri Jul 11 15:07:20 CEST 2014

On Fri, Jul 11, 2014 at 12:18 PM, Nidhi Katyal
<nidhikatyal1989 at gmail.com> wrote:
> Hello all
>
> I am trying to run my job on 2 nodes, utilizing all available cores. Each
> node of the cluster has two GPUs and two sockets with 8 cores each.
> Every time I submit the job, it runs on only one node.
> How can I make use of the other node?
>
> So far, I have tried the following commands, as suggested in
> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>
> 1)  mpirun -n 2 mdrun_mpi -v -deffnm nvt -ntomp 16
>
> output:
>
> Using 2 MPI processes
> Using 16 OpenMP threads per MPI process
>
> WARNING: Oversubscribing the available 16 logical CPU cores with 32 threads.
>          This will cause considerable performance loss!

You are starting two ranks, and both get placed on the first node
(presumably because of the job scheduler commands you used). That is
why mdrun warns about oversubscription: the 2 x 16 = 32 threads you
meant to spread across two nodes were all started on a single 16-core
node.
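
The arithmetic behind that warning can be checked by hand. Below is a
minimal sketch, assuming 16 logical cores per node as reported in the
warning; the function name check_oversubscription is hypothetical and
not part of GROMACS or MPI. To see where the ranks actually land, it
can also help to launch a trivial program such as hostname under the
same mpirun command.

```shell
#!/bin/sh
# Hypothetical helper: compare the threads started on one node with the
# logical cores available there.
# Arguments: ranks placed on the node, OpenMP threads per rank, cores.
check_oversubscription() {
    ranks_on_node=$1
    threads_per_rank=$2
    cores=$3
    total=$((ranks_on_node * threads_per_rank))
    if [ "$total" -gt "$cores" ]; then
        echo "oversubscribed: $total threads on $cores cores"
    else
        echo "ok: $total threads on $cores cores"
    fi
}

# Command 1 as it actually ran: both ranks on one node, 16 threads each.
check_oversubscription 2 16 16
# Command 1 as intended: one rank per node.
check_oversubscription 1 16 16
```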

Note that, even once the ranks are spread over both nodes, this will
still not work with two GPUs per node; as the docs indicate, you need
(at least) as many PP ranks per node as GPUs.

> 2)  mpirun -n 4 mdrun_mpi -v -deffnm nvt -ntomp 8
>
> output:
>
> Incorrect launch configuration: mismatching number of PP MPI processes and
> GPUs per node.
> mdrun_mpi was started with 4 PP MPI processes per node, but only 2 GPUs
> were detected.
>
> I understand that the above error occurs when the number of MPI ranks is
> not a multiple of the number of GPUs intended to be used. But in my case
> 4 is a multiple of 2.

The automatic PP rank to GPU mapping only works if the number of PP
ranks on a node equals the number of GPUs detected on it. Otherwise,
you need to specify the mapping manually. The above command would work
if those four ranks were running on two compute nodes (two per node),
but as before, they all get launched on a single node.
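
A manual mapping could be sketched as follows, assuming the -gpu_id
option described on the Acceleration_and_parallelization page for this
GROMACS series, which takes one GPU index per PP rank on the node. The
helper name gpu_id_string is hypothetical, it assumes the rank count is
a multiple of the GPU count, and -ntomp 4 assumes the 16 cores of one
node are shared evenly among four ranks.

```shell
#!/bin/sh
# Hypothetical helper: build a GPU id string that assigns consecutive
# PP ranks on a node to GPUs in equal-sized groups.
# Arguments: PP ranks on the node, GPUs on the node.
gpu_id_string() {
    ranks=$1
    gpus=$2
    per_gpu=$((ranks / gpus))   # assumes ranks is a multiple of gpus
    ids=""
    g=0
    while [ "$g" -lt "$gpus" ]; do
        r=0
        while [ "$r" -lt "$per_gpu" ]; do
            ids="$ids$g"
            r=$((r + 1))
        done
        g=$((g + 1))
    done
    echo "$ids"
}

# Four PP ranks sharing two GPUs on a single node: ranks 0-1 use GPU 0,
# ranks 2-3 use GPU 1. Only the command line is printed here, not run.
echo "mpirun -n 4 mdrun_mpi -v -deffnm nvt -ntomp 4 -gpu_id $(gpu_id_string 4 2)"
```

Sharing a GPU between ranks this way is only the single-node fallback;
with two ranks on each of two nodes, the automatic mapping applies.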

> 3) mpirun -n 4 -npernode 2 mdrun_mpi -v -deffnm nvt

"-npernode" is not an mdrun argument.

> The job still runs on 1 node.
>
> How can I run my job on 2 nodes utilizing all cores and GPUs?

I'm afraid this is not an issue with mdrun, but with the way your
ranks get placed on compute nodes. Are you using a job scheduler?
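
For illustration only, here is a rough sketch of explicit rank
placement, assuming Open MPI (whose mpirun accepts --hostfile and
-npernode) and no scheduler-provided host list. The host names node01
and node02, the file name hosts.txt and the helper make_hostfile are
all hypothetical; under a job scheduler the node list normally comes
from the scheduler instead.

```shell
#!/bin/sh
# Hypothetical helper: print Open MPI hostfile lines, one per node.
# Arguments: slots (ranks) per node, then the node names.
make_hostfile() {
    slots=$1
    shift
    for host in "$@"; do
        echo "$host slots=$slots"
    done
}

# Two ranks on each of two nodes, matching the two GPUs per node.
# Redirect this output to hosts.txt to use it with the command below.
make_hostfile 2 node01 node02

# 2 ranks per node x 8 OpenMP threads = 16 cores per node.
# Only the command line is printed here, not run.
echo "mpirun -n 4 --hostfile hosts.txt -npernode 2 mdrun_mpi -v -deffnm nvt -ntomp 8"
```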

Cheers,
--
Szilárd

> Thanks
> Nidhi