[gmx-users] multiple nodes, GPUs on a Cray system

Smith, Micholas D. smithmd at ornl.gov
Tue Jan 19 21:17:11 CET 2016


Michael,

One thing to check when using such a big machine is to do some additional scaling tests (i.e. run short jobs, 10 min or so, with 1 node, 2 nodes, 4 nodes, 8 nodes, 16 nodes, 32 nodes, etc.) and make sure the performance of your simulation scales as well as you think it does. Sometimes using a smaller number of nodes can help overcome slow-downs due to inter-node communication. Also, oddly enough, check how the scaling behaves with and without the GPUs (use the flag -nb cpu to avoid using the GPUs). Sometimes not using the GPUs can improve your scaling (it's odd when this happens, but I have seen it from time to time).
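
For example, something along these lines (just a sketch; -maxh 0.17 caps each short run at roughly 10 minutes, the -deffnm names are placeholders, and you would repeat the pair of runs for 1, 2, 4, 8, 16, 32 nodes):

  # 4-node benchmark with the GPUs
  aprun -n 4 -N 1 mdrun_mpi -gpu_id 0 -maxh 0.17 -deffnm bench_gpu_4node
  # same system with CPU-only non-bonded kernels for comparison
  aprun -n 4 -N 1 mdrun_mpi -nb cpu -maxh 0.17 -deffnm bench_cpu_4node

Then compare the ns/day reported at the end of each log.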

-Micholas

===================
Micholas Dean Smith, PhD.
Post-doctoral Research Associate
University of Tennessee/Oak Ridge National Laboratory
Center for Molecular Biophysics

________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Szilárd Páll <pall.szilard at gmail.com>
Sent: Tuesday, January 19, 2016 2:38 PM
To: Discussion list for GROMACS users
Cc: Discussion list for GROMACS users
Subject: Re: [gmx-users] multiple nodes, GPUs on a Cray system

On Tue, Jan 19, 2016 at 8:19 PM, Michael Weiner <mdw224 at cornell.edu> wrote:

> Micholas,
> Thanks for the quick reply.  This solved part of my problem, in that the
> system does now run, but it seems like the domain decomposition uses only
> the number of nodes, not the total number of processors (unless it’s being
> further divided later in a way not recorded in the log file).


Multi-threaded (OpenMP) parallelization makes sure that mdrun can use all
cores in a node. Unless you or the job launcher told mdrun to use a certain
number of threads, mdrun will divide all available cores among the MPI
ranks placed onto a node. So if you start one MPI rank per node
(as Micholas suggested), mdrun should be using 8 threads/rank and the log
should clearly state this.
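
For example, on the 50 nodes of 16 cores mentioned below, a launch along these lines is what I have in mind (a sketch only; aprun's -d reserves that many cores per rank, and -deffnm md stands in for your actual run options):

  # one MPI rank per node; -d 16 lets that rank use all of the node's cores
  # mdrun then picks the OpenMP thread count itself (or set it explicitly with -ntomp)
  aprun -n 50 -N 1 -d 16 mdrun_mpi -gpu_id 0 -deffnm md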


> As a result, the simulation seems to run much more slowly than expected.


Slow simulation could be caused by other factors too, but without seeing a
log file, it's hard to tell what the reason is.


>   Do I need to manually set the domain decomposition rather than allowing
> it to happen automatically?
>

No. You should set the launch configuration manually; the domain decomposition will be set based on that.
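
For illustration (again just a sketch, numbers only as an example): with the 50 nodes below you could also place two PP ranks per node and map both to the single GPU by repeating its id in -gpu_id; the domain decomposition then simply follows the number of PP ranks (100 domains here, assuming mdrun does not split off separate PME ranks):

  # 2 ranks per node on 50 nodes, 8 OpenMP threads each, both ranks sharing GPU 0
  aprun -n 100 -N 2 -d 8 mdrun_mpi -ntomp 8 -gpu_id 00 -deffnm md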


> Michael
>
>
> Smith, Micholas D. smithmd at ornl.gov
> Tue Jan 19 18:44:54 CET 2016
>
> Try this:
>
> aprun -n <number of nodes> -N 1 mdrun_mpi -gpu_id 0 <the rest of your usual mdrun options>
> (-n is the total number of MPI ranks; with -N 1 that is one rank per node.)
>
>
>
> ===================
> Micholas Dean Smith, PhD.
> Post-doctoral Research Associate
> University of Tennessee/Oak Ridge National Laboratory
> Center for Molecular Biophysics
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se on
> behalf of Michael Weiner <mdw224 at cornell.edu>
> Sent: Tuesday, January 19, 2016 12:14 PM
> To: gromacs.org_gmx-users at maillist.sys.kth.se
> Subject: [gmx-users] multiple nodes, GPUs on a Cray system
>
> Hello.  I am trying to run Gromacs on ORNL’s Titan supercomputer (Cray
> XK7).  Specifically, I wish to run a large system across multiple nodes
> while employing GPUs with Gromacs 4.6.6.  However, I cannot figure out the
> appropriate way to call mdrun in order for this to work.  Within a
> submission script, I have tried many combinations of flags for aprun and
> mdrun, none of which work.
> More specifically, if I request 50 nodes of 16 cores each and use this
> command:
> aprun -n 800 -N 16 mdrun_mpi
> I get the following error: “Incorrect launch configuration: mismatching
> number of PP MPI processes and GPUs per node.  mdrun_mpi was started with
> 16 PP MPI processes per node, but only 1 GPU were detected.”
> I have also tried with the -n flag equal to the number of nodes, but then
> it decomposes only to that many domains, rather than one per processor.
> The “mdrun_mpi” command seems to be the only way of calling mdrun on the
> machine.
> I would appreciate any help with figuring out the appropriate flags to use
> for aprun and mdrun.
> Thank you.
> Michael Weiner
>
>
>
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.

