[gmx-users] multiple nodes, GPUs on a Cray system

Michael Weiner mdw224 at cornell.edu
Tue Jan 19 20:19:24 CET 2016


Micholas,
Thanks for the quick reply.  This solved part of my problem, in that the system does now run, but the domain decomposition seems to create only as many domains as there are nodes, not one per processor (unless it is being subdivided further later in a way not recorded in the log file).  As a result, the simulation runs much more slowly than expected.  Do I need to set the domain decomposition manually rather than letting it happen automatically?
Michael
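
(A minimal sketch of one common way to fill the cores in this situation, assuming 50 nodes with 16 cores and one GPU each; the -d 16 and -ntomp 16 values are assumptions based on that node layout and are not taken from this thread.  The idea is to keep one MPI rank per node, as Micholas suggested, and let each rank use the node's remaining cores as OpenMP threads rather than extra domains:

aprun -n 50 -N 1 -d 16 mdrun_mpi -ntomp 16 -gpu_id 0

This gives 50 domains, one per rank and GPU, with 16 OpenMP threads working inside each domain.)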


Smith, Micholas D. smithmd at ornl.gov
Tue Jan 19 18:44:54 CET 2016

Try this:

aprun -n <number of nodes> -N 1 mdrun_mpi -gpu_id 0 <the rest of your mdrun options>

(-n is the total number of MPI processes; -N 1 places one MPI process on each node.)
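
As a rough illustration of where that command would sit, a minimal PBS submission script might look like the following; the project name, walltime, node count, and output file name are placeholders, not details from this thread:

#!/bin/bash
#PBS -A PROJECT_ID           # placeholder allocation
#PBS -l walltime=02:00:00    # placeholder walltime
#PBS -l nodes=50             # placeholder node count

cd $PBS_O_WORKDIR

# one MPI rank per node, using the single GPU on each node
aprun -n 50 -N 1 mdrun_mpi -gpu_id 0 -deffnm md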



===================
Micholas Dean Smith, PhD.
Post-doctoral Research Associate
University of Tennessee/Oak Ridge National Laboratory
Center for Molecular Biophysics

________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se on behalf of Michael Weiner <mdw224 at cornell.edu>
Sent: Tuesday, January 19, 2016 12:14 PM
To: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: [gmx-users] multiple nodes, GPUs on a Cray system

Hello.  I am trying to run Gromacs on ORNL’s Titan supercomputer (Cray XK7).  Specifically, I wish to run a large system across multiple nodes while employing GPUs with Gromacs 4.6.6.  However, I cannot figure out the appropriate way to call mdrun in order for this to work.  Within a submission script, I have tried many combinations of flags for aprun and mdrun, none of which work.
For example, if I request 50 nodes of 16 cores each and use this command:
aprun -n 800 -N 16 mdrun_mpi
I get the following error: “Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.  mdrun_mpi was started with 16 PP MPI processes per node, but only 1 GPU were detected.”
I have also tried setting the -n flag to the number of nodes, but then the system is decomposed into only that many domains, rather than one per processor.  The “mdrun_mpi” command seems to be the only way of calling mdrun on this machine.
I would appreciate any help with figuring out the appropriate flags to use for aprun and mdrun.
Thank you.
Michael Weiner




