[gmx-users] Seeking advice on running Gromacs on SDSC's new Comet cluster

Nathan Scott scottjn at gmail.com
Wed May 13 19:35:38 CEST 2015


Hi everyone,

I was hoping someone might be able to suggest the best option(s) for
getting the most performance out of Gromacs on the new Comet cluster at the
San Diego Supercomputer Center.

The GPU-enabled nodes have 2 NVIDIA K80s (each of which appears as 2 GPU
devices, so 4 GPUs in total as I understand it) and 2 sockets with 12 cores
each. At least to start with, I will only be using a single node.

I have been advised by SDSC support staff that I should be using ibrun
rather than mpirun, but I am not at all certain what the ideal combination
of MPI ranks and OpenMP threads is.

Right now I am using a run command like the following:

ibrun -np 4 mdrun_mpi -dlb yes -deffnm md_5ns_equil

In the Gromacs log file I see the following:

---------------------------------
Using 4 MPI processes
Using 6 OpenMP threads per MPI process

4 GPUs detected on host comet-30-15.sdsc.edu:
  #0: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
  #1: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
  #2: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible
  #3: NVIDIA Tesla K80, compute cap.: 3.7, ECC: yes, stat: compatible

4 GPUs auto-selected for this run.
Mapping of GPUs to the 4 PP ranks in this node: #0, #1, #2, #3
---------------------------------
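
I have also been wondering whether I should set the OpenMP thread count,
GPU mapping, and pinning explicitly rather than relying on mdrun's
auto-detection. Something along these lines is what I had in mind (the
specific -ntomp, -gpu_id, and -pin values below are just my attempt to
match the defaults reported above, not something I have verified):

ibrun -np 4 mdrun_mpi -ntomp 6 -gpu_id 0123 -pin on -dlb yes -deffnm md_5ns_equil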

My test simulation is running, but I am seeing warnings about fairly large
load imbalance, ~15% on average. Could anyone advise a better configuration
for Gromacs on a system with hardware like this?
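
One thing I have been considering, in case it is relevant to the imbalance,
is running more PP ranks per GPU, e.g. 8 ranks with 3 OpenMP threads each
and two ranks sharing each GPU (the exact numbers here are only my guess):

ibrun -np 8 mdrun_mpi -ntomp 3 -gpu_id 00112233 -dlb yes -deffnm md_5ns_equil

Would something like that be expected to help, or is a ~15% imbalance more
of a domain decomposition issue that a different rank/thread split will not
fix?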

Best Wishes,
J. Nathan Scott

