[gmx-users] Improving GPU performance on Bridges HPC cluster
Benjamin Joseph Coscia
Benjamin.Coscia at colorado.edu
Tue Sep 6 20:39:51 CEST 2016
Hello GROMACS users,
Our group has begun running simulations on the XSEDE resource, Bridges, and
we are trying to maximize our performance on the GPU nodes. The nodes are
configured so that there are two Tesla K80 accelerators each consisting of
2 GK210 GPUs. Additionally, there are two CPUs on the node, each with 14
cores.
Generally when I run on any node, I've found the best performance occurs
when I assign 7 cores per MPI process. On the GPU nodes, I have been giving
each MPI process one GPU to work with. A representative Slurm submission
script (Run.sh), which used one full GPU node (4 GPUs, 28 CPU cores), is
contained in the folder shared through Dropbox at the end of this email.
I've turned dynamic load balancing on, although I think it turns on by
default so I didn't see a performance difference there.
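
For readers without the archive at hand, the layout described above (one
rank per GPU, 7 cores per rank) can be sketched as a small Python helper
that assembles an mdrun command line. This is not the actual Run.sh; it
assumes a thread-MPI build started as "gmx mdrun" (an MPI build would be
launched through mpirun or srun instead), and the function name, the
"md" file prefix and the defaults are hypothetical.

```python
# Sketch only: builds an mdrun command line for the layout described above.
# Node figures come from the message; names and defaults are hypothetical.
import shlex

CORES_PER_NODE = 28  # 2 CPUs x 14 cores
GPUS_PER_NODE = 4    # 2 Tesla K80 boards x 2 GK210 GPUs


def mdrun_command(n_gpus, cores_per_rank=7, deffnm="md", dlb="auto"):
    """Return an mdrun argument list with one thread-MPI rank per GPU."""
    if not 1 <= n_gpus <= GPUS_PER_NODE:
        raise ValueError("n_gpus must be between 1 and %d" % GPUS_PER_NODE)
    if n_gpus * cores_per_rank > CORES_PER_NODE:
        raise ValueError("layout needs more cores than the node has")
    # One GPU id per rank, in rank order, e.g. "0123" for four ranks.
    gpu_ids = "".join(str(i) for i in range(n_gpus))
    return [
        "gmx", "mdrun",
        "-ntmpi", str(n_gpus),          # thread-MPI ranks
        "-ntomp", str(cores_per_rank),  # OpenMP threads per rank
        "-gpu_id", gpu_ids,             # GPU assigned to each rank
        "-dlb", dlb,                    # dynamic load balancing: auto/yes/no
        "-deffnm", deffnm,              # assumed prefix for input/output files
    ]


if __name__ == "__main__":
    print(" ".join(shlex.quote(arg) for arg in mdrun_command(4)))
```

Calling mdrun_command(2) gives the corresponding half-node layout (2 GPUs,
14 cores), which is the kind of variation used in the scaling runs.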
The systems for which I have scaling data are both membrane systems. One is
an ordered membrane and the unit cell is heterogeneous. There is vacuum on
the top and bottom of the system. It has ~65000 atoms. The second system is
an amorphous membrane which is homogeneous and contains about 139000 atoms.
The dropbox link contains the results of the scaling studies
(Bridges_GPU_scaling.ods) I've done on a single node with varied numbers of
GPUs (7 CPU cores allocated per GPU) and varied PP:PME loads. Generally, I
did not see any significant performance increase (usually a decrease) from
varying the PP:PME ranks. Also, the scaling from 1 to 4 GPUs does not seem
to be too great.
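
To put a number on the GPU scaling, speedup and parallel efficiency can be
computed from the ns/day figures reported in the .log files. The sketch
below is a minimal illustration; the function name is hypothetical and the
values in the example are placeholders, not the measured results from
Bridges_GPU_scaling.ods.

```python
def scaling_table(perf_by_gpus):
    """Compute speedup and parallel efficiency relative to the smallest run.

    perf_by_gpus maps a GPU count to performance in ns/day.
    Returns a list of (n_gpus, speedup, efficiency) tuples.
    """
    base_n = min(perf_by_gpus)
    base_perf = perf_by_gpus[base_n]
    rows = []
    for n in sorted(perf_by_gpus):
        speedup = perf_by_gpus[n] / base_perf
        # Ideal speedup is n / base_n; efficiency is the fraction achieved.
        efficiency = speedup / (n / base_n)
        rows.append((n, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder ns/day values for illustration only.
    example = {1: 10.0, 2: 15.0, 4: 20.0}
    for n, speedup, efficiency in scaling_table(example):
        print("%d GPU(s): speedup %.2f, efficiency %.0f%%"
              % (n, speedup, 100 * efficiency))
```

An efficiency well below 100% at 4 GPUs corresponds to the weak scaling
noted above.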
I've also included select .log files and .out files from slurm along with
input files which can be used to reproduce both systems.
Perhaps GROMACS is already choosing near-optimal run conditions, given that
I have been unable to beat the performance of the default settings, but I
would expect there to be a way to get better performance, since I know a
lot about the systems.
Please let me know if you have any suggestions on how to further increase
performance. We'd like to implement recommendations into our own systems
and also pass the information on to people who work on Bridges so that they
can put forth some best practices.
Please use this link:
https://www.dropbox.com/s/kmy7d15dijvtr2j/GPU_jobs.tgz?dl=0 to access all
files necessary to reproduce my simulations.