[gmx-users] How to improve the performance of simulation in HPC (finding optimal number of nodes and processors)

Santhosh Kumar Nagarajan santhoshrajan90 at gmail.com
Fri Jun 2 07:49:49 CEST 2017


Dear users,

I am simulating a 285-residue protein with GROMACS installed on our
University's HPC (GROMACS version 2016.3).

I have tried to run the simulation using the standard
"select=1:ncpus=28:mpiprocs=28" request provided by our HPC admin (in the PBS
script). Users of other software (like VASP and Mathematica) run with the same
numbers of nodes and processors and get good performance. But with the same
request my simulation runs very slowly (approximately 1 ns per day). I have
tried changing the numbers of nodes and processors, but nothing seems to
improve the performance.
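
For reference, the ~1 ns/day figure is what mdrun reports in the
"Performance:" line at the end of its log file. I read it like this (the log
name follows from "-deffnm md_0_1" in my script below):

### how I check the reported performance
# print the (ns/day) header line and the Performance line from the mdrun log
grep -B1 "^Performance:" md_0_1.log
###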

For example, many times I got a fatal error like the following:

###Using 6 MPI threads
Using 28 OpenMP threads per tMPI thread

WARNING: Oversubscribing the available 28 logical CPU cores with 168
threads. This will cause considerable performance loss!

Fatal error: Your choice of number of MPI ranks and amount of resources
results in using 28 OpenMP threads per rank, which is most likely
inefficient. The optimum is usually between 1 and 6 threads per rank. If
you want to run with this setup, specify the -ntomp option. But we suggest
to change the number of MPI ranks (option -ntmpi).###
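
If I read this correctly, mdrun wants -ntmpi times -ntomp to match the 28
logical cores it sees, with only 1 to 6 OpenMP threads per rank. So I assume
(please correct me if I am wrong) that a more balanced single-node split would
look something like:

### my guess at a balanced rank/thread split
# 14 thread-MPI ranks x 2 OpenMP threads = 28 threads, i.e. no oversubscription
gmx mdrun -ntmpi 14 -ntomp 2 -deffnm md_0_1
###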

After seeing this error, I added the -ntomp option, which suppressed the
warning but did not improve the performance.
Recently I tried "select=1:ncpus=6:mpiprocs=56", which runs the simulation
with 4 MPI threads and 6 OpenMP threads. I think this is far too little
performance for our HPC, as other software runs much better.
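
To double-check what mdrun actually ends up using in each attempt, I have
been grepping its log (again assuming the md_0_1.log name from -deffnm); I
hope I am reading these lines correctly:

### how I check the rank/thread split mdrun settled on
# show the detected logical cores and the MPI/OpenMP thread counts
grep -E "logical cores|MPI threads|OpenMP threads" md_0_1.log
###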

Below I am providing the pbs.sh file which I have used to run the
simulation on the HPC. Can anyone please tell me what I am doing wrong?

###PBS file used

#!/bin/bash
#PBS -N my_protein_name
#PBS -q work-01
#PBS -l select=1:ncpus=28:mpiprocs=28
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > ./pbsnodelist
CORES=$(cat ./pbsnodelist | wc -l)
source /opt/software/intel/parallel_studio_xe_2017.2.050/psxevars.sh intel64
gmx mdrun -ntmpi 28 -deffnm md_0_1

###
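
For completeness, this is the variant I am thinking of trying next, based on
my (possibly wrong) reading of the error message above. The gmx binary here
is the thread-MPI build (it accepts -ntmpi), so everything stays on a single
node:

###PBS file I am considering (sketch only)

#!/bin/bash
#PBS -N my_protein_name
#PBS -q work-01
#PBS -l select=1:ncpus=28:mpiprocs=28
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
source /opt/software/intel/parallel_studio_xe_2017.2.050/psxevars.sh intel64
# 14 thread-MPI ranks x 2 OpenMP threads = 28 threads, matching the 28 logical cores on the node
gmx mdrun -ntmpi 14 -ntomp 2 -pin on -deffnm md_0_1

###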


Specifications of the HPC:

One master node + 40 compute nodes
40 x 2 x Intel Xeon E5-2680 v4
(28 threads per E5-2680)
10 TB RAM
40 TB hard disk

Master node:
1 x 2 x E5-2650 v4
(24 threads per E5-2650)
128 GB RAM
80 TB hard disk

Thank you

Regards
-- 
Santhosh Kumar Nagarajan
PhD Research Scholar
Department of Genetic Engineering
SRM University
Kattankulathur
Chennai - 603203

