[gmx-users] How to improve the performance of simulation in HPC (finding optimal number of nodes and processors)

Smith, Micholas D. smithmd at ornl.gov
Fri Jun 2 12:28:03 CEST 2017


Perhaps try something like:

(in your PBS Script)
-l nodes=8:ppn=28 (requests 8 nodes, using all 28 cores on each)
source (your source stuff here)

mpirun -n 56 -npernode 7 gmx_mpi mdrun -ntomp 4 (your run stuff here).

This should give you 7 MPI ranks per node, with each rank using 4 OpenMP threads (7 x 4 = 28 cores per node), which should give reasonable speed. Let me know if that works.
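
Roughly, filling that in with the queue, environment line, and file names from your script below (adjust them to your own setup; this also assumes an MPI-enabled build, gmx_mpi, and an mpirun that accepts -npernode, otherwise Intel MPI uses -ppn), the whole job script would look something like:

#!/bin/bash
#PBS -N my_protein_name
#PBS -q work-01
#PBS -l nodes=8:ppn=28
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
# set up the compiler/MPI/GROMACS environment
source /opt/software/intel/parallel_studio_xe_2017.2.050/psxevars.sh intel64
# 8 nodes x 7 ranks per node = 56 MPI ranks, each with 4 OpenMP threads (7 x 4 = 28 cores per node)
mpirun -n 56 -npernode 7 gmx_mpi mdrun -ntomp 4 -deffnm md_0_1

If your scheduler only accepts the select syntax your admin gave you, I believe the equivalent request would be -l select=8:ncpus=28:mpiprocs=7:ompthreads=4, but check that with your admin.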

===================
Micholas Dean Smith, PhD.
Post-doctoral Research Associate
University of Tennessee/Oak Ridge National Laboratory
Center for Molecular Biophysics

________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Santhosh Kumar Nagarajan <santhoshrajan90 at gmail.com>
Sent: Friday, June 02, 2017 1:49 AM
To: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: [gmx-users] How to improve the performance of simulation in HPC (finding optimal number of nodes and processors)

Dear users,

I am simulating a 285-residue protein with GROMACS installed on our
university's HPC cluster. GROMACS version: 2016.3.

I have tried running the simulation with the standard
"select=1:ncpus=28:mpiprocs=28" line provided by our HPC admin (in the PBS
script). Users of other software (such as VASP and Mathematica) request the
same numbers of nodes and processors and their jobs run perfectly, but with
the same settings my simulation runs far too slowly (approximately one ns
per day). I have tried changing the number of nodes and processors, but
nothing seems to improve the performance.

For example, many times I got fatal errors saying:

###Using 6 MPI threads
Using 28 OpenMP threads per tMPI thread

WARNING: Oversubscribing the available 28 logical CPU cores with 168
threads. This will cause considerable performance loss!

Fatal error: Your choice of number of MPI ranks and amount of resources
results in using 28 OpenMP threads per rank, which is most likely
inefficient. The optimum is usually between 1 and 6 threads per rank. If
you want to run with this setup, specify the -ntomp option. But we suggest
to change the number of MPI ranks (option -ntmpi).###
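
(For reference, a split that uses all 28 logical cores of a single node while staying within the suggested 1-6 OpenMP threads per rank would be something along these lines; only a sketch, not a tuned setting:

gmx mdrun -ntmpi 7 -ntomp 4 -deffnm md_0_1

i.e. 7 thread-MPI ranks x 4 OpenMP threads = 28.)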

After this, I added the "-ntomp" option, which suppressed the warning but
did not improve the performance.
Recently I tried "select=1:ncpus=6:mpiprocs=56", which runs the simulation
with 4 MPI threads and 6 OpenMP threads. I think this performance is too low
for our HPC, as other software runs much faster.

Below is the pbs.sh file I have used to run the simulation on the HPC. Can
anyone please tell me what I am doing wrong?

###PBS file used

#!/bin/bash
#PBS -N my_protein_name
#PBS -q work-01
#PBS -l select=1:ncpus=28:mpiprocs=28
#PBS -j oe
#PBS -V
cd $PBS_O_WORKDIR
# save the allocated node list and count the cores
cat $PBS_NODEFILE > ./pbsnodelist
CORES=`cat ./pbsnodelist | wc -l`
# set up the Intel compiler/MPI environment
source /opt/software/intel/parallel_studio_xe_2017.2.050/psxevars.sh intel64
gmx mdrun -ntmpi 28 -deffnm md_0_1

###
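
(As a side note, for comparing the attempts: mdrun prints the achieved speed at the end of its log file, so with -deffnm md_0_1 something like

grep -B1 "Performance:" md_0_1.log

shows the ns/day figure for each run.)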


Specifications of the HPC:

One master node + 40 compute nodes

Compute nodes:
40 x 2 x Intel Xeon E5-2680 v4
(28 threads per E5-2680 v4)
10 TB RAM
40 TB hard disk

Master node:
1 x 2 x Intel Xeon E5-2650 v4
(24 threads per E5-2650 v4)
128 GB RAM
80 TB hard disk

Thank you

Regards
--
Santhosh Kumar Nagarajan
PhD Research Scholar
Department of Genetic Engineering
SRM University
Kattankulathur
Chennai - 603203