[gmx-users] Problems running on multiple nodes

Mon Oct 23 23:08:34 CEST 2017

Hi everyone,

This is my first time running GROMACS across multiple nodes, and I don’t quite understand the output generated by my run. Could you please take a look at the script and output below and tell me how to improve the performance?

The HPC cluster I am currently using has 72 nodes; each node has 28 CPU cores.

The script is: 

#!/bin/bash
#SBATCH --job-name=Gromacs78
#SBATCH -o Gromacs_result.out
#SBATCH -n 140 -N 5
#SBATCH --tasks-per-node=28

module purge
module load gromacs-mvapich2-2.2 mvapich2-2.2/gnu-4.8.5
source /opt/gromacs/bin/GMXRC

dm=/home/blustig/perl5/simulation/78
dmdp=${dm}/mdpfiles
vt=rna-protein
dw=${dm}/${vt}
mkdir ${dw}
cd ${dw}

########### produce 100ns mdrun: 1st trajectory
echo "0" > inputall
trj=1
let tm=trj*20
vp=md_npt

gmx trjconv -s md_npt.tpr -f md_npt.xtc -pbc mol -ur compact -o md_npt_trj20ps.gro < inputall

gmx grompp -f ${dmdp}/md.mdp -c md_npt.gro -t md_npt.cpt -p rna-protein.top -n rna-protein.ndx -o md1micros.tpr -maxwarn 1

gmx mdrun -ntmpi 140 -pin on -s md1micros.tpr -o md1micros.trr -e md1micros.edr -g md1micros.log -c md1micros.gro -x md1micros.xtc -cpo md1micros.cpt

The output is: 

Back Off! I just backed up md1micros.log to ./#md1micros.log.14#

NOTE: Error occurred during GPU detection:
      CUDA driver version is insufficient for CUDA runtime version
      Can not use GPU acceleration, will fall back to CPU kernels.

Running on 1 node with total 28 cores, 28 logical cores, 0 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  Hardware topology: Basic

Reading file md1micros.tpr, VERSION 2016.3 (single precision)
Changing nstlist from 10 to 25, rlist from 1.4 to 1.435

Will use 120 particle-particle and 20 PME only ranks
This is a guess, check the performance at the end of the log file
Using 140 MPI threads
Using 1 OpenMP thread per tMPI thread

NOTE: Oversubscribing a CPU, will not pin threads.

NOTE: Thread affinity setting failed. This can cause performance degradation.
      If you think your settings are correct, ask on the gmx-users list.

Back Off! I just backed up md1micros.xtc to ./#md1micros.xtc.12#

Back Off! I just backed up md1micros.edr to ./#md1micros.edr.12#

WARNING: This run will generate roughly 12227 Mb of data

starting mdrun 'Protein in water'
500000000 steps, 1000000.0 ps.

step 87500 Turning on dynamic load balancing, because the performance loss due to load imbalance is 2.2 %.

I don’t understand why the run is taking so long.
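
As a rough sanity check on the numbers in the output above, here is a minimal sketch. The variable names are only illustrative, and every value is copied from the log line quoted in the comment next to it; nothing here is measured separately.

```shell
#!/bin/bash
# Values copied from the mdrun output above (not measured independently).
ranks=140          # "Using 140 MPI threads"
cores=28           # "Running on 1 node with total 28 cores"
steps=500000000    # "500000000 steps, 1000000.0 ps."
total_ps=1000000   # same log line, rounded to whole picoseconds

ranks_per_core=$(( ranks / cores ))          # ranks sharing each detected core
timestep_fs=$(( total_ps * 1000 / steps ))   # ps converted to fs, per step
total_ns=$(( total_ps / 1000 ))              # ps converted to ns

echo "ranks per detected core: ${ranks_per_core}"
echo "time step: ${timestep_fs} fs"
echo "run length: ${total_ns} ns"
```

With these values it reports 5 ranks per detected core, a 2 fs time step, and a run length of 1000 ns. The first figure is consistent with the "Oversubscribing a CPU" note in the output, and the last one is 1 microsecond rather than the 100 ns mentioned in the script comment.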

Any advice is greatly appreciated.

Thanks,

Thanh Le