[gmx-users] Hybrid acceleration with "well-tempered parallel tempering metadynamics"
Francesco Carbone
fra.carbone8 at gmail.com
Tue Sep 29 15:39:18 CEST 2015
Good afternoon all,
I'm having problems running a "well-tempered parallel tempering
metadynamics" simulation with GROMACS 5.
The problem is that I don't understand how to set -multi for the CPUs I
have (the cluster has both 12-core and 16-core nodes).
If I want to run 10 replicas spread over 160 cores, I set:
mpirun -np 160 mdrun_mpi -ntomp 16 -v -s input.tpr -plumed -multi 10 \
    -replex 500 -o wt_20t1 2>gmx_error.err
This should start 10 MPI ranks with 16 OpenMP threads each.
Instead I get 16 MPI processes with 16 OpenMP threads each, and in the log I read:
"Number of hardware threads detected (16) does not match the number
reported by OpenMP (1).
Consider setting the launch configuration manually!
Using 16 MPI processes
Using 16 OpenMP threads per MPI process
WARNING: Oversubscribing the available 16 logical CPU cores with 256
threads.
This will cause considerable performance loss!
Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity"
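If I read that warning right (and I am not sure I do), mpirun is packing 16
MPI ranks onto each 16-core node, and each rank then starts its own 16 OpenMP
threads, which would be where the 256 comes from. A rough sketch of the
arithmetic, with the 16-ranks-per-node placement being my assumption:

# what I wanted:  10 ranks total, one per node, 16 OpenMP threads each
#                 10 x 16 = 160 threads over 160 cores
# what I think happened with -np 160 -multi 10 -ntomp 16:
#                 160 ranks split into 10 replicas of 16 ranks each,
#                 16 ranks on a 16-core node x 16 threads = 256 threads
echo $((16 * 16))   # 256, matching the oversubscription warning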
In the "multi-level parallelization" section of the documentation (
http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Running_simulations)
I found that I should use "mpirun -np M mdrun_mpi -ntomp N" with M number
of total cores and N number of OpenMP.
Because I have 16 cores x node, setting -ntomp to 16 is correct as 16x1
(OpenMPx MPI) gives 16, which is a multiple of the cores I have per node.
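To check my own reading, these are the two interpretations of M I can come up
with; the thread counts below are just my arithmetic, not something taken from
the docs:

# reading 1: M = total number of cores (what I used)
#   mpirun -np 160 mdrun_mpi -ntomp 16 -multi 10 ...
#   -> 160 MPI ranks x 16 OpenMP threads = 2560 threads on 160 cores
# reading 2: M = number of MPI ranks, chosen so that M x N = total cores
#   mpirun -np 10 mdrun_mpi -ntomp 16 -multi 10 ...
#   -> 10 MPI ranks x 16 OpenMP threads = 160 threads on 160 cores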
I tried setting "export OMP_NUM_THREADS=16" and then passing -ntomp
$OMP_NUM_THREADS, but nothing changed.
I also tried to let "gerun" handle all the settings and use OpenMP
multithreading only (gerun mdrun_mpi -v -s input.tpr -plumed -multi 10
-replex 500 -o wt_20t1 2>gmx_error.err).
In this case, after the same messages ("Number of hardware threads detected
(16) does not match the number reported by OpenMP (1). Consider setting the
launch configuration manually!"), I get 1 MPI process and 16 OpenMP threads,
but the predicted running time is huge compared to the run that warns about
"performance loss!" (the predicted end date moves from 9 October to 21 January
for a 50 ns run with ~190,000 atoms).
How is this possible? How does it work?
This is the script I'm using (I tried several combinations of the mdrun_mpi
call, kept below as commented-out lines):
#!/bin/bash -l
#$ -S /bin/bash
#$ -l h_rt=0:15:0
#$ -l mem=1G
#$ -l tmpfs=15G
#$ -N ciao
#$ -pe qlc 160
#$ -wd /home/ucbtca4/Scratch/metadynamics/ciao
module load gcc-libs/4.9.2
module load compilers/intel/2015/update2
module load mpi/intel/2015/update3/intel
module load openblas/0.2.14/intel-2015-update2
module load plumed/2.1.2/intel-2015-update2
module load gromacs/5.0.4/plumed/intel-2015-update2
export OMP_NUM_THREADS=16
export MPIRUN="/shared/ucl/apps/intel/2015/impi/5.0.3.048/intel64/bin/mpirun"
$MPIRUN -np 160 mdrun_mpi -ntomp $OMP_NUM_THREADS -v -s input.tpr -plumed \
    -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err
#gerun mdrun_mpi -v -s input.tpr -plumed -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err
#gerun mdrun_mpi -ntomp $OMP_NUM_THREADS -v -s input.tpr -plumed -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err
#mpirun -np 100 mdrun_mpi -ntomp 10 -v -s input.tpr -plumed -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err
#mpirun -np 160 mdrun_mpi -ntomp 16 -v -s input.tpr -plumed -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err
If I have GPUs, I set -np to the number of GPUs and everything works like a
charm, but I don't understand how to run it with CPUs only.
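By analogy with the GPU case, this is the variant I plan to test next,
assuming -np should count MPI ranks (one per replica) rather than cores; -ppn
1 is Intel MPI's processes-per-node option, and I am not sure it is the right
way to force one rank per node under SGE/gerun:

export OMP_NUM_THREADS=16
$MPIRUN -np 10 -ppn 1 mdrun_mpi -ntomp $OMP_NUM_THREADS -v -s input.tpr \
    -plumed -multi 10 -replex 500 -o wt_20t1 2>gmx_error.err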
Thank you for your time,
Regards,
Francesco