[gmx-users] openmpi execution using sbatch & mpirun every command executed several times

Vedat Durmaz durmaz at zib.de
Tue Dec 12 21:33:48 CET 2017


hi everybody,

i'm working on an ubuntu 16.04 system with gromacs-openmpi (5.1.2) installed from the ubuntu repos. everything works fine when i submit job.slurm using sbatch, where job.slurm roughly looks like this:

-------------------------
#!/bin/bash -l

#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 02:00:00

cd $4

cp file1 file2
grompp ...
mpirun -np 24 mdrun_mpi ...   ### WORKS FINE
-------------------------

and the mdrun_mpi output contains the following statements:

-------------------------
Using 24 MPI processes
Using 1 OpenMP thread per MPI process

  mdrun_mpi -v -deffnm em

Number of logical cores detected (24) does not match the number reported by OpenMP (12).
Consider setting the launch configuration manually!

Running on 1 node with total 24 cores, 24 logical cores
-------------------------


now, i want to put the last 3 commands of job.slurm into an extra script (run_gmx.sh)

-------------------------
#!/bin/bash

cp file1 file2
grompp ...
mdrun_mpi ...
-------------------------

which i then start with mpirun:

-------------------------
#!/bin/bash -l

#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 02:00:00

cd $4
mpirun -np 24 run_gmx.sh
-------------------------

but now i get strange errors, because every non-mpi program (cp, grompp) is executed 24 times, which causes a disaster in the file system.
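
the only in-script workaround i can think of is to guard the serial commands with a rank check (assuming openmpi, whose mpirun exports OMPI_COMM_WORLD_RANK to every launched process), roughly like this:

-------------------------
#!/bin/bash

# hypothetical rank guard: only rank 0 runs the serial preparation steps
# (OMPI_COMM_WORLD_RANK is set by openmpi's mpirun for each process)
if [ "${OMPI_COMM_WORLD_RANK:-0}" -eq 0 ]; then
    cp file1 file2
    grompp ...
fi

mdrun_mpi ...
-------------------------

but even then the other 23 ranks would reach mdrun_mpi before rank 0 has finished grompp, so this doesn't look like a clean solution either.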


if i change job.slurm to

-------------------------
#!/bin/bash -l

#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -t 02:00:00

cd $4
mpirun run_gmx.sh
-------------------------

i get the following error in the output:

Number of logical cores detected (24) does not match the number reported by OpenMP (1).
Consider setting the launch configuration manually!

Running on 1 node with total 24 cores, 24 logical cores
Using 1 MPI process
Using 24 OpenMP threads

Fatal error:
Your choice of 1 MPI rank and the use of 24 total threads leads to the use of 24 OpenMP threads, whereas we expect the optimum to be with more MPI ranks with 1 to 6 OpenMP threads. If you want to run with this many OpenMP threads, specify the -ntomp option. But we suggest to increase the number of MPI ranks.
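
the obvious way to get 24 ranks back would be to turn the structure inside out, i.e. call run_gmx.sh once from job.slurm and put mpirun inside the script:

-------------------------
#!/bin/bash

# the serial preparation runs exactly once ...
cp file1 file2
grompp ...
# ... and only mdrun_mpi is started in parallel
mpirun -np 24 mdrun_mpi ...
-------------------------

but that's not what i'm after, since i need the extra script itself to be executed with mpirun.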


and if i use -ntomp:
-------------------------
#!/bin/bash

cp file1 file2
grompp ...
mdrun_mpi -ntomp 24 ...
-------------------------

things seem to work, but mdrun_mpi is extremely slow, as if it were running on one core only. this is the output:

  mdrun_mpi -ntomp 24 -v -deffnm em

Number of logical cores detected (24) does not match the number reported by OpenMP (1).
Consider setting the launch configuration manually!

Running on 1 node with total 24 cores, 24 logical cores

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 24 (and the command-line setting agreed with that)
Using 1 MPI process
Using 24 OpenMP threads

NOTE: You requested 24 OpenMP threads, whereas we expect the optimum to be with more MPI ranks with 1 to 6 OpenMP threads.

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity
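
my suspicion from that last note is that openmpi binds the single rank, and with it all 24 openmp threads, to one core or socket, which would explain the single-core speed. one thing i could still try (assuming openmpi's --bind-to option, untested here) is to disable the binding in job.slurm:

-------------------------
# disable openmpi's default process binding so the 24 OpenMP threads
# can spread over all cores instead of piling up on one
mpirun --bind-to none run_gmx.sh
-------------------------

but even if that restores the speed, a single MPI rank with 24 openmp threads is still far from what mdrun considers optimal.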



what am i doing wrong? what is the proper setting for my goal? i need to use an extra script executed with mpirun, and i somehow need to reduce the 24 executions of the serial commands to just one!

any useful hint is appreciated.

take care,
vedat



