[gmx-users] GROMACS 4.6.7 not running on more than 16 MPI threads

Agnivo Gosai agnivogromacs14 at gmail.com
Thu Feb 26 22:12:20 CET 2015


Dear Users

I am running GROMACS 4.6.7 on my university cluster. Its salient
specifications are :-

http://hpcgroup.public.iastate.edu/HPC/CyEnce/description.html

*I compiled GROMACS 4.6.7 as follows :-*

work/gb_lab/agosai/GROMACS/cmake-2.8.11/bin/cmake .. -DGMX_GPU=OFF
-DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_THREAD_MPI=OFF -DGMX_OPENMM=OFF
-DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DGMX_BUILD_OWN_FFTW=ON
-DCMAKE_INSTALL_PREFIX=/work/gb_lab/agosai/gmx467ag -DGMX_DOUBLE=OFF
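
For completeness, this is roughly how I verify that the installed binary is the MPI build and that it sees the same Intel MPI at runtime (only a sketch; I am assuming the standard GMXRC location under my install prefix above):

source /work/gb_lab/agosai/gmx467ag/bin/GMXRC
which mdrun_mpi
mdrun_mpi -version 2>&1 | grep -i "MPI library"
ldd $(which mdrun_mpi) | grep -i mpi
mpirun -version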

*My mdrun command in a PBS script is as follows :-*

mpirun -np 16 -f $PBS_NODEFILE mdrun_mpi -s ex.tpr deffnm -v

with nodes=1 and ppn=16 requested (-l nodes=1:ppn=16) in the PBS script.
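
For reference, the whole job script is essentially the sketch below (job name and the -deffnm value are placeholders, not necessarily my exact ones):

#!/bin/bash
#PBS -N gmx_smd
#PBS -l nodes=1:ppn=16
#PBS -l walltime=00:30:00

cd $PBS_O_WORKDIR
source /work/gb_lab/agosai/gmx467ag/bin/GMXRC

mpirun -np 16 -f $PBS_NODEFILE mdrun_mpi -s ex.tpr -deffnm ex -v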

*This is part of a standard log file from an mdrun command running on 1 node
with 16 processes :-*

Log file opened on Mon Feb 23 15:29:39 2015
Host: node021  pid: 13159  nodeid: 0  nnodes:  16
Gromacs version:    VERSION 4.6.7
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   AVX_256
FFT library:        fftw-3.3.2-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Fri Nov 21 12:55:48 CST 2014
Built by:           agosai@share [CMAKE]
Build OS/arch:      Linux 2.6.32-279.19.1.el6.x86_64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
sse4.2 ssse3 tdt x2apic
C compiler:         /shared/intel/impi/4.1.0.024/intel64/bin/mpiicc Intel
icc (ICC) 13.0.1 20121010
C compiler flags:   -mavx    -std=gnu99 -Wall   -ip -funroll-all-loops  -O3
-DNDEBUG

............................................................................................................................................
Using 16 MPI processes

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Family:  6  Model: 45  Stepping:  7
Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
tdt x2apic
Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: AVX_256

*This is found in the standard PBS error file :*
...................................
Back Off! I just backed up smdelec1.log to ./#smdelec1.log.1#

Number of CPUs detected (16) does not match the number reported by OpenMP
(1).
Consider setting the launch configuration manually!
Reading file smdelec1.tpr, VERSION 4.6.7 (double precision)
Using 16 MPI processes

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

........................
*The program runs successfully, and the speed is around 7 ns/day for my
particular biomolecule.*
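
In case it matters, this is how I read the "Consider setting the launch configuration manually!" suggestion for the single-node case (only a sketch; I believe -ntomp and -pin are the relevant mdrun options in 4.6, with OMP_NUM_THREADS set explicitly):

export OMP_NUM_THREADS=1
mpirun -np 16 -f $PBS_NODEFILE mdrun_mpi -s ex.tpr -deffnm ex -v -ntomp 1 -pin on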

However, the mdrun command *fails to run* when I use *more than 1 node and
keep ppn = 16*. I observed that it can run on 2 nodes with 4 processes, on 2
nodes with 8 processes, and similarly on 4 nodes with 4 processes. That is,
np = 16 is the limit for the command in my case.

*For nodes = 3 and ppn = 3, I get a message like this :-*

Number of CPUs detected (16) does not match the number reported by OpenMP
(1).
Consider setting the launch configuration manually!
Reading file pull1.tpr, VERSION 4.6.7 (double precision)
Using 9 MPI processes
..............................................................................................
=>> PBS: job killed: walltime 50 exceeded limit 30. I killed the job.

*For nodes = 4 and ppn = 2, I get this :-*

Number of CPUs detected (16) does not match the number reported by OpenMP
(2).
Consider setting the launch configuration manually!
Reading file pull1.tpr, VERSION 4.6.7 (double precision)
Using 8 MPI processes

........................................................................................................
=>> PBS: job killed: walltime 50 exceeded limit 30. I killed the job.

In the above test cases my walltime was 00:30:00, chosen arbitrarily just to
see whether the jobs would run at all.

*If I use, say, nodes = 2, ppn = 16 and np = 32, the program starts but no
output is generated. If I cancel it, this error appears :-*

[mpiexec@node094] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:221): assert (!closed) failed
[mpiexec@node094] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:128): unable to send SIGUSR1 downstream
[mpiexec@node094] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@node094] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[mpiexec@node094] main (./ui/mpich/mpiexec.c:718): process manager error waiting for completion
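
Before contacting the admins I also plan to sanity-check the host file that mpirun is given in the multi-node case, along these lines (a sketch; for nodes=2:ppn=16 I would expect 32 entries):

echo $PBS_NODEFILE
wc -l < $PBS_NODEFILE
sort $PBS_NODEFILE | uniq -c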

Can anyone please help with this? I am waiting for a reply on this list,
after which I will take it up with the cluster admins.


Thanks & Regards
Agnivo Gosai
Grad Student, Iowa State University.

