[gmx-users] Gromacs 2016.3 orte error while running on cluster

Rainer Rutka rainer.rutka at uni-konstanz.de
Wed Jan 17 13:04:25 CET 2018


HI!
Just a question.

We are trying to start an MPI-parallelized job with GROMACS 2016.3 on
our cluster system here in Germany.

Unfortunately we get this error:

An ORTE daemon has unexpectedly failed after launch...

See the attached gromacs-run-error.txt file for the full message.
Our submit script is attached as well: gromacs-run-pbs.txt

THANKS IN ADVANCE!

-- 
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
  * High-Performance-Computing (HPC)
  * KIM-Support and -Base-Services
Room: V511
78457 Konstanz, Germany
+49 7531 88-5413
-------------- next part --------------
****************************************************************************
* hwloc 1.11.2 has encountered an incorrect PCI locality information.
* PCI bus 0000:80 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI_0000_80_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
****************************************************************************
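(Side note: the hwloc warning above also says how to silence it. If it turns out to be
harmless on our nodes, adding this line to the job script should hide the message:

    export HWLOC_HIDE_ERRORS=1

This should have no effect on the ORTE failure below.)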
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
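(One idea we have not tried yet, based on the hint about common network interfaces:
pinning Open MPI's out-of-band and TCP traffic to one interface that all compute nodes
can reach. The interface name ib0 below is only a placeholder for whatever our nodes
actually provide:

    mpirun --mca oob_tcp_if_include ib0 --mca btl_tcp_if_include ib0 \
           -n 80 gmx_mpi mdrun ...

)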
-------------- next part --------------
#!/bin/bash
#MSUB -j oe
#MSUB -N XADpr-1
#MSUB -l walltime=48:00:00
#MSUB -l nodes=5:ppn=16
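# 5 nodes x 16 cores/node = 80 cores, matching the "mpirun -n 80" calls below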
export OMP_NUM_THREADS=1

export PATH=/opt/bwhpc/common/mpi/openmpi/2.1.1-gnu-7.1/bin:/opt/bwhpc/common/compiler/gnu/7.1.0/bin:/opt/bwhpc/common/chem/gromacs/2016.3_gnu7.1/bin:/software/all/bin:/usr/lib64/qt-3.3/bin:/opt/moab/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/st/st_st/st_ac128541/.local/bin:/home/st/st_st/st_ac128541/bin
export LD_BIND_NOW=1

cd /pfs/work2/workspace/scratch/st_ac128541-Lipase-0/1EDB

module purge
module load chem/gromacs/2016.3_gnu7.1

if [ 1 == 1 ] ; then
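# Manual toggle: the condition above is always true, so the fresh-start branch runs;
# change it (e.g. to [ 1 == 0 ]) to take the checkpoint-continuation branch instead.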
# Start simulation 

gmx_mpi grompp -maxwarn 10 -f md.mdp -c XADpr.gro -p XADpr.top -o XADpr-1.tpr -po XADpr-1.mdp > XADpr-1.grompp.out 2>&1

mpirun -n 80 gmx_mpi mdrun -s XADpr-1.tpr -maxh 47 -npme 20 -cpo XADpr-1.cpt -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro -e XADpr-1.edr -g XADpr-1.log > XADpr-1.mdrun.out 2>&1

else
# Continue simulation using the checkpoint feature

mpirun -n 80 gmx_mpi mdrun -cpi XADpr-0.cpt -cpo XADpr-1.cpt -s XADpr-1.tpr -maxh 47 -npme 20 -o XADpr-1.trr -x XADpr-1.xtc -c XADpr-1.gro -e XADpr-1.edr -g XADpr-1.log > XADpr-1.mdrun.out 2>&1

fi  
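
P.S. A small, untested idea for the script: instead of hard-coding "-n 80", the rank
count could be derived from the node file that Torque/MOAB writes, so it always matches
the #MSUB resource request:

    # $PBS_NODEFILE lists one line per allocated core slot
    NRANKS=$(wc -l < "$PBS_NODEFILE")
    mpirun -n "$NRANKS" gmx_mpi mdrun -s XADpr-1.tpr ...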

