[gmx-users] Re: [gmx-users] Signal 11 error with parallel runs
sophie.vilarem@laposte.net
Tue Nov 4 16:19:01 CET 2003
sophie.vilarem at laposte.net wrote:
> Hi everybody,
>
> I am trying to run the d.poly-ch2 benchmark, on a dual xeon
> cluster.
> I have installed the latest LAM version of MPI, and the latest
> version of Gromacs.
> It works all right on 1 CPU, but it fails on 2 CPUs (or
> more...) with the following error :
>
> ...
> MPI_Send: process in local group is dead (rank 1, MPI_COMM_WORLD)
The most relevant error probably occurs before this one. Did
mdrun, for example, complain about the number of CPUs the .tpr file
was made for?
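For context, in GROMACS 3.x the node count given to grompp with -np has to match the node count mdrun is started with; a mismatch is a classic cause of early parallel failures. A minimal sketch of a consistent 2-CPU invocation under LAM/MPI might look like this (the file names, the hostfile, and the mdrun binary name are illustrative assumptions, not taken from this thread):

```shell
# Sketch only: file names and hostfile are examples, not from this thread.

# 1. Pre-process the input with the same node count the run will use:
grompp -np 2 -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr

# 2. Boot the LAM daemons, then launch mdrun with the matching -np:
lamboot hostfile
mpirun -np 2 mdrun -np 2 -s topol.tpr -v
```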
I don't think so... but here is the total output of a run on 2 CPUs:
:-) G R O M A C S (-:
GROwing Monsters And Cloning Shrimps
:-) VERSION 3.1.4 (-:
Copyright (c) 1991-2002, University of Groningen, The Netherlands
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) grompp (-:
Option     Filename  Type         Description
------------------------------------------------------------
  -f     grompp.mdp  Input, Opt.  grompp input file with MD parameters
 -po      mdout.mdp  Output       grompp input file with MD parameters
  -c       conf.gro  Input        Generic structure: gro g96 pdb tpr tpb tpa
  -r       conf.gro  Input, Opt.  Generic structure: gro g96 pdb tpr tpb tpa
  -n      index.ndx  Input, Opt.  Index file
-deshuf  deshuf.ndx  Output, Opt. Index file
  -p      topol.top  Input        Topology file
 -pp  processed.top  Output, Opt. Topology file
  -o      topol.tpr  Output       Generic run input: tpr tpb tpa
  -t       traj.trr  Input, Opt.  Full precision trajectory: trr trj

Option        Type  Value  Description
------------------------------------------------------
      -[no]h  bool     no  Print help info and quit
      -[no]X  bool     no  Use dialog box GUI to edit command line options
       -nice   int      0  Set the nicelevel
      -[no]v  bool    yes  Be loud and noisy
       -time  real     -1  Take frame at or first after this time.
         -np   int      2  Generate statusfile for # nodes
-[no]shuffle  bool     no  Shuffle molecules over nodes
   -[no]sort  bool     no  Sort molecules according to X coordinate
-[no]rmdumbds bool    yes  Remove constant bonded interactions with dummies
       -load string        Relative load capacity of each node on a parallel machine. Be sure to use quotes around the string, which should contain a number for each node
    -maxwarn   int     10  Number of warnings after which input processing stops
-[no]check14  bool     no  Remove 1-4 interactions without Van der Waals
creating statusfile for 2 nodes...
Back Off! I just backed up mdout.mdp to ./#mdout.mdp.2#
Warning: as of GMX v 2.0 unit of compressibility is truly 1/bar
checking input for internal consistency...
calling /lib/cpp...
processing topology...
Generated 3 of the 3 non-bonded parameter combinations
Excluding 3 bonded neighbours for PE6000 1
processing coordinates...
double-checking input for internal consistency...
Cleaning up constraints and constant bonded interactions with dummy particles
renumbering atomtypes...
converting bonded parameters...
# BONDS: 17997
# ANGLES: 23992
# RBDIHS: 29985
# DUMMY3FD: 29990
# DUMMY3FAD: 10
Setting particle type to Dummy for dummy atoms
initialising group options...
processing index file...
Analysing residue names:
Opening library file
/works/work6/theogone/Gromacs/usr/local/Gromacs/share/gromacs/top/aminoacids.dat
There are: 1 OTHER residues
There are: 0 PROTEIN residues
There are: 0 DNA residues
Analysing Other...
Making dummy/rest group for Acceleration containing 12000 elements
Making dummy/rest group for Freeze containing 12000 elements
Making dummy/rest group for Energy Mon. containing 12000 elements
Making dummy/rest group for VCM containing 12000 elements
Number of degrees of freedom in T-Coupling group System is 17997.00
Making dummy/rest group for User1 containing 12000 elements
Making dummy/rest group for User2 containing 12000 elements
Making dummy/rest group for XTC containing 12000 elements
Making dummy/rest group for Or. Res. Fit containing 12000 elements
T-Coupling has 1 element(s): System
Energy Mon. has 1 element(s): rest
Acceleration has 1 element(s): rest
Freeze has 1 element(s): rest
User1 has 1 element(s): rest
User2 has 1 element(s): rest
VCM has 1 element(s): rest
XTC has 1 element(s): rest
Or. Res. Fit has 1 element(s): rest
Checking consistency between energy and charge groups...
splitting topology...
There are 6000 charge group borders and 12000 shake borders
There are 6000 total borders
Division over nodes in atoms:
6000 6000
writing run input file...
Back Off! I just backed up topol.tpr to ./#topol.tpr.2#
gcq#262: "Disturb the Peace of a John Q Citizen" (Urban Dance Squad)
NNODES=2, MYRANK=1, HOSTNAME=lx05
NNODES=2, MYRANK=0, HOSTNAME=lx05
NODEID=1 argc=3
NODEID=0 argc=3
:-) G R O M A C S (-:
Gravel Rubs Often Many Awfully Cauterized Sores
:-) VERSION 3.1.4 (-:
Copyright (c) 1991-2002, University of Groningen, The Netherlands
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /works/work6/theogone/Gromacs/usr/local/Gromacs/i686-pc-linux-gnu/bin/mdrun (-:
Option     Filename  Type         Description
------------------------------------------------------------
  -s      topol.tpr  Input        Generic run input: tpr tpb tpa
  -o       traj.trr  Output       Full precision trajectory: trr trj
  -x       traj.xtc  Output, Opt. Compressed trajectory (portable xdr format)
  -c    confout.gro  Output       Generic structure: gro g96 pdb
  -e       ener.edr  Output       Generic energy: edr ene
  -g      pc2_2.log  Output       Log file
-dgdl      dgdl.xvg  Output, Opt. xvgr/xmgr file
-table    table.xvg  Input, Opt.  xvgr/xmgr file
-rerun    rerun.xtc  Input, Opt.  Generic trajectory: xtc trr trj gro g96 pdb
 -ei        sam.edi  Input, Opt.  ED sampling input
 -eo        sam.edo  Output, Opt. ED sampling output
  -j       wham.gct  Input, Opt.  General coupling stuff
 -jo        bam.gct  Input, Opt.  General coupling stuff
-ffout      gct.xvg  Output, Opt. xvgr/xmgr file
-devout deviatie.xvg Output, Opt. xvgr/xmgr file
-runav  runaver.xvg  Output, Opt. xvgr/xmgr file
 -pi       pull.ppa  Input, Opt.  Pull parameters
 -po    pullout.ppa  Output, Opt. Pull parameters
 -pd       pull.pdo  Output, Opt. Pull data output
 -pn       pull.ndx  Input, Opt.  Index file
-mtx         nm.mtx  Output, Opt. Hessian matrix

Option        Type  Value  Description
------------------------------------------------------
      -[no]h  bool     no  Print help info and quit
      -[no]X  bool     no  Use dialog box GUI to edit command line options
       -nice   int     19  Set the nicelevel
     -deffnm string        Set the default filename for all file options
         -np   int      1  Number of nodes, must be the same as used for grompp
      -[no]v  bool     no  Be loud and noisy
-[no]compact  bool    yes  Write a compact log file
  -[no]multi  bool     no  Do multiple simulations in parallel (only with -np > 1)
   -[no]glas  bool     no  Do glass simulation with special long range corrections
 -[no]ionize  bool     no  Do a simulation including the effect of an X-Ray bombardment on your system
Back Off! I just backed up pc2_21.log to ./#pc2_21.log.1#
Back Off! I just backed up pc2_20.log to ./#pc2_20.log.1#
Reading file topol.tpr, VERSION 3.1.4 (single precision)
Reading file topol.tpr, VERSION 3.1.4 (single precision)
Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'pe'
5000 steps, 5.0 ps.
MPI_Send: process in local group is dead (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Send()
Rank (1, MPI_COMM_WORLD): - MPI_Sendrecv()
Rank (1, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 8505 failed on node n0 (10.10.100.5) due to signal 11.
-----------------------------------------------------------------------------
End of the run.
Does this help?