[gmx-users] Re: Signal 11 error with parallel runs

sophie.vilarem@laposte.net sophie.vilarem at laposte.net
Tue Nov 4 16:19:01 CET 2003


sophie.vilarem at laposte.net wrote:

> Hi everybody,
> 
> I am trying to run the d.poly-ch2 benchmark, on a dual xeon
> cluster.
> I have installed the latest LAM version of MPI, and the latest
> version of Gromacs.
> It works all right on 1 CPU, but it fails on 2 CPUs (or
> more...) with the following error :
> 
> ...
> MPI_Send: process in local group is dead (rank 1, MPI_COMM_WORLD)

The most relevant error probably occurs before this one. Did mdrun,
for example, complain about the number of CPUs the .tpr file was
made for?





I don't think so, but here is the complete output (run on 2 CPUs):

Run à 2 CPU..
                         :-)  G  R  O  M  A  C  S  (-:
 
                      GROwing Monsters And Cloning Shrimps
 
                            :-)  VERSION 3.1.4  (-:
 
 
       Copyright (c) 1991-2002, University of Groningen, The Netherlands
         This program is free software; you can redistribute it and/or
          modify it under the terms of the GNU General Public License
         as published by the Free Software Foundation; either version 2
             of the License, or (at your option) any later version.
 
                                :-)  grompp  (-:
 
Option     Filename  Type          Description
------------------------------------------------------------
  -f     grompp.mdp  Input, Opt.   grompp input file with MD parameters
 -po      mdout.mdp  Output        grompp input file with MD parameters
  -c       conf.gro  Input         Generic structure: gro g96 pdb tpr tpb tpa
  -r       conf.gro  Input, Opt.   Generic structure: gro g96 pdb tpr tpb tpa
  -n      index.ndx  Input, Opt.   Index file
-deshuf  deshuf.ndx  Output, Opt.  Index file
  -p      topol.top  Input         Topology file
 -pp  processed.top  Output, Opt.  Topology file
  -o      topol.tpr  Output        Generic run input: tpr tpb tpa
  -t       traj.trr  Input, Opt.   Full precision trajectory: trr trj
 
      Option   Type  Value  Description
------------------------------------------------------
      -[no]h   bool     no  Print help info and quit
      -[no]X   bool     no  Use dialog box GUI to edit command line options
       -nice    int      0  Set the nicelevel
      -[no]v   bool    yes  Be loud and noisy
       -time   real     -1  Take frame at or first after this time.
         -np    int      2  Generate statusfile for # nodes
-[no]shuffle   bool     no  Shuffle molecules over nodes
   -[no]sort   bool     no  Sort molecules according to X coordinate
-[no]rmdumbds  bool    yes  Remove constant bonded interactions with dummies
       -load string         Releative load capacity of each node on a parallel
                            machine. Be sure to use quotes around the string,
                            which should contain a number for each node
    -maxwarn    int     10  Number of warnings after which input processing
                            stops
-[no]check14   bool     no  Remove 1-4 interactions without Van der Waals
 
creating statusfile for 2 nodes...
 
Back Off! I just backed up mdout.mdp to ./#mdout.mdp.2#
Warning: as of GMX v 2.0 unit of compressibility is truly 1/bar
checking input for internal consistency...
calling /lib/cpp...
processing topology...
Generated 3 of the 3 non-bonded parameter combinations
Excluding 3 bonded neighbours for PE6000 1
processing coordinates...
double-checking input for internal consistency...
Cleaning up constraints and constant bonded interactions with dummy particles
renumbering atomtypes...
converting bonded parameters...
#      BONDS:   17997
#     ANGLES:   23992
#     RBDIHS:   29985
#   DUMMY3FD:   29990
#  DUMMY3FAD:   10
Setting particle type to Dummy for dummy atoms
initialising group options...
processing index file...
Analysing residue names:
Opening library file /works/work6/theogone/Gromacs/usr/local/Gromacs/share/gromacs/top/aminoacids.dat
There are:     1      OTHER residues
There are:     0    PROTEIN residues
There are:     0        DNA residues
Analysing Other...
Making dummy/rest group for Acceleration containing 12000 elements
Making dummy/rest group for Freeze containing 12000 elements
Making dummy/rest group for Energy Mon. containing 12000 elements
Making dummy/rest group for VCM containing 12000 elements
Number of degrees of freedom in T-Coupling group System is 17997.00
Making dummy/rest group for User1 containing 12000 elements
Making dummy/rest group for User2 containing 12000 elements
Making dummy/rest group for XTC containing 12000 elements
Making dummy/rest group for Or. Res. Fit containing 12000 elements
T-Coupling       has 1 element(s): System
Energy Mon.      has 1 element(s): rest
Acceleration     has 1 element(s): rest
Freeze           has 1 element(s): rest
User1            has 1 element(s): rest
User2            has 1 element(s): rest
VCM              has 1 element(s): rest
XTC              has 1 element(s): rest
Or. Res. Fit     has 1 element(s): rest
Checking consistency between energy and charge groups...
splitting topology...
There are 6000 charge group borders and 12000 shake borders
There are 6000 total borders
Division over nodes in atoms:
  6000  6000
writing run input file...
 
Back Off! I just backed up topol.tpr to ./#topol.tpr.2#
 
gcq#262: "Disturb the Peace of a John Q Citizen" (Urban Dance Squad)
 
NNODES=2, MYRANK=1, HOSTNAME=lx05
NNODES=2, MYRANK=0, HOSTNAME=lx05
NODEID=1 argc=3
NODEID=0 argc=3
                         :-)  G  R  O  M  A  C  S  (-:
 
                Gravel Rubs Often Many Awfully Cauterized Sores
 
                            :-)  VERSION 3.1.4  (-:
 
 
       Copyright (c) 1991-2002, University of Groningen, The Netherlands
         This program is free software; you can redistribute it and/or
          modify it under the terms of the GNU General Public License
         as published by the Free Software Foundation; either version 2
             of the License, or (at your option) any later version.
 
  :-)  /works/work6/theogone/Gromacs/usr/local/Gromacs/i686-pc-linux-gnu/bin/mdrun  (-:
 
Option     Filename  Type          Description
------------------------------------------------------------
  -s      topol.tpr  Input         Generic run input: tpr tpb tpa
  -o       traj.trr  Output        Full precision trajectory: trr trj
  -x       traj.xtc  Output, Opt.  Compressed trajectory (portable xdr format)
  -c    confout.gro  Output        Generic structure: gro g96 pdb
  -e       ener.edr  Output        Generic energy: edr ene
  -g      pc2_2.log  Output        Log file
-dgdl      dgdl.xvg  Output, Opt.  xvgr/xmgr file
-table    table.xvg  Input, Opt.   xvgr/xmgr file
-rerun    rerun.xtc  Input, Opt.   Generic trajectory: xtc trr trj gro g96 pdb
 -ei        sam.edi  Input, Opt.   ED sampling input
 -eo        sam.edo  Output, Opt.  ED sampling output
  -j       wham.gct  Input, Opt.   General coupling stuff
 -jo        bam.gct  Input, Opt.   General coupling stuff
-ffout      gct.xvg  Output, Opt.  xvgr/xmgr file
-devout   deviatie.xvg  Output, Opt.  xvgr/xmgr file
-runav  runaver.xvg  Output, Opt.  xvgr/xmgr file
 -pi       pull.ppa  Input, Opt.   Pull parameters
 -po    pullout.ppa  Output, Opt.  Pull parameters
 -pd       pull.pdo  Output, Opt.  Pull data output
 -pn       pull.ndx  Input, Opt.   Index file
-mtx         nm.mtx  Output, Opt.  Hessian matrix
 
      Option   Type  Value  Description
------------------------------------------------------
      -[no]h   bool     no  Print help info and quit
      -[no]X   bool     no  Use dialog box GUI to edit command line options
       -nice    int     19  Set the nicelevel
     -deffnm string         Set the default filename for all file options
         -np    int      1  Number of nodes, must be the same as used for
                            grompp
      -[no]v   bool     no  Be loud and noisy
-[no]compact   bool    yes  Write a compact log file
  -[no]multi   bool     no  Do multiple simulations in parallel (only with -np
                            > 1)
   -[no]glas   bool     no  Do glass simulation with special long range
                            corrections
 -[no]ionize   bool     no  Do a simulation including the effect of an X-Ray
                            bombardment on your system
 
 
Back Off! I just backed up pc2_21.log to ./#pc2_21.log.1#
 
Back Off! I just backed up pc2_20.log to ./#pc2_20.log.1#
Reading file topol.tpr, VERSION 3.1.4 (single precision)
Reading file topol.tpr, VERSION 3.1.4 (single precision)
 
Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'pe'
5000 steps,      5.0 ps.
 
MPI_Send: process in local group is dead (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD):  - MPI_Send()
Rank (1, MPI_COMM_WORLD):  - MPI_Sendrecv()
Rank (1, MPI_COMM_WORLD):  - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
 
PID 8505 failed on node n0 (10.10.100.5) due to signal 11.
-----------------------------------------------------------------------------
Fin du run




Does it help?
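
For the CPU-count question raised above, one thing that can be checked directly in the pasted output is the -np value each program prints in its option table: the grompp table shows 2, while the mdrun table shows 1. Whether that mismatch has anything to do with the signal 11 is not established here. The following is only a minimal Python sketch for comparing the two values; the function names and the sample text are hypothetical, and it assumes the output has been captured as plain text with each option row on its own line.

```python
import re
from typing import List

# Matches option-table rows such as
# "         -np    int      2  Generate statusfile for # nodes"
NP_ROW = re.compile(r"^\s*-np\s+int\s+(\d+)\b", re.MULTILINE)


def np_values(captured_output: str) -> List[int]:
    """Return every -np value printed in the option tables, in order."""
    return [int(value) for value in NP_ROW.findall(captured_output)]


def np_values_agree(captured_output: str) -> bool:
    """True only if at least two -np rows were found and all are equal."""
    values = np_values(captured_output)
    return len(values) >= 2 and len(set(values)) == 1


# Hypothetical sample: the two -np rows copied from the output above.
sample = """\
         -np    int      2  Generate statusfile for # nodes
         -np    int      1  Number of nodes, must be the same as used for
"""

print(np_values(sample))
print(np_values_agree(sample))
```

If the values differ, it may be worth passing the same node count to mdrun as was given to grompp and running again.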

