[gmx-users] MPI-based errors on the power6 at large parallelization

chris.neale at utoronto.ca
Wed Mar 4 17:58:50 CET 2009


Hello,

This is a more detailed description of a problem that I previously
reported under the title "MPI_Recv invalid count and system explodes
for large but not small parallelization on power6 but not opterons"
(http://www.gromacs.org/pipermail/gmx-users/2009-March/040158.html),
but here I focus solely on the MPI-related problems that I see at
N=196 parallelization.

In summary, I see a variety of MPI-based errors:
a. ERROR: 0032-117 User pack or receive buffer is too small (24) in MPI_Sendrecv, task 183
b. ERROR: 0032-103 Invalid count (-8388608) in MPI_Recv, task 37
c. The system crashes without giving any MPI-based error message.

So I gather that there is some problem at the MPI level. Is this
something that I should try to solve by changing the way that MPI is
set up? I would have thought that GROMACS itself is responsible for
ensuring that its buffers are large enough, etc.
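
For what it's worth, the negative count in (b) looks like the kind of
value a 32-bit size computation produces when it overflows. Here is a
minimal sketch of that mechanism (my own illustration, not GROMACS
source; the numbers and variable names are hypothetical):

/* hypothetical illustration of a signed 32-bit overflow producing a
 * negative MPI count (formally undefined behaviour in C, but it wraps
 * on this hardware) */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int nelements = 200000000;            /* hypothetical element count    */
    int elemsize  = 12;                   /* e.g. 3 floats per atom        */
    int count     = nelements * elemsize; /* exceeds INT_MAX, wraps negative */
    printf("count = %d (INT_MAX = %d)\n", count, INT_MAX);
    return 0;
}

A negative count like that, passed straight to MPI_Recv, would trip
the 0032-103 check that is documented further down.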

The system contains 500,000 real atoms (not including the TIP4P MW
virtual sites) and consists of all-atom OPLS protein, TIP4P water,
ions, and united-atom detergent. Could the united-atom detergent be
causing some problem if GROMACS assumes a uniform density when
determining how many atoms are likely to be found in any one grid
cell? That assumption would presumably become more of a problem as
the cells get smaller.
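
To put a number on that, here is my back-of-the-envelope estimate of
the mean load per cell (my own arithmetic, using the particle counts
and the 6 x 6 x 3 particle-particle grid reported in the log below):

#include <stdio.h>

int main(void)
{
    /* numbers taken from the log below: 547196 atoms + 164564 vsites,
     * decomposed over a 6 x 6 x 3 grid of particle-particle cells */
    double nparticles = 547196.0 + 164564.0;
    int    ncells     = 6 * 6 * 3;
    printf("mean particles per DD cell: %.0f\n", nparticles / ncells);
    return 0;
}

That gives roughly 6600 particles per cell on average; a cell that
holds the detergent aggregate could deviate substantially from that
mean.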

######

Here is the detailed information.

GROMACS version 4.0.4
Cluster of 32 x Power6 boxes @ 4.3 GHz running SMT to yield 64 tasks per box.
Compilation information:

export PATH=/usr/lpp/ppe.hpct/bin:/usr/vacpp/bin:.:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java14/jre/bin:/usr/java14/bin:/usr/lpp/LoadL/full/bin:/usr/local/bin
export F77=xlf_r
export CC=xlc_r
export CXX=xlc++_r
export FFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"
export CFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"
export CXXFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"

export FFTW_LOCATION=/scratch/cneale/exe/fftw-3.1.2_aix/exec
export GROMACS_LOCATION=/scratch/cneale/exe/gromacs-4.0.4_aix_o2/exec
export CPPFLAGS=-I$FFTW_LOCATION/include
export LDFLAGS=-L$FFTW_LOCATION/lib

cd /scratch/cneale/exe/gromacs-4.0.4_aix_o2
mkdir exec

./configure --prefix=$GROMACS_LOCATION --without-motif-includes >output.configure 2>&1
make >output.make 2>&1
make install >output.make_install 2>&1
make distclean

######

[cneale at tcs-f11n05]$ cat incubator1.mdp
title               =  seriousMD
cpp                 =  cpp
integrator          =  md
nsteps              =  500
tinit               =  0
dt                  =  0.002
comm_mode           =  linear
nstcomm             =  1
comm_grps           =  System
nstxout             =  5000
nstvout             =  5000
nstfout             =  5000
nstlog              =  5000
nstlist             =  10
nstenergy           =  5000
nstxtcout           =  5000
ns_type             =  grid
pbc                 =  xyz
coulombtype         =  PME
rcoulomb            =  0.9
fourierspacing      =  0.12
pme_order           =  4
vdwtype             =  cut-off
rvdw_switch         =  0
rvdw                =  1.4
rlist               =  0.9
DispCorr            =  no
Pcoupl              =  Berendsen
pcoupltype          =  isotropic
compressibility     =  4.5e-5
ref_p               =  1.
tau_p               =  4.0
tcoupl              =  Berendsen
tc_grps             =  Protein      DPC_LDA     SOL_NA+
tau_t               =  0.1          0.1         0.1
ref_t               =  300.         300.        300.
annealing           =  no
gen_vel             =  yes
unconstrained-start =  no
gen_temp            =  300.
gen_seed            =  9896
constraints         =  all-bonds
constraint_algorithm=  lincs
lincs-iter          =  1
lincs-order         =  4
;EOF

#################

[cneale at tcs-f11n05]$ cat temp.log
Log file opened on Wed Mar  4 11:38:00 2009
Host: tcs-f09n10  pid: 279344  nodeid: 0  nnodes:  196
The Gromacs distribution was built Tue Mar  3 11:49:04 EST 2009 by
cneale at tcs-f03n07 (AIX 3 00CA27F24C00)


                          :-)  G  R  O  M  A  C  S  (-:

                           GROtesk MACabre and Sinister

                             :-)  VERSION 4.0.4  (-:


       Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
        Copyright (c) 1991-2000, University of Groningen, The Netherlands.
              Copyright (c) 2001-2008, The GROMACS development team,
             check out http://www.gromacs.org for more information.

          This program is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License
          as published by the Free Software Foundation; either version 2
              of the License, or (at your option) any later version.

      :-)  /scratch/cneale/exe/gromacs-4.0.4_aix_o2/exec/bin/mdrun_mpi  (-:


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

parameters of the run:
    integrator           = md
    nsteps               = 500
    init_step            = 0
    ns_type              = Grid
    nstlist              = 10
    ndelta               = 2
    nstcomm              = 1
    comm_mode            = Linear
    nstlog               = 5000
    nstxout              = 5000
    nstvout              = 5000
    nstfout              = 5000
    nstenergy            = 5000
    nstxtcout            = 5000
    init_t               = 0
    delta_t              = 0.002
    xtcprec              = 1000
    nkx                  = 175
    nky                  = 175
    nkz                  = 175
    pme_order            = 4
    ewald_rtol           = 1e-05
    ewald_geometry       = 0
    epsilon_surface      = 0
    optimize_fft         = FALSE
    ePBC                 = xyz
    bPeriodicMols        = FALSE
    bContinuation        = FALSE
    bShakeSOR            = FALSE
    etc                  = Berendsen
    epc                  = Berendsen
    epctype              = Isotropic
    tau_p                = 4
    ref_p (3x3):
       ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
       ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
       ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
    compress (3x3):
       compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
       compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
       compress[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
    refcoord_scaling     = No
    posres_com (3):
       posres_com[0]= 0.00000e+00
       posres_com[1]= 0.00000e+00
       posres_com[2]= 0.00000e+00
    posres_comB (3):
       posres_comB[0]= 0.00000e+00
       posres_comB[1]= 0.00000e+00
       posres_comB[2]= 0.00000e+00
    andersen_seed        = 815131
    rlist                = 0.9
    rtpi                 = 0.05
    coulombtype          = PME
    rcoulomb_switch      = 0
    rcoulomb             = 0.9
    vdwtype              = Cut-off
    rvdw_switch          = 0
    rvdw                 = 1.4
    epsilon_r            = 1
    epsilon_rf           = 1
    tabext               = 1
    implicit_solvent     = No
    gb_algorithm         = Still
    gb_epsilon_solvent   = 80
    nstgbradii           = 1
    rgbradii             = 2
    gb_saltconc          = 0
    gb_obc_alpha         = 1
    gb_obc_beta          = 0.8
    gb_obc_gamma         = 4.85
    sa_surface_tension   = 2.092
    DispCorr             = No
    free_energy          = no
    init_lambda          = 0
    sc_alpha             = 0
    sc_power             = 0
    sc_sigma             = 0.3
    delta_lambda         = 0
    nwall                = 0
    wall_type            = 9-3
    wall_atomtype[0]     = -1
    wall_atomtype[1]     = -1
    wall_density[0]      = 0
    wall_density[1]      = 0
    wall_ewald_zfac      = 3
    pull                 = no
    disre                = No
    disre_weighting      = Conservative
    disre_mixed          = FALSE
    dr_fc                = 1000
    dr_tau               = 0
    nstdisreout          = 100
    orires_fc            = 0
    orires_tau           = 0
    nstorireout          = 100
    dihre-fc             = 1000
    em_stepsize          = 0.01
    em_tol               = 10
    niter                = 20
    fc_stepsize          = 0
    nstcgsteep           = 1000
    nbfgscorr            = 10
    ConstAlg             = Lincs
    shake_tol            = 0.0001
    lincs_order          = 4
    lincs_warnangle      = 30
    lincs_iter           = 1
    bd_fric              = 0
    ld_seed              = 1993
    cos_accel            = 0
    deform (3x3):
       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
    userint1             = 0
    userint2             = 0
    userint3             = 0
    userint4             = 0
    userreal1            = 0
    userreal2            = 0
    userreal3            = 0
    userreal4            = 0
grpopts:
    nrdf:     40031.9     67943.8      987429
    ref_t:         300         300         300
    tau_t:         0.1         0.1         0.1
anneal:          No          No          No
ann_npoints:           0           0           0
    acc:            0           0           0
    nfreeze:           N           N           N
    energygrp_flags[  0]: 0
    efield-x:
       n = 0
    efield-xt:
       n = 0
    efield-y:
       n = 0
    efield-yt:
       n = 0
    efield-z:
       n = 0
    efield-zt:
       n = 0
    bQMMM                = FALSE
    QMconstraints        = 0
    QMMMscheme           = 0
    scalefactor          = 1
qm_opts:
    ngQM                 = 0

Initializing Domain Decomposition on 196 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
     two-body bonded interactions: 0.556 nm, LJ-14, atoms 25035 25038
   multi-body bonded interactions: 0.556 nm, Proper Dih., atoms 25035 25038
Minimum cell size due to bonded interactions: 0.612 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.37
Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file
Using 88 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 108 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 16 Y 16 Z 14
Domain decomposition grid 6 x 6 x 3, separate PME nodes 88
Interleaving PP and PME nodes
This is a particle-particle only node

Domain decomposition nodeid 0, coordinates 0 0 0

Using two step summing over 4 groups of on average 27.0 processes

Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Will do PME sum in reciprocal space.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.9   Coulomb: 0.9   LJ: 1.4
System total charge: 0.000
Generated table with 1200 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling TIP4p water optimization for 164564 molecules.

Configuring nonbonded kernels...


Removing pbc first time

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 52488
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms
There are 156384 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 2.81 nm Y 2.81 nm Z 4.86 nm

The maximum allowed distance for charge groups involved in interactions is:
                  non-bonded interactions           1.400 nm
(the following are initial values, they could change due to box deformation)
             two-body bonded interactions  (-rdd)   1.400 nm
           multi-body bonded interactions  (-rdd)   1.400 nm
   atoms separated by up to 5 constraints  (-rcon)  2.806 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1 Z 1
The minimum size for domain decomposition cells is 1.400 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.50 Y 0.50 Z 0.29
The maximum allowed distance for charge groups involved in interactions is:
                  non-bonded interactions           1.400 nm
             two-body bonded interactions  (-rdd)   1.400 nm
           multi-body bonded interactions  (-rdd)   1.400 nm
   atoms separated by up to 5 constraints  (-rcon)  1.400 nm


Making 3D domain decomposition grid 6 x 6 x 3, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
   0:  System

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

There are: 547196 Atoms
There are: 164564 VSites
Charge group distribution at step 0: 1835 1783 1808 1763 1765 1723  
1796 1810 1783 1889 1773 1802 1777 1813 1726 1765 1801 1790 1766 1759  
1733 1767 1738 1761 1744 1728 1775 1785 1759 1758 1754 1807 1738 1781  
1716 1751 1775 1771 1794 1764 1765 1777 1798 1776 1827 1819 1752 1800  
1751 1809 1768 1849 1788 1847 1824 1767 1834 1745 1737 1753 1776 1778  
1785 1829 1788 1863 1725 1785 1778 1800 1754 1820 1747 1757 1773 1773  
1819 1726 1745 1784 1772 1753 1771 1788 1781 1784 1734 1761 1769 1733  
1801 1767 1798 1781 1779 1758 1796 1800 1856 1795 1740 1828 1736 1779  
1783 1795 1776 1849
Grid: 13 x 13 x 10 cells

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 3.34e-05
Initial temperature: 300.336 K

Started mdrun on node 0 Wed Mar  4 11:38:03 2009

            Step           Time         Lambda
               0        0.00000        0.00000

    Energies (kJ/mol)
           Angle    Proper Dih. Ryckaert-Bell.          LJ-14     Coulomb-14
     6.76081e+04    1.72586e+04    2.77334e+04    4.21216e+04    2.71567e+05
         LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip.      Potential
     1.55700e+06   -4.61765e+04   -9.26002e+06   -1.91687e+06   -9.23978e+06
     Kinetic En.   Total Energy    Temperature Pressure (bar)  Cons. rmsd ()
     1.37042e+06   -7.86936e+06    3.00934e+02   -2.95959e+03    5.41981e-05

<this is the end of the log file>

###############

[cneale at tcs-f11n05]$ cat my.stderr
ATTENTION: 0031-408  196 tasks allocated by LoadLeveler, continuing...
NNODES=196, MYRANK=192, HOSTNAME=tcs-f04n08
...
<snip>
...
NNODES=196, MYRANK=4, HOSTNAME=tcs-f09n10
NODEID=64 argc=3
...
<snip>
...
NODEID=63 argc=3
                          :-)  G  R  O  M  A  C  S  (-:

                Gromacs Runs One Microsecond At Cannonball Speeds

                             :-)  VERSION 4.0.4  (-:


       Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
        Copyright (c) 1991-2000, University of Groningen, The Netherlands.
              Copyright (c) 2001-2008, The GROMACS development team,
             check out http://www.gromacs.org for more information.

          This program is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License
          as published by the Free Software Foundation; either version 2
              of the License, or (at your option) any later version.

      :-)  /scratch/cneale/exe/gromacs-4.0.4_aix_o2/exec/bin/mdrun_mpi  (-:

Option     Filename  Type         Description
------------------------------------------------------------
   -s       temp.tpr  Input        Run input file: tpr tpb tpa
   -o       temp.trr  Output       Full precision trajectory: trr trj cpt
   -x       temp.xtc  Output, Opt. Compressed trajectory (portable xdr format)
-cpi       temp.cpt  Input, Opt.  Checkpoint file
-cpo       temp.cpt  Output, Opt. Checkpoint file
   -c       temp.gro  Output       Structure file: gro g96 pdb
   -e       temp.edr  Output       Energy file: edr ene
   -g       temp.log  Output       Log file
-dgdl      temp.xvg  Output, Opt. xvgr/xmgr file
-field     temp.xvg  Output, Opt. xvgr/xmgr file
-table     temp.xvg  Input, Opt.  xvgr/xmgr file
-tablep    temp.xvg  Input, Opt.  xvgr/xmgr file
-tableb    temp.xvg  Input, Opt.  xvgr/xmgr file
-rerun     temp.xtc  Input, Opt.  Trajectory: xtc trr trj gro g96 pdb cpt
-tpi       temp.xvg  Output, Opt. xvgr/xmgr file
-tpid      temp.xvg  Output, Opt. xvgr/xmgr file
  -ei       temp.edi  Input, Opt.  ED sampling input
  -eo       temp.edo  Output, Opt. ED sampling output
   -j       temp.gct  Input, Opt.  General coupling stuff
  -jo       temp.gct  Output, Opt. General coupling stuff
-ffout     temp.xvg  Output, Opt. xvgr/xmgr file
-devout    temp.xvg  Output, Opt. xvgr/xmgr file
-runav     temp.xvg  Output, Opt. xvgr/xmgr file
  -px       temp.xvg  Output, Opt. xvgr/xmgr file
  -pf       temp.xvg  Output, Opt. xvgr/xmgr file
-mtx       temp.mtx  Output, Opt. Hessian matrix
  -dn       temp.ndx  Output, Opt. Index file

Option       Type   Value   Description
------------------------------------------------------
-[no]h       bool   no      Print help info and quit
-nice        int    0       Set the nicelevel
-deffnm      string temp    Set the default filename for all file options
-[no]xvgr    bool   yes     Add specific codes (legends etc.) in the output
                             xvg files for the xmgrace program
-[no]pd      bool   no      Use particle decomposition
-dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
-npme        int    -1      Number of separate nodes to be used for PME, -1
                             is guess
-ddorder     enum   interleave  DD node order: interleave, pp_pme or cartesian
-[no]ddcheck bool   yes     Check for all bonded interactions with DD
-rdd         real   0       The maximum distance for bonded interactions with
                             DD (nm), 0 is determine from initial coordinates
-rcon        real   0       Maximum distance for P-LINCS (nm), 0 is estimate
-dlb         enum   auto    Dynamic load balancing (with DD): auto, no or yes
-dds         real   0.8     Minimum allowed dlb scaling of the DD cell size
-[no]sum     bool   yes     Sum the energies at every step
-[no]v       bool   no      Be loud and noisy
-[no]compact bool   yes     Write a compact log file
-[no]seppot  bool   no      Write separate V and dVdl terms for each
                             interaction type and node to the log file(s)
-pforce      real   -1      Print all forces larger than this (kJ/mol nm)
-[no]reprod  bool   no      Try to avoid optimizations that affect binary
                             reproducibility
-cpt         real   15      Checkpoint interval (minutes)
-[no]append  bool   no      Append to previous output files when continuing
                             from checkpoint
-[no]addpart bool   yes     Add the simulation part number to all output
                             files when continuing from checkpoint
-maxh        real   -1      Terminate after 0.99 times this time (hours)
-multi       int    0       Do multiple simulations in parallel
-replex      int    0       Attempt replica exchange every # steps
-reseed      int    -1      Seed for replica exchange, -1 is generate a seed
-[no]glas    bool   no      Do glass simulation with special long range
                             corrections
-[no]ionize  bool   no      Do a simulation including the effect of an X-Ray
                             bombardment on your system

Reading file temp.tpr, VERSION 4.0.4 (single precision)

Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file

NOTE: For optimal PME load balancing at high parallelization
       PME grid_x (175) and grid_y (175) should be divisible by #PME_nodes (88)

Making 3D domain decomposition 6 x 6 x 3

starting mdrun 'Big Box'
500 steps,      1.0 ps.
ERROR: 0032-117 User pack or receive buffer is too small (24) in MPI_Sendrecv, task 183


#########

Then from the IBM website:
(http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.pe_linux43.messages.doc/am105_mpimsgs.html)

0032-117

Parallel Environment for Linux V4.3 Messages
SA38-0648-01

User pack or receive buffer too small (number) in string, task number
Explanation

The buffer specified for the operation was too small to hold the  
message. In the PACK and UNPACK cases it is the space between current  
position and buffer end which is too small.
User response

Increase the size of the buffer or reduce the size of the message.

Error Class: MPI_ERR_TRUNCATE
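
For completeness, this error class is easy to trigger in isolation. A
minimal test program (my own sketch, nothing from GROMACS) that posts
a receive smaller than the matching send should abort with this same
0032-117 message under IBM PE:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    int send[8] = {0};
    int recv[2]; /* deliberately too small for the incoming message */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each task exchanges with itself; the 8-int message does not fit
     * in the 2-int receive buffer -> MPI_ERR_TRUNCATE, and the default
     * error handler aborts the job */
    MPI_Sendrecv(send, 8, MPI_INT, rank, 0,
                 recv, 2, MPI_INT, rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}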

### And the error that I previously reported was:

0032-103

Parallel Environment for Linux V4.3 Messages
SA38-0648-01

Invalid count (number) in string, task number
Explanation

The value of count (element count) is out of range.
User response

Make sure that the count is greater than or equal to zero.

Error Class: MPI_ERR_COUNT

### And if I run it a third time, I don't get an MPI-based error, but a crash:

$ tail temp.log
...
Reading file temp.tpr, VERSION 4.0.4 (single precision)

Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file

NOTE: For optimal PME load balancing at high parallelization
       PME grid_x (175) and grid_y (175) should be divisible by #PME_nodes (88)

Making 3D domain decomposition 6 x 6 x 3

starting mdrun 'Big Box'
500 steps,      1.0 ps.

t = 0.222 ps: Water molecule starting at atom 105813 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates

Step 112, time 0.224 (ps)  LINCS WARNING
relative constraint deviation after LINCS:
rms 18077115.836165, max 343959040.000000 (between atoms 24714 and 24713)
bonds that rotated more than 30 degrees:
  atom 1 atom 2  angle  previous, current, constraint length
   24715  24714   90.3    0.1530 14177746.0000      0.1530
   24716  24715   93.1    0.1530 2540901.0000      0.1530
   24717  24716   91.1    0.1530 473629.9375      0.1530
   24718  24717  103.1    0.1530 63926.0820      0.1530
   24719  24718  113.0    0.1530 3528.4724      0.1530
   24720  24719   33.0    0.1530   0.1829      0.1530
   24706  24703   94.5    0.1470 72767.5391      0.1470
   24706  24704   91.5    0.1470 207189.8906      0.1470
   24706  24705   94.1    0.1470 73579.9141      0.1470
   24707  24706   97.4    0.1470 71872.3125      0.1470
   24708  24707   94.8    0.1530 282677.5000      0.1530
   24709  24708   90.6    0.1430 3258115.2500      0.1430
   24710  24709   90.4    0.1610 15580227.0000      0.1610
   24713  24710   90.2    0.1610 50052624.0000      0.1610
   24712  24710   90.7    0.1480 15122392.0000      0.1480
   24711  24710   90.8    0.1480 14735100.0000      0.1480
   24714  24713   90.5    0.1430 49186144.0000      0.1430

t = 0.224 ps: Water molecule starting at atom 164033 can not be settled.
Check for bad contacts and/or reduce the timestep.

t = 0.224 ps: Water molecule starting at atom 163709 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates
Wrote pdb files with previous and current coordinates
ERROR: 0031-250  task 145: Segmentation fault
ERROR: 0031-250  task 151: Segmentation fault
ERROR: 0031-250  task 153: Segmentation fault
ERROR: 0031-250  task 112: Segmentation fault
ERROR: 0031-250  task 118: Segmentation fault
ERROR: 0031-250  task 119: Segmentation fault

##############

Many thanks,
Chris.




