This is a more detailed description of a problem that I previously
reported under the title "MPI_Recv invalid count and system explodes
for large but not small parallelization on power6 but not opterons"
http://www.gromacs.org/pipermail/gmx-users/2009-March/040158.html
This post focuses solely on the MPI-related problems that I see for
N=196.
In summary, I see a variety of MPI-based errors:
a. ERROR: 0032-117 User pack or receive buffer is too small (24) in
MPI_Sendrecv, task 183
b. ERROR: 0032-103 Invalid count (-8388608) in MPI_Recv, task 37
c. Or the system crashes without giving an MPI-based error message.
So I gather that there is some problem with MPI. Is this something
that I should try to solve by changing the way that MPI is set up? I
would have thought that gromacs itself would be responsible for
ensuring that its message buffers are large enough, etc.
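As a side check on error (b), and only as a sketch: the reported count
of -8388608 is exactly -2^23, and its 32-bit two's-complement bit
pattern is 0xff800000, which is also how IEEE-754 single precision
encodes negative infinity. Whether that is a coincidence or a hint that
the count was read from corrupted or non-integer data is a guess on my
part. The snippet below only verifies the arithmetic, and it assumes a
shell with 64-bit integer arithmetic, such as bash.

```shell
# Count reported in error (b) above.
count=-8388608

# Magnitude of the count; a power of two satisfies (n & (n-1)) == 0.
mag=$(( 0 - count ))
if [ $(( mag & (mag - 1) )) -eq 0 ]; then
    echo "magnitude $mag is a power of two"
fi

# 32-bit two's-complement bit pattern of the count
# (assumes the shell uses 64-bit arithmetic).
pattern=$(printf '%08x' $(( count & 0xFFFFFFFF )))
echo "bit pattern: 0x$pattern"
```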
The system contains roughly 500,000 real atoms (not including Tip4p
MW), and consists of all-atom OPLS protein, Tip4P water, ions, and
united-atom detergent. I suppose the united-atom detergent may be
causing some problem if gromacs assumes a uniform density when
estimating how many atoms are likely to be found in any one grid cell,
and this may become more of a problem as the grid cells get smaller?
Here is the detailed information.
gromacs version 4.0.4
cluster of 32 x Power6 boxes @ 4.3 GHz running SMT to yield 64 tasks per box.
Compilation information:
export F77=xlf_r
export CC=xlc_r
export CXX=xlc++_r
export FFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"
export CFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"
export CXXFLAGS="-O2 -qarch=pwr6 -qtune=pwr6"
export FFTW_LOCATION=/scratch/cneale/exe/fftw-3.1.2_aix/exec
export GROMACS_LOCATION=/scratch/cneale/exe/gromacs-4.0.4_aix_o2/exec
cd /scratch/cneale/exe/gromacs-4.0.4_aix_o2
mkdir exec
./configure --prefix=$GROMACS_LOCATION --without-motif-includes >output.configure 2>&1
make >output.make 2>&1
make install >output.make_install 2>&1
make distclean
[cneale@tcs-f11n05]$ cat incubator1.mdp
title = seriousMD
cpp = cpp
integrator = md
nsteps = 500
tinit = 0
dt = 0.002
comm_mode = linear
nstcomm = 1
comm_grps = System
nstxout = 5000
nstvout = 5000
nstfout = 5000
nstlog = 5000
nstlist = 10
nstenergy = 5000
nstxtcout = 5000
ns_type = grid
pbc = xyz
coulombtype = PME
rcoulomb = 0.9
fourierspacing = 0.12
pme_order = 4
vdwtype = cut-off
rvdw_switch = 0
rvdw = 1.4
rlist = 0.9
DispCorr = no
Pcoupl = Berendsen
pcoupltype = isotropic
compressibility = 4.5e-5
ref_p = 1.
tau_p = 4.0
tcoupl = Berendsen
tc_grps = Protein DPC_LDA SOL_NA+
tau_t = 0.1 0.1 0.1
ref_t = 300. 300. 300.
annealing = no
gen_vel = yes
unconstrained-start = no
gen_temp = 300.
gen_seed = 9896
constraints = all-bonds
constraint_algorithm= lincs
lincs-iter = 1
lincs-order = 4
[cneale@tcs-f11n05]$ cat temp.log
Log file opened on Wed Mar 4 11:38:00 2009
Host: tcs-f09n10 pid: 279344 nodeid: 0 nnodes: 196
The Gromacs distribution was built Tue Mar 3 11:49:04 EST 2009 by
cneale@tcs-f03n07 (AIX 3 00CA27F24C00)
parameters of the run:
integrator = md
nsteps = 500
init_step = 0
ns_type = Grid
nstlist = 10
ndelta = 2
nstcomm = 1
comm_mode = Linear
nstlog = 5000
nstxout = 5000
nstvout = 5000
nstfout = 5000
nstenergy = 5000
nstxtcout = 5000
init_t = 0
delta_t = 0.002
xtcprec = 1000
nkx = 175
nky = 175
nkz = 175
pme_order = 4
ewald_rtol = 1e-05
ewald_geometry = 0
epsilon_surface = 0
optimize_fft = FALSE
ePBC = xyz
bPeriodicMols = FALSE
bContinuation = FALSE
etc = Berendsen
epc = Berendsen
epctype = Isotropic
tau_p = 4
ref_p (3x3):
ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
compress (3x3):
compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compress[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
refcoord_scaling = No
posres_com (3):
posres_com[0]= 0.00000e+00
posres_com[1]= 0.00000e+00
posres_com[2]= 0.00000e+00
posres_comB (3):
posres_comB[0]= 0.00000e+00
posres_comB[1]= 0.00000e+00
posres_comB[2]= 0.00000e+00
andersen_seed = 815131
rlist = 0.9
rtpi = 0.05
coulombtype = PME
rcoulomb_switch = 0
rcoulomb = 0.9
vdwtype = Cut-off
rvdw_switch = 0
rvdw = 1.4
epsilon_r = 1
epsilon_rf = 1
tabext = 1
implicit_solvent = No
gb_algorithm = Still
gb_epsilon_solvent = 80
nstgbradii = 1
rgbradii = 2
gb_saltconc = 0
gb_obc_alpha = 1
gb_obc_beta = 0.8
gb_obc_gamma = 4.85
sa_surface_tension = 2.092
DispCorr = No
free_energy = no
init_lambda = 0
sc_alpha = 0
sc_power = 0
sc_sigma = 0.3
delta_lambda = 0
nwall = 0
wall_type = 9-3
wall_atomtype[0] = -1
wall_atomtype[1] = -1
wall_density[0] = 0
wall_density[1] = 0
wall_ewald_zfac = 3
pull = no
disre = No
disre_weighting = Conservative
disre_mixed = FALSE
dr_fc = 1000
dr_tau = 0
nstdisreout = 100
orires_fc = 0
orires_tau = 0
nstorireout = 100
dihre-fc = 1000
em_stepsize = 0.01
em_tol = 10
niter = 20
fc_stepsize = 0
nstcgsteep = 1000
nbfgscorr = 10
ConstAlg = Lincs
shake_tol = 0.0001
lincs_order = 4
lincs_warnangle = 30
lincs_iter = 1
bd_fric = 0
ld_seed = 1993
cos_accel = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
nrdf: 40031.9 67943.8 987429
ref_t: 300 300 300
tau_t: 0.1 0.1 0.1
anneal: No No No
ann_npoints: 0 0 0
acc: 0 0 0
nfreeze: N N N
energygrp_flags[ 0]: 0
n = 0
n = 0
n = 0
n = 0
n = 0
n = 0
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
ngQM = 0
Initializing Domain Decomposition on 196 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
two-body bonded interactions: 0.556 nm, LJ-14, atoms 25035 25038
multi-body bonded interactions: 0.556 nm, Proper Dih., atoms 25035 25038
Minimum cell size due to bonded interactions: 0.612 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.37
Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file
Using 88 separate PME nodes
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 108 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 16 Y 16 Z 14
Domain decomposition grid 6 x 6 x 3, separate PME nodes 88
Interleaving PP and PME nodes
This is a particle-particle only node
Domain decomposition nodeid 0, coordinates 0 0 0
Using two step summing over 4 groups of on average 27.0 processes
Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Will do PME sum in reciprocal space.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's: NS: 0.9 Coulomb: 0.9 LJ: 1.4
System total charge: 0.000
Generated table with 1200 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Enabling TIP4p water optimization for 164564 molecules.
Configuring nonbonded kernels...
Removing pbc first time
Initializing Parallel LINear Constraint Solver
The number of constraints is 52488
There are inter charge-group constraints,
will communicate selected coordinates each lincs iteration
Linking all bonded interactions to atoms
There are 156384 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME
The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 2.81 nm Y 2.81 nm Z 4.86 nm
The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.400 nm
(the following are initial values, they could change due to box deformation)
two-body bonded interactions (-rdd) 1.400 nm
multi-body bonded interactions (-rdd) 1.400 nm
atoms separated by up to 5 constraints (-rcon) 2.806 nm
When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1 Z 1
The minimum size for domain decomposition cells is 1.400 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.50 Y 0.50 Z 0.29
The maximum allowed distance for charge groups involved in interactions is:
non-bonded interactions 1.400 nm
two-body bonded interactions (-rdd) 1.400 nm
multi-body bonded interactions (-rdd) 1.400 nm
atoms separated by up to 5 constraints (-rcon) 1.400 nm
Making 3D domain decomposition grid 6 x 6 x 3, home cell index 0 0 0
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: System
There are: 547196 Atoms
There are: 164564 VSites
Charge group distribution at step 0: 1835 1783 1808 1763 1765 1723
1796 1810 1783 1889 1773 1802 1777 1813 1726 1765 1801 1790 1766 1759
1733 1767 1738 1761 1744 1728 1775 1785 1759 1758 1754 1807 1738 1781
1716 1751 1775 1771 1794 1764 1765 1777 1798 1776 1827 1819 1752 1800
1751 1809 1768 1849 1788 1847 1824 1767 1834 1745 1737 1753 1776 1778
1785 1829 1788 1863 1725 1785 1778 1800 1754 1820 1747 1757 1773 1773
1819 1726 1745 1784 1772 1753 1771 1788 1781 1784 1734 1761 1769 1733
1801 1767 1798 1781 1779 1758 1796 1800 1856 1795 1740 1828 1736 1779
1783 1795 1776 1849
Grid: 13 x 13 x 10 cells
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 3.34e-05
Initial temperature: 300.336 K
Started mdrun on node 0 Wed Mar 4 11:38:03 2009
Step Time Lambda
0 0.00000 0.00000
Energies (kJ/mol)
Angle Proper Dih. Ryckaert-Bell. LJ-14 Coulomb-14
6.76081e+04 1.72586e+04 2.77334e+04 4.21216e+04 2.71567e+05
LJ (SR) LJ (LR) Coulomb (SR) Coul. recip. Potential
1.55700e+06 -4.61765e+04 -9.26002e+06 -1.91687e+06 -9.23978e+06
Kinetic En. Total Energy Temperature Pressure (bar) Cons. rmsd ()
1.37042e+06 -7.86936e+06 3.00934e+02 -2.95959e+03 5.41981e-05
<this is the end of the log file>
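As a quick consistency check on the decomposition reported in the log
above (a sketch only; all numbers are copied from the log, and the
variable names are mine): the 6 x 6 x 3 grid accounts for the 108
particle-particle nodes, and together with the 88 PME-only nodes this
uses all 196 tasks. Note that the share of tasks given to PME comes out
higher than the "Guess for relative PME load: 0.37" line.

```shell
# Numbers copied from the log above.
total=196   # nnodes
pp=108      # particle-particle nodes
pme=88      # PME-only nodes
nx=6; ny=6; nz=3   # domain decomposition grid

cells=$(( nx * ny * nz ))
echo "DD cells: $cells (PP nodes reported: $pp)"
echo "PP + PME: $(( pp + pme )) (tasks allocated: $total)"

# Share of tasks assigned to PME, in percent (integer division).
pme_pct=$(( 100 * pme / total ))
echo "PME share of tasks: ${pme_pct}%"

# "two step summing over 4 groups of on average 27.0 processes"
echo "PP nodes per summing group: $(( pp / 4 ))"
```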
[cneale@tcs-f11n05]$ cat my.stderr
ATTENTION: 0031-408 196 tasks allocated by LoadLeveler, continuing...
NNODES=196, MYRANK=192, HOSTNAME=tcs-f04n08
NNODES=196, MYRANK=4, HOSTNAME=tcs-f09n10
NODEID=64 argc=3
NODEID=63 argc=3
:-) /scratch/cneale/exe/gromacs-4.0.4_aix_o2/exec/bin/mdrun_mpi (-:
Option Filename Type Description
-s temp.tpr Input Run input file: tpr tpb tpa
-o temp.trr Output Full precision trajectory: trr trj cpt
-x temp.xtc Output, Opt. Compressed trajectory (portable xdr format)
-cpi temp.cpt Input, Opt. Checkpoint file
-cpo temp.cpt Output, Opt. Checkpoint file
-c temp.gro Output Structure file: gro g96 pdb
-e temp.edr Output Energy file: edr ene
-g temp.log Output Log file
-dgdl temp.xvg Output, Opt. xvgr/xmgr file
-field temp.xvg Output, Opt. xvgr/xmgr file
-table temp.xvg Input, Opt. xvgr/xmgr file
-tablep temp.xvg Input, Opt. xvgr/xmgr file
-tableb temp.xvg Input, Opt. xvgr/xmgr file
-rerun temp.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-tpi temp.xvg Output, Opt. xvgr/xmgr file
-tpid temp.xvg Output, Opt. xvgr/xmgr file
-ei temp.edi Input, Opt. ED sampling input
-eo temp.edo Output, Opt. ED sampling output
-j temp.gct Input, Opt. General coupling stuff
-jo temp.gct Output, Opt. General coupling stuff
-ffout temp.xvg Output, Opt. xvgr/xmgr file
-devout temp.xvg Output, Opt. xvgr/xmgr file
-runav temp.xvg Output, Opt. xvgr/xmgr file
-px temp.xvg Output, Opt. xvgr/xmgr file
-pf temp.xvg Output, Opt. xvgr/xmgr file
-mtx temp.mtx Output, Opt. Hessian matrix
-dn temp.ndx Output, Opt. Index file
Option Type Value Description
-[no]h bool no Print help info and quit
-nice int 0 Set the nicelevel
-deffnm string temp Set the default filename for all file options
-[no]xvgr bool yes Add specific codes (legends etc.) in the output
xvg files for the xmgrace program
-[no]pd bool no Use particle decompostion
-dd vector 0 0 0 Domain decomposition grid, 0 is optimize
-npme int -1 Number of separate nodes to be used for PME, -1
is guess
-ddorder enum interleave DD node order: interleave, pp_pme or cartesian
-[no]ddcheck bool yes Check for all bonded interactions with DD
-rdd real 0 The maximum distance for bonded interactions with
DD (nm), 0 is determine from initial coordinates
-rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
-dlb enum auto Dynamic load balancing (with DD): auto, no or yes
-dds real 0.8 Minimum allowed dlb scaling of the DD cell size
-[no]sum bool yes Sum the energies at every step
-[no]v bool no Be loud and noisy
-[no]compact bool yes Write a compact log file
-[no]seppot bool no Write separate V and dVdl terms for each
interaction type and node to the log file(s)
-pforce real -1 Print all forces larger than this (kJ/mol nm)
-[no]reprod bool no Try to avoid optimizations that affect binary
-cpt real 15 Checkpoint interval (minutes)
-[no]append bool no Append to previous output files when continuing
from checkpoint
-[no]addpart bool yes Add the simulation part number to all output
files when continuing from checkpoint
-maxh real -1 Terminate after 0.99 times this time (hours)
-multi int 0 Do multiple simulations in parallel
-replex int 0 Attempt replica exchange every # steps
-reseed int -1 Seed for replica exchange, -1 is generate a seed
-[no]glas bool no Do glass simulation with special long range
-[no]ionize bool no Do a simulation including the effect of an X-Ray
bombardment on your system
Reading file temp.tpr, VERSION 4.0.4 (single precision)
Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file
NOTE: For optimal PME load balancing at high parallelization
PME grid_x (175) and grid_y (175) should be divisible by #PME_nodes (88)
Making 3D domain decomposition 6 x 6 x 3
starting mdrun 'Big Box'
500 steps, 1.0 ps.
ERROR: 0032-117 User pack or receive buffer is too small (24) in
MPI_Sendrecv, task 183
Then from the IBM website:
Parallel Environment for Linux V4.3 Messages
User pack or receive buffer too small (number) in string, task number
The buffer specified for the operation was too small to hold the
message. In the PACK and UNPACK cases it is the space between current
position and buffer end which is too small.
User response
Increase the size of the buffer or reduce the size of the message.
### And the error that I previously reported was:
Parallel Environment for Linux V4.3 Messages
Invalid count (number) in string, task number
The value of count (element count) is out of range.
User response
Make sure that the count is greater than or equal to zero.
Error Class: MPI_ERR_COUNT
### And if I run it a third time, I don't get an MPI-based error, but a crash:
$ tail temp.log
Reading file temp.tpr, VERSION 4.0.4 (single precision)
Will use 108 particle-particle and 88 PME only nodes
This is a guess, check the performance at the end of the log file
NOTE: For optimal PME load balancing at high parallelization
PME grid_x (175) and grid_y (175) should be divisible by #PME_nodes (88)
Making 3D domain decomposition 6 x 6 x 3
starting mdrun 'Big Box'
500 steps, 1.0 ps.
t = 0.222 ps: Water molecule starting at atom 105813 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates
Step 112, time 0.224 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 18077115.836165, max 343959040.000000 (between atoms 24714 and 24713)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
24715 24714 90.3 0.1530 14177746.0000 0.1530
24716 24715 93.1 0.1530 2540901.0000 0.1530
24717 24716 91.1 0.1530 473629.9375 0.1530
24718 24717 103.1 0.1530 63926.0820 0.1530
24719 24718 113.0 0.1530 3528.4724 0.1530
24720 24719 33.0 0.1530 0.1829 0.1530
24706 24703 94.5 0.1470 72767.5391 0.1470
24706 24704 91.5 0.1470 207189.8906 0.1470
24706 24705 94.1 0.1470 73579.9141 0.1470
24707 24706 97.4 0.1470 71872.3125 0.1470
24708 24707 94.8 0.1530 282677.5000 0.1530
24709 24708 90.6 0.1430 3258115.2500 0.1430
24710 24709 90.4 0.1610 15580227.0000 0.1610
24713 24710 90.2 0.1610 50052624.0000 0.1610
24712 24710 90.7 0.1480 15122392.0000 0.1480
24711 24710 90.8 0.1480 14735100.0000 0.1480
24714 24713 90.5 0.1430 49186144.0000 0.1430
t = 0.224 ps: Water molecule starting at atom 164033 can not be settled.
Check for bad contacts and/or reduce the timestep.
t = 0.224 ps: Water molecule starting at atom 163709 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates
Wrote pdb files with previous and current coordinates
ERROR: 0031-250 task 145: Segmentation fault
ERROR: 0031-250 task 151: Segmentation fault
ERROR: 0031-250 task 153: Segmentation fault
ERROR: 0031-250 task 112: Segmentation fault
ERROR: 0031-250 task 118: Segmentation fault
ERROR: 0031-250 task 119: Segmentation fault
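One thing that appears in every run above is the NOTE that PME grid_x
and grid_y (175) should be divisible by the number of PME nodes (88).
The sketch below just works out the remainder and lists the node counts
that would divide 175 evenly. The helper name "divisors" is my own, and
I have not tested whether a different PME node count has any effect on
the MPI errors.

```shell
# Print the positive divisors of $1 on one line, in ascending order.
divisors() {
    n=$1
    out=""
    i=1
    while [ "$i" -le "$n" ]; do
        if [ $(( n % i )) -eq 0 ]; then
            out="$out $i"
        fi
        i=$(( i + 1 ))
    done
    # Strip the leading space before printing.
    echo "${out# }"
}

grid=175   # nkx = nky from the log
npme=88    # PME-only nodes chosen by mdrun

echo "remainder of $grid / $npme: $(( grid % npme ))"
echo "node counts that divide $grid evenly: $(divisors "$grid")"
```

mdrun lists an -npme option in the table above, so one of these values
could in principle be requested explicitly. The remaining tasks would
still have to form a valid domain decomposition grid, which this sketch
does not check.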
Many thanks,
