[gmx-users] Count mismatch for state entry SDx, code count is 754728, file count is 0
Chris Neale
chris.neale at utoronto.ca
Thu May 5 18:36:21 CEST 2011
Dear Users:
Using GROMACS 4.0.5, I find that in at least some cases a disk error of
some kind can propagate into both my.cpt and my_prev.cpt, complicating
restarts. This used to be a bigger problem in GROMACS 3, and I don't
recall ever seeing it in GROMACS 4, so I thought I would post a
notification.
I'm just going to extract some coordinates and restart, but ideally this
wouldn't happen. A Google search for the relevant error, "Count mismatch
for state entry", only turns up some online source code.
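As a rough sketch of the "extract some coordinates and restart" route
mentioned above, the two commands could be assembled as follows. All
file names and the dump time are hypothetical placeholders, not values
from my run; trjconv -dump and grompp are standard GROMACS 4 tools.

```python
def restart_commands(traj, tpr, dump_time_ps, mdp, top, out_gro, out_tpr):
    """Build the argument lists for a coordinate-based restart.

    Every argument is a caller-supplied (hypothetical) file name or time.
    """
    # trjconv writes the frame nearest to dump_time_ps; it will also ask
    # interactively which group of atoms to write out.
    extract = ["trjconv", "-f", traj, "-s", tpr,
               "-dump", str(dump_time_ps), "-o", out_gro]
    # grompp builds a fresh run input file from the extracted coordinates.
    rebuild = ["grompp", "-f", mdp, "-c", out_gro, "-p", top, "-o", out_tpr]
    return extract, rebuild


extract, rebuild = restart_commands(
    "md1.trr", "md1.tpr", 10000.0, "md.mdp", "topol.top",
    "restart.gro", "md2.tpr")
print(" ".join(extract))
print(" ".join(rebuild))
```

A restart built this way is not an exact continuation, since the
checkpoint state is lost; it only recovers a usable starting structure.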
I don't know whether this error occurs in 4.5.3, and it is not
reproducible at the binary level, so that would be difficult to check.
Still, the error checking that normally runs before the previous
(error-free) _prev.cpt file is overwritten by a new (corrupt) one does
not seem to have caught this problem, at least with GROMACS 4.0.5.
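To illustrate the kind of check I have in mind, here is a minimal sketch
of a validate-before-rotate scheme. It is not how GROMACS implements
checkpointing; rotate_checkpoint and the is_valid callback are
hypothetical names introduced only for this example.

```python
import os
import tempfile


def rotate_checkpoint(new_bytes, path, prev_path, is_valid):
    """Write new_bytes to path, keeping the old file as prev_path.

    is_valid is a caller-supplied (hypothetical) check that takes a file
    path and returns True if the checkpoint there can be read back.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(new_bytes)
            handle.flush()
            os.fsync(handle.fileno())  # push data to disk before trusting it
        if not is_valid(tmp_path):
            raise IOError("new checkpoint failed validation; old files kept")
        # Only promote the current file to _prev if it is itself readable,
        # so a bad file never replaces a good backup.
        if os.path.exists(path) and is_valid(path):
            os.replace(path, prev_path)
        os.replace(tmp_path, path)
    finally:
        # Clean up the temporary file if it was not renamed into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

The point of the design is that neither existing file is touched until
the new one has been written and read back successfully, so at least one
readable checkpoint should survive a failed write.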
The run that wrote out the .cpt files finished normally due to -maxh,
with a stderr that looked like this:
... < snip > ...
starting mdrun 'Generated by genbox'
10000000 steps, 20000.0 ps (continuing from step 3769350, 7538.7 ps).
[gpc-f138n034:06165] 15 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[gpc-f138n034:06165] Set MCA parameter "orte_base_help_aggregate" to 0
to see all help / error messages
Step 5036590: Run time exceeded 47.322 hours, will terminate the run
Step 5036600: Run time exceeded 47.322 hours, will terminate the run
Average load imbalance: 0.2 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
X 0 % Z 0 %
Average PME mesh/force load: 0.745
Part of the total run time spent waiting due to PP/PME imbalance: 4.9 %
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 170485.000 170485.000 100.0
1d23h21:25
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 625.583 31.889 1.284 18.685
gcq#165: "I'm a Jerk" (F. Black)
gcq#165: "I'm a Jerk" (F. Black)
#############################################
And then when I gmxcheck both of the .cpt files I get the exact same
error, although the files do differ:
$ diff md1.cpt md1_prev.cpt
Binary files md1.cpt and md1_prev.cpt differ
$ gmxcheck -f md1.cpt
:-) G R O M A C S (-:
S C A M O R G
:-) VERSION 4.0.5 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) gmxcheck (-:
Option Filename Type Description
------------------------------------------------------------
-f md1.cpt Input, Opt! Trajectory: xtc trr trj gro g96 pdb cpt
-f2 traj.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-s1 top1.tpr Input, Opt. Run input file: tpr tpb tpa
-s2 top2.tpr Input, Opt. Run input file: tpr tpb tpa
-c topol.tpr Input, Opt. Structure+mass(db): tpr tpb tpa gro
g96 pdb
-e ener.edr Input, Opt. Energy file: edr ene
-e2 ener2.edr Input, Opt. Energy file: edr ene
-n index.ndx Input, Opt. Index file
-m doc.tex Output, Opt. LaTeX file
Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-nice int 0 Set the nicelevel
-vdwfac real 0.8 Fraction of sum of VdW radii used as warning
cutoff
-bonlo real 0.4 Min. fract. of sum of VdW radii for bonded atoms
-bonhi real 0.7 Max. fract. of sum of VdW radii for bonded atoms
-tol real 0.001 Relative tolerance for comparing real values
defined as 2*(a-b)/(|a|+|b|)
-[no]ab bool no Compare the A and B topology from one file
-lastener string Last energy term to compare (if not given all are
tested). It makes sense to go up until the Pressure.
Checking file md1.cpt
-------------------------------------------------------
Program gmxcheck, VERSION 4.0.5
Source code file: checkpoint.c, line: 186
Fatal error:
Count mismatch for state entry SDx, code count is 754728, file count is 0
-------------------------------------------------------
"Confirmed" (Star Trek)
############################ and the same thing for the _prev.cpt file:
$ gmxcheck -f md1_prev.cpt
:-) G R O M A C S (-:
GRowing Old MAkes el Chrono Sweat
:-) VERSION 4.0.5 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) gmxcheck (-:
Option Filename Type Description
------------------------------------------------------------
-f md1_prev.cpt Input, Opt! Trajectory: xtc trr trj gro g96 pdb cpt
-f2 traj.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-s1 top1.tpr Input, Opt. Run input file: tpr tpb tpa
-s2 top2.tpr Input, Opt. Run input file: tpr tpb tpa
-c topol.tpr Input, Opt. Structure+mass(db): tpr tpb tpa gro
g96 pdb
-e ener.edr Input, Opt. Energy file: edr ene
-e2 ener2.edr Input, Opt. Energy file: edr ene
-n index.ndx Input, Opt. Index file
-m doc.tex Output, Opt. LaTeX file
Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-nice int 0 Set the nicelevel
-vdwfac real 0.8 Fraction of sum of VdW radii used as warning
cutoff
-bonlo real 0.4 Min. fract. of sum of VdW radii for bonded atoms
-bonhi real 0.7 Max. fract. of sum of VdW radii for bonded atoms
-tol real 0.001 Relative tolerance for comparing real values
defined as 2*(a-b)/(|a|+|b|)
-[no]ab bool no Compare the A and B topology from one file
-lastener string Last energy term to compare (if not given all are
tested). It makes sense to go up until the Pressure.
Checking file md1_prev.cpt
-------------------------------------------------------
Program gmxcheck, VERSION 4.0.5
Source code file: checkpoint.c, line: 186
Fatal error:
Count mismatch for state entry SDx, code count is 754728, file count is 0
-------------------------------------------------------
"I'm Only Faking When I Get It Right" (Soundgarden)
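The fatal error reported by gmxcheck for both checkpoint files has a
fixed wording, so a batch of .cpt files could be screened for it
automatically. The sketch below is only an illustration: the function
names are hypothetical, and check_file assumes a gmxcheck binary on the
PATH and a Python 3.7+ interpreter.

```python
import re
import subprocess

COUNT_MISMATCH = re.compile(
    r"Count mismatch for state entry (\S+), "
    r"code count is (\d+), file count is (\d+)"
)


def parse_count_mismatch(output):
    """Return (entry, code_count, file_count), or None if no match."""
    match = COUNT_MISMATCH.search(output)
    if match is None:
        return None
    entry, code_count, file_count = match.groups()
    return entry, int(code_count), int(file_count)


def check_file(path):
    """Run gmxcheck on one file and look for the mismatch message."""
    # Fatal errors may be written to stderr, so both streams are scanned.
    result = subprocess.run(["gmxcheck", "-f", path],
                            capture_output=True, text=True)
    return parse_count_mismatch(result.stdout + result.stderr)


sample = (
    "Fatal error:\n"
    "Count mismatch for state entry SDx, code count is 754728, "
    "file count is 0\n"
)
print(parse_count_mismatch(sample))  # ('SDx', 754728, 0)
```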