[gmx-users] Count mismatch for state entry SDx, code count is 754728, file count is 0

Chris Neale chris.neale at utoronto.ca
Thu May 5 18:36:21 CEST 2011


Dear Users:

Using gromacs 4.0.5, I find that in at least some cases a disk error 
of some kind can propagate into both my.cpt and my_prev.cpt, 
complicating restarts. This used to be a bigger problem in gromacs 3, 
and I don't recall ever seeing it in gromacs 4, so I thought I would 
post a notification.

I'm just going to extract some coordinates and restart, but ideally this 
wouldn't happen. A Google search for the relevant error, "Count mismatch 
for state entry", only turns up some online source code.

I don't know whether this error occurs in 4.5.3, and since it's not 
binary reproducible that would be difficult to check. Still, the error 
checking that normally runs before the previous (error-free) _prev.cpt 
file is overwritten by a new (corrupted) one did not seem to catch this 
problem, at least in gromacs 4.0.5.
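The safeguard described above can be sketched as follows. This is a hypothetical Python illustration of the idea, not mdrun's actual checkpointing code: `rotate_checkpoint` and the `is_valid` callback are made-up names, and `is_valid` stands in for whatever check you trust (for example, a wrapper script around gmxcheck). The point is the ordering: the old checkpoint is only renamed to `_prev` after the new one has been written and has passed the check. Note that a check which reads the file straight back may be served from the operating system's cache, so it can miss some kinds of disk-level corruption.

```python
import os
import tempfile
from typing import Callable


def rotate_checkpoint(cpt_path: str, new_data: bytes,
                      is_valid: Callable[[str], bool]) -> bool:
    """Install new_data as cpt_path, keeping the old file as <name>_prev<ext>.

    Hypothetical scheme: the previous good checkpoint is only displaced
    after the freshly written file passes is_valid().
    """
    root, ext = os.path.splitext(cpt_path)
    prev_path = f"{root}_prev{ext}"
    directory = os.path.dirname(os.path.abspath(cpt_path))

    # Write the new checkpoint to a temporary file in the same directory,
    # so that the final os.replace() is an atomic rename on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=ext)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
        if not is_valid(tmp_path):
            os.remove(tmp_path)
            return False  # existing .cpt and _prev.cpt are left untouched
        if os.path.exists(cpt_path):
            os.replace(cpt_path, prev_path)  # old good checkpoint -> _prev
        os.replace(tmp_path, cpt_path)       # validated new checkpoint
        return True
    except BaseException:
        # Never leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this ordering, a new checkpoint that fails validation never displaces the last good pair of files.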

The run that wrote out the .cpt files finished normally (it was stopped 
by -maxh), with a stderr that looked like this:

... < snip > ...
starting mdrun 'Generated by genbox'
10000000 steps,  20000.0 ps (continuing from step 3769350,   7538.7 ps).
[gpc-f138n034:06165] 15 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[gpc-f138n034:06165] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Step 5036590: Run time exceeded 47.322 hours, will terminate the run

Step 5036600: Run time exceeded 47.322 hours, will terminate the run

  Average load imbalance: 0.2 %
  Part of the total run time spent waiting due to load imbalance: 0.2 %
  Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Z 0 %
  Average PME mesh/force load: 0.745
  Part of the total run time spent waiting due to PP/PME imbalance: 4.9 %


         Parallel run - timing based on wallclock.

                NODE (s)   Real (s)      (%)
        Time: 170485.000 170485.000    100.0
                        1d23h21:25
                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    625.583     31.889      1.284     18.685

gcq#165: "I'm a Jerk" (F. Black)


gcq#165: "I'm a Jerk" (F. Black)
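As a quick consistency check on the log above, the reported performance can be recomputed from the step numbers and the wall time. This assumes the run stopped at step 5036600 (the last "Run time exceeded" message) and that the timestep is the 20000.0 ps / 10000000 steps from the "starting mdrun" line, i.e. 0.002 ps:

```python
# Values copied from the mdrun stderr above.
first_step = 3_769_350         # "continuing from step 3769350"
last_step = 5_036_600          # assumed final step (last "Run time exceeded" message)
dt_ps = 20000.0 / 10_000_000   # total ps / total steps -> 0.002 ps per step
wall_s = 170485.0              # "Time:" line, Real (s) = 1d23h21:25

simulated_ns = (last_step - first_step) * dt_ps / 1000.0
ns_per_day = simulated_ns / (wall_s / 86400.0)
hour_per_ns = (wall_s / 3600.0) / simulated_ns

# prints: 2.5345 ns, 1.284 ns/day, 18.685 hour/ns
print(f"{simulated_ns:.4f} ns, {ns_per_day:.3f} ns/day, {hour_per_ns:.3f} hour/ns")
```

Both figures match the "Performance:" line, which is consistent with the run having ended cleanly rather than crashing partway through.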

#############################################

When I then run gmxcheck on both of the .cpt files, I get exactly the 
same error, although the files do differ:

$ diff md1.cpt md1_prev.cpt
Binary files md1.cpt and md1_prev.cpt differ
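`diff` only reports that the two binary files differ. `cmp md1.cpt md1_prev.cpt` reports the first differing byte instead, which gives a rough idea of where the two checkpoints diverge. The Python sketch below does the same thing; `first_difference` is a made-up helper name, not part of any GROMACS tool.

```python
def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 16):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            ca = a.read(chunk_size)
            cb = b.read(chunk_size)
            if ca != cb:
                n = min(len(ca), len(cb))
                for i in range(n):
                    if ca[i] != cb[i]:
                        return offset + i
                return offset + n  # one file is a prefix of the other
            if not ca:
                return None  # both files ended at the same point
            offset += len(ca)
```

Usage: `first_difference("md1.cpt", "md1_prev.cpt")`.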


$ gmxcheck  -f md1.cpt
                          :-)  G  R  O  M  A  C  S  (-:

                               S  C  A  M  O  R  G

                             :-)  VERSION 4.0.5  (-:


       Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
        Copyright (c) 1991-2000, University of Groningen, The Netherlands.
              Copyright (c) 2001-2008, The GROMACS development team,
             check out http://www.gromacs.org for more information.

          This program is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License
          as published by the Free Software Foundation; either version 2
              of the License, or (at your option) any later version.

                                :-)  gmxcheck  (-:

Option     Filename  Type         Description
------------------------------------------------------------
   -f        md1.cpt  Input, Opt!  Trajectory: xtc trr trj gro g96 pdb cpt
  -f2       traj.xtc  Input, Opt.  Trajectory: xtc trr trj gro g96 pdb cpt
  -s1       top1.tpr  Input, Opt.  Run input file: tpr tpb tpa
  -s2       top2.tpr  Input, Opt.  Run input file: tpr tpb tpa
   -c      topol.tpr  Input, Opt.  Structure+mass(db): tpr tpb tpa gro g96 pdb
   -e       ener.edr  Input, Opt.  Energy file: edr ene
  -e2      ener2.edr  Input, Opt.  Energy file: edr ene
   -n      index.ndx  Input, Opt.  Index file
   -m        doc.tex  Output, Opt. LaTeX file

Option       Type   Value   Description
------------------------------------------------------
-[no]h       bool   no      Print help info and quit
-nice        int    0       Set the nicelevel
-vdwfac      real   0.8     Fraction of sum of VdW radii used as warning
                             cutoff
-bonlo       real   0.4     Min. fract. of sum of VdW radii for bonded atoms
-bonhi       real   0.7     Max. fract. of sum of VdW radii for bonded atoms
-tol         real   0.001   Relative tolerance for comparing real values
                             defined as 2*(a-b)/(|a|+|b|)
-[no]ab      bool   no      Compare the A and B topology from one file
-lastener    string         Last energy term to compare (if not given all are
                             tested). It makes sense to go up until the
                             Pressure.

Checking file md1.cpt

-------------------------------------------------------
Program gmxcheck, VERSION 4.0.5
Source code file: checkpoint.c, line: 186

Fatal error:
Count mismatch for state entry SDx, code count is 754728, file count is 0

-------------------------------------------------------

"Confirmed" (Star Trek)

############################ and the same thing for the _prev.cpt file:

# gmxcheck  -f md1_prev.cpt
... < snip: same banner and option listing as for md1.cpt above, with -f md1_prev.cpt > ...

Checking file md1_prev.cpt

-------------------------------------------------------
Program gmxcheck, VERSION 4.0.5
Source code file: checkpoint.c, line: 186

Fatal error:
Count mismatch for state entry SDx, code count is 754728, file count is 0

-------------------------------------------------------

"I'm Only Faking When I Get It Right" (Soundgarden)
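For readers who hit the same message: it means the reader expected 754728 values for the SDx state entry (presumably the stochastic-dynamics integrator state), but the count stored in the file was 0. If SDx is stored as one 3-vector per atom, 754728 = 3 × 251576, i.e. a system of 251,576 atoms; that is an inference, not something gmxcheck reports. The sketch below illustrates the *kind* of length check that produces this error. The record layout (a big-endian 4-byte count followed by that many 4-byte floats) and the name `read_state_entry` are hypothetical, not the real GROMACS checkpoint format.

```python
import struct


def read_state_entry(buf: bytes, offset: int, name: str,
                     expected_count: int) -> tuple[list[float], int]:
    """Read one length-prefixed array of reals from a checkpoint-like buffer.

    Hypothetical layout: big-endian int32 count, then `count` float32 values.
    Returns the values and the offset just past the record.
    """
    (file_count,) = struct.unpack_from(">i", buf, offset)
    offset += 4
    # The code knows how many values it needs (e.g. 3 per atom); a count
    # read from a damaged file will not match and the read is rejected.
    if file_count != expected_count:
        raise ValueError(
            f"Count mismatch for state entry {name}, "
            f"code count is {expected_count}, file count is {file_count}")
    values = struct.unpack_from(f">{file_count}f", buf, offset)
    return list(values), offset + 4 * file_count
```

Feeding it a record whose count field reads 0 while 754728 values are expected reproduces the message seen in both .cpt files.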
