[gmx-users] Re: Count mismatch for state entry SDx, code count is 754728, file count is 0
Chris Neale
chris.neale at utoronto.ca
Thu May 5 18:38:38 CEST 2011
Apologies: my.tpr and my_prev.tpr should have read my.cpt and my_prev.cpt.
On 11-05-05 12:36 PM, Chris Neale wrote:
> Dear Users:
>
> Using gromacs 4.0.5, I find that there are at least some cases where
> some type of disk error can get propagated through both my.tpr and
> my_prev.tpr, complicating restarts. This used to be a bigger problem
> in gromacs 3, and I don't recall ever seeing it in gromacs 4 so I
> thought I would post a notification.
>
> I'm just going to extract some coordinates and restart, but ideally
> this wouldn't happen. A google search for the relevant error "Count
> mismatch for state entry" only turns up some online source code.
>
> I don't know whether this error occurs in 4.5.3, and the run is not
> binary reproducible, so that would be difficult to check. Still, the
> error checking that normally takes place before a good _prev.cpt file
> is overwritten with a new, corrupted _prev.cpt file does not seem to
> have caught this problem, at least with gromacs 4.0.5.
>
> The run that wrote out the .tpr finished normally due to -maxh, with a
> stderr that looked like this:
>
> ... < snip > ...
> starting mdrun 'Generated by genbox'
> 10000000 steps, 20000.0 ps (continuing from step 3769350, 7538.7 ps).
> [gpc-f138n034:06165] 15 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [gpc-f138n034:06165] Set MCA parameter "orte_base_help_aggregate" to 0
> to see all help / error messages
>
> Step 5036590: Run time exceeded 47.322 hours, will terminate the run
>
> Step 5036600: Run time exceeded 47.322 hours, will terminate the run
>
> Average load imbalance: 0.2 %
> Part of the total run time spent waiting due to load imbalance: 0.2 %
> Steps where the load balancing was limited by -rdd, -rcon and/or
> -dds: X 0 % Z 0 %
> Average PME mesh/force load: 0.745
> Part of the total run time spent waiting due to PP/PME imbalance: 4.9 %
>
>
> Parallel run - timing based on wallclock.
>
>                NODE (s)    Real (s)      (%)
>        Time: 170485.000  170485.000    100.0
>                        1d23h21:25
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    625.583     31.889      1.284     18.685
>
> gcq#165: "I'm a Jerk" (F. Black)
>
>
> gcq#165: "I'm a Jerk" (F. Black)
>
> #############################################
>
> And then when I gmxcheck both of the .cpt files I get the exact same
> error, although the files do differ:
>
> $ diff md1.cpt md1_prev.cpt
> Binary files md1.cpt and md1_prev.cpt differ
>
>
> $ gmxcheck -f md1.cpt
> :-) G R O M A C S (-:
>
> S C A M O R G
>
> :-) VERSION 4.0.5 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and
> others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2008, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) gmxcheck (-:
>
> Option  Filename   Type          Description
> ------------------------------------------------------------
> -f      md1.cpt    Input, Opt!   Trajectory: xtc trr trj gro g96 pdb cpt
> -f2     traj.xtc   Input, Opt.   Trajectory: xtc trr trj gro g96 pdb cpt
> -s1     top1.tpr   Input, Opt.   Run input file: tpr tpb tpa
> -s2     top2.tpr   Input, Opt.   Run input file: tpr tpb tpa
> -c      topol.tpr  Input, Opt.   Structure+mass(db): tpr tpb tpa gro g96 pdb
> -e      ener.edr   Input, Opt.   Energy file: edr ene
> -e2     ener2.edr  Input, Opt.   Energy file: edr ene
> -n      index.ndx  Input, Opt.   Index file
> -m      doc.tex    Output, Opt.  LaTeX file
>
> Option     Type    Value  Description
> ------------------------------------------------------
> -[no]h     bool    no     Print help info and quit
> -nice      int     0      Set the nicelevel
> -vdwfac    real    0.8    Fraction of sum of VdW radii used as warning cutoff
> -bonlo     real    0.4    Min. fract. of sum of VdW radii for bonded atoms
> -bonhi     real    0.7    Max. fract. of sum of VdW radii for bonded atoms
> -tol       real    0.001  Relative tolerance for comparing real values
>                           defined as 2*(a-b)/(|a|+|b|)
> -[no]ab    bool    no     Compare the A and B topology from one file
> -lastener  string         Last energy term to compare (if not given all are
>                           tested). It makes sense to go up until the
>                           Pressure.
>
> Checking file md1.cpt
>
> -------------------------------------------------------
> Program gmxcheck, VERSION 4.0.5
> Source code file: checkpoint.c, line: 186
>
> Fatal error:
> Count mismatch for state entry SDx, code count is 754728, file count is 0
>
> -------------------------------------------------------
>
> "Confirmed" (Star Trek)
>
> ############################ and the same thing for the _prev.cpt file:
>
> $ gmxcheck -f md1_prev.cpt
> :-) G R O M A C S (-:
>
> GRowing Old MAkes el Chrono Sweat
>
> :-) VERSION 4.0.5 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and
> others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2008, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) gmxcheck (-:
>
> Option  Filename      Type          Description
> ------------------------------------------------------------
> -f      md1_prev.cpt  Input, Opt!   Trajectory: xtc trr trj gro g96 pdb cpt
> -f2     traj.xtc      Input, Opt.   Trajectory: xtc trr trj gro g96 pdb cpt
> -s1     top1.tpr      Input, Opt.   Run input file: tpr tpb tpa
> -s2     top2.tpr      Input, Opt.   Run input file: tpr tpb tpa
> -c      topol.tpr     Input, Opt.   Structure+mass(db): tpr tpb tpa gro g96 pdb
> -e      ener.edr      Input, Opt.   Energy file: edr ene
> -e2     ener2.edr     Input, Opt.   Energy file: edr ene
> -n      index.ndx     Input, Opt.   Index file
> -m      doc.tex       Output, Opt.  LaTeX file
>
> Option     Type    Value  Description
> ------------------------------------------------------
> -[no]h     bool    no     Print help info and quit
> -nice      int     0      Set the nicelevel
> -vdwfac    real    0.8    Fraction of sum of VdW radii used as warning cutoff
> -bonlo     real    0.4    Min. fract. of sum of VdW radii for bonded atoms
> -bonhi     real    0.7    Max. fract. of sum of VdW radii for bonded atoms
> -tol       real    0.001  Relative tolerance for comparing real values
>                           defined as 2*(a-b)/(|a|+|b|)
> -[no]ab    bool    no     Compare the A and B topology from one file
> -lastener  string         Last energy term to compare (if not given all are
>                           tested). It makes sense to go up until the
>                           Pressure.
>
> Checking file md1_prev.cpt
>
> -------------------------------------------------------
> Program gmxcheck, VERSION 4.0.5
> Source code file: checkpoint.c, line: 186
>
> Fatal error:
> Count mismatch for state entry SDx, code count is 754728, file count is 0
>
> -------------------------------------------------------
>
> "I'm Only Faking When I Get It Right" (Soundgarden)
>
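
For anyone who wants to catch this before a restart fails, here is a rough sketch of how the gmxcheck test above could be scripted. It is only an illustration, not something GROMACS provides: the function names are made up, it assumes gmxcheck is on the PATH and accepts -f as shown above, and it assumes the "Fatal error:" block looks like the one in the output above. Treating either a non-zero exit status or a "Fatal error:" block as a failure is also an assumption on my part.

```python
import subprocess


def find_fatal_error(output):
    """Return the message that follows a 'Fatal error:' line, or None.

    Assumes the layout shown in the gmxcheck output above: the message
    starts on the line after 'Fatal error:' and ends at a blank line or
    a line of dashes.
    """
    lines = [ln.strip() for ln in output.splitlines()]
    for i, ln in enumerate(lines):
        if ln.startswith("Fatal error:"):
            msg = []
            for nxt in lines[i + 1:]:
                if not nxt or set(nxt) == {"-"}:
                    break
                msg.append(nxt)
            return " ".join(msg)
    return None


def checkpoint_is_readable(path, tool="gmxcheck"):
    """Run `gmxcheck -f path` and return (ok, error_message).

    Hypothetical helper: 'ok' requires a zero exit status and no
    'Fatal error:' block in the combined stdout/stderr.
    """
    proc = subprocess.run(
        [tool, "-f", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    err = find_fatal_error(proc.stdout)
    return (proc.returncode == 0 and err is None), err


def pick_restart_checkpoint(candidates, check=checkpoint_is_readable):
    """Return the first candidate that passes the check, or None."""
    for path in candidates:
        ok, _ = check(path)
        if ok:
            return path
    return None
```

Calling pick_restart_checkpoint(["md1.cpt", "md1_prev.cpt"]) would return None in the situation described above, since both files fail in the same way; at that point the only option is to extract coordinates and restart from those.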