[gmx-users] RMSD truncation Restart simulation problems
Henri Mone
henriMone at gmail.com
Tue Mar 8 11:41:00 CET 2011
Hi All, hi Mark,
Here are some more details. The outputs and error messages are
attached at the end of this e-mail. After truncation I get the error
message [1a]: GROMACS has problems with the checksum of the trr files.
After truncation all the trajectories (xtc, trr) have the same length of
27752 frames [1b]. All the edr files also have the same length: 138760
frames, ending at t = 277518 ps [1b]. The cpt files used after
truncation have step = 138762700 and t = 277525.400000 [1c].
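
As a rough cross-check of these numbers, here is a minimal Python sketch that compares the end of the truncated files with the checkpoint time. The values are copied from the output in [1b] and [1c]. It assumes that frames are evenly spaced starting at t = 0, that trajectory frames are written every 10 ps (the gmxcheck "Timestep" column), and that energy frames are written every 2 ps (277518 ps / 138759). The variable names are only illustrative.

```python
# Values copied from the gmxcheck, eneconv and checkpoint output in [1b] and [1c].
cpt_step = 138762700
cpt_time_ps = 277525.4
n_traj_frames = 27752
traj_dt_ps = 10.0          # assumption: taken from the gmxcheck "Timestep" column
edr_last_time_ps = 277518.0
edr_dt_ps = 2.0            # assumption: 277518 ps / 138759 energy frames

# MD time step implied by the checkpoint (ps per step).
dt_ps = cpt_time_ps / cpt_step

# Time of the last frame in the truncated trajectories (first frame at t = 0).
traj_last_time_ps = (n_traj_frames - 1) * traj_dt_ps

# Last output times that would precede the checkpoint if output were
# written at the assumed regular intervals.
expected_traj_ps = (cpt_time_ps // traj_dt_ps) * traj_dt_ps
expected_edr_ps = (cpt_time_ps // edr_dt_ps) * edr_dt_ps

print(f"implied dt               : {dt_ps:.6f} ps")
print(f"checkpoint time          : {cpt_time_ps:.1f} ps")
print(f"last trajectory frame    : {traj_last_time_ps:.1f} ps")
print(f"expected before cpt      : {expected_traj_ps:.1f} ps")
print(f"last energy frame        : {edr_last_time_ps:.1f} ps")
print(f"expected before cpt      : {expected_edr_ps:.1f} ps")
print(f"cpt - last trajectory    : {cpt_time_ps - traj_last_time_ps:.1f} ps")
print(f"cpt - last energy frame  : {cpt_time_ps - edr_last_time_ps:.1f} ps")
```

Under these assumptions the truncated trajectories end 15.4 ps and the energy files 7.4 ps before the checkpoint time, so the files may be shorter than what the checkpoint recorded when it was written. This is only a consistency check, not a confirmed diagnosis of error [1a].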
Before truncation I got the error message [2]: GROMACS complains that
the 32 subsystems are not compatible.
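
To see quickly which replicas disagree in [2], the "subsystem N: value" lines of the log can be grouped by value. The following is a small sketch in Python; `group_subsystems` is a hypothetical helper written for this message, not part of GROMACS, and the sample text is a shortened excerpt of the md2.log output in [2].

```python
import re
from collections import defaultdict


def group_subsystems(log_text):
    """Group 'subsystem N: value' lines from a log excerpt by their value."""
    groups = defaultdict(list)
    for match in re.finditer(r"subsystem\s+(\d+):\s+(\d+)", log_text):
        index, value = int(match.group(1)), int(match.group(2))
        groups[value].append(index)
    return dict(groups)


# Shortened excerpt of the md2.log output quoted in [2].
sample = """\
subsystem 0: 70425
subsystem 1: 70437
subsystem 2: 70437
subsystem 16: 70425
subsystem 24: 70425
subsystem 31: 70437
"""

# Print each distinct value with the replicas that report it.
for value, replicas in sorted(group_subsystems(sample).items()):
    print(value, replicas)
```

In the full listing in [2], subsystems 0, 16 and 24 report 70425 and the other 29 report 70437.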
Does anyone have an idea what is going wrong?
Thanks,
Henri
====1a: AFTER TRUNCATION: ERROR MESSAGE
Reading checkpoint file state1.cpt generated: Thu Jan 27 02:19:50 2011
#PME-nodes mismatch,
current program: -1
checkpoint file: 0
Reading checkpoint file state2.cpt generated: Thu Jan 27 02:19:50 2011
#PME-nodes mismatch,
current program: -1
checkpoint file: 0
Gromacs binary or parallel settings not identical to previous run.
Continuation is exact, but is not guaranteed to be binary identical.
...
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: checkpoint.c, line: 1767
Fatal error:
Can't read 1048576 bytes of 'traj1.trr' to compute checksum. The file
has been replaced or its contents has been modified.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: checkpoint.c, line: 1767
Fatal error:
Can't read 1048576 bytes of 'traj2.trr' to compute checksum. The file
has been replaced or its contents has been modified.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Error on node 1, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 1 out of 32
gcq#307: "Good Music Saves your Soul" (Lemmy)
[n030212:18418] MPI_ABORT invoked on rank 1 in communicator
MPI_COMM_WORLD with errorcode -1
====1b: AFTER TRUNCATION: XTC TRR
$ gmxcheck -f traj0.xtc
Checking file traj0.xtc
Reading frame 0 time 0.000
# Atoms 224
Precision 0.001 (nm)
Reading frame 27000 time 270000.000
Item #frames Timestep (ps)
Step 27752 10
Time 27752 10
Lambda 0
Coords 27752 10
Velocities 0
Forces 0
Box 27752 10
...
$ gmxcheck -f traj31.xtc
Checking file traj31.xtc
Reading frame 0 time 0.000
# Atoms 224
Precision 0.001 (nm)
Reading frame 27000 time 270000.000
Item #frames Timestep (ps)
Step 27752 10
Time 27752 10
Lambda 0
Coords 27752 10
Velocities 0
Forces 0
Box 27752 10
$ gmxcheck -f traj0.trr
Checking file traj0.trr
trn version: GMX_trn_file (single precision)
Reading frame 0 time 0.000
# Atoms 6647
Reading frame 27000 time 270000.000
Item #frames Timestep (ps)
Step 27752 10
Time 27752 10
Lambda 27752 10
Coords 27752 10
Velocities 27752 10
Forces 0
Box 27752 10
$ gmxcheck -f traj1.trr
Checking file traj1.trr
trn version: GMX_trn_file (single precision)
Reading frame 0 time 0.000
# Atoms 6647
Reading frame 27000 time 270000.000
Item #frames Timestep (ps)
Step 27752 10
Time 27752 10
Lambda 27752 10
Coords 27752 10
Velocities 27752 10
Forces 0
Box 27752 10
...
$ gmxcheck -f traj31.trr
Checking file traj31.trr
trn version: GMX_trn_file (single precision)
Reading frame 0 time 0.000
# Atoms 6647
Reading frame 27000 time 270000.000
Item #frames Timestep (ps)
Step 27752 10
Time 27752 10
Lambda 27752 10
Coords 27752 10
Velocities 27752 10
Forces 0
Box 27752 10
$ eneconv -f ener0.edr
Reading energy frame 0 time 0.000
Continue writing frames from t=0, step=0
Last energy frame read 138759 time 277518.000
Writing frame time 276000
Last step written from ener0.edr: t 277518, step 138759000
Last frame written was at step 138759000, time 277518.000000
Wrote 138760 frames
...
$ eneconv -f ener31.edr
Reading energy frame 0 time 0.000
Continue writing frames from t=0, step=0
Last energy frame read 138759 time 277518.000
Writing frame time 276000
Last step written from ener31.edr: t 277518, step 138759000
Last frame written was at step 138759000, time 277518.000000
Wrote 138760 frames
====1c: AFTER TRUNCATION: CPT
state0.cpt:
generation time = Thu Jan 27 02:19:50 2011
step = 138762700
t = 277525.400000
...
state31.cpt:
generation time = Thu Jan 27 02:19:50 2011
step = 138762700
t = 277525.400000
$ gmxdump -cp state0.cpt|less
GROMACS version = 4.5.3
GROMACS build time = Fri Dec 3 03:20:53 CET 2010
GROMACS build user = user at cluster
GROMACS build machine = Linux 2.6.18-194.17.4.el5 x86_64
generating program = /opt/gromacs-4.5.3/bin/mdrun_mpi
generation time = Thu Jan 27 02:19:50 2011
checkpoint file version = 12
generating host = n040407
#atoms = 6647
#T-coupling groups = 1
#Nose-Hoover T-chains = 0
#Nose-Hoover T-chains for barostat = 0
integrator = 0
simulation part # = 18
step = 138762700
t = 277525.400000
#PP-nodes = 1
dd_nc[x] = 1
dd_nc[y] = 1
dd_nc[z] = 1
#PME-only nodes = 0
state flags = 6594
ekin data flags = 0
energy history flags = 255
====2: BEFORE TRUNCATION
$ less md2.log
Initializing Replica Exchange
Repl There are 32 replicas:
Multi-checking the number of atoms ... OK
Multi-checking the integrator ... OK
Multi-checking init_step+nsteps ... OK
Multi-checking first exchange step: init_step/-replex ...
first exchange step: init_step/-replex is not equal for all subsystems
subsystem 0: 70425
subsystem 1: 70437
subsystem 2: 70437
subsystem 3: 70437
subsystem 4: 70437
subsystem 5: 70437
subsystem 6: 70437
subsystem 7: 70437
subsystem 8: 70437
subsystem 9: 70437
subsystem 10: 70437
subsystem 11: 70437
subsystem 12: 70437
subsystem 13: 70437
subsystem 14: 70437
subsystem 15: 70437
subsystem 16: 70425
subsystem 17: 70437
subsystem 18: 70437
subsystem 19: 70437
subsystem 20: 70437
subsystem 21: 70437
subsystem 22: 70437
subsystem 23: 70437
subsystem 24: 70425
subsystem 25: 70437
subsystem 26: 70437
subsystem 27: 70437
subsystem 28: 70437
subsystem 29: 70437
subsystem 30: 70437
subsystem 31: 70437
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.5.3
Source code file: main.c, line: 189
Fatal error:
The 32 subsystems are not compatible
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
$ gmxdump -cp state0.cpt|less
GROMACS version = 4.5.3
GROMACS build time = Fri Dec 3 03:20:53 CET 2010
GROMACS build user = user at cluster
GROMACS build machine = Linux 2.6.18-194.17.4.el5 x86_64
generating program = /opt/gromacs-4.5.3/bin/mdrun_mpi
generation time = Thu Jan 27 15:08:32 2011
checkpoint file version = 12
generating host = n040407
#atoms = 6647
#T-coupling groups = 1
#Nose-Hoover T-chains = 0
#Nose-Hoover T-chains for barostat = 0
integrator = 0
simulation part # = 19
step = 140849180
t = 281698.360000
#PP-nodes = 1
dd_nc[x] = 1
dd_nc[y] = 1
dd_nc[z] = 1
#PME-only nodes = 0
state flags = 6594
ekin data flags = 0
...
$ gmxcheck -f traj0.xtc
Reading frame 0 time 0.000
# Atoms 224
Precision 0.001 (nm)
Reading frame 28000 time 280000.000
Item #frames Timestep (ps)
Step 28170 10
Time 28170 10
Lambda 0
Coords 28170 10
Velocities 0
Forces 0
Box 28170 10
$ gmxcheck -f traj1.xtc
Reading frame 0 time 0.000
# Atoms 224
Precision 0.001 (nm)
Reading frame 28000 time 280000.000
Item #frames Timestep (ps)
Step 28175 10
Time 28175 10
Lambda 0
Coords 28175 10
Velocities 0
Forces 0
Box 28175 10