[gmx-developers] Bug with continuation from checkpoint with gromacs 4.0.7
Alexey Shvetsov
alexxyum at gmail.com
Tue Mar 16 00:21:23 CET 2010
On Tuesday 16 March 2010 01:45:42 Roland Schulz wrote:
> Hi,
>
> what is nsteps in the mdp file?
>
> Did the simulation really run 3377000 steps up to this point?
>
> Roland
nsteps is 50000000 (10 ns with a 2 fs timestep)
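
A quick check of the step arithmetic, as a sketch only. Assuming the 2 fs timestep stated above, the snippet converts between step counts and simulated time for the numbers quoted in this thread. The helper names are made up for illustration and are not part of GROMACS.

```python
# Step/time arithmetic for the numbers quoted in this thread.
# Assumption: a 2 fs timestep, as stated in the message.
DT_FS = 2  # timestep in femtoseconds


def steps_for_ns(time_ns, dt_fs=DT_FS):
    """Number of steps needed to cover time_ns nanoseconds."""
    return time_ns * 1_000_000 // dt_fs


def time_ps_for_steps(steps, dt_fs=DT_FS):
    """Simulated time in picoseconds covered by the given number of steps."""
    return steps * dt_fs / 1000


steps_10ns = steps_for_ns(10)                  # steps needed for 10 ns
time_quoted = time_ps_for_steps(50_000_000)    # nsteps value quoted above
time_checkpoint = time_ps_for_steps(3_377_000) # checkpoint step in the log
time_mdrun = time_ps_for_steps(500_000)        # step count printed by mdrun

print("10 ns needs", steps_10ns, "steps")
print("50000000 steps cover", time_quoted, "ps")
print("3377000 steps cover", time_checkpoint, "ps")
print("500000 steps cover", time_mdrun, "ps")
```

With these assumptions, 10 ns corresponds to 5000000 steps, 50000000 steps would be 100 ns, and the 500000 steps that mdrun prints in the quoted output correspond to 1000 ps, so the three figures do not agree with one another.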
>
> 2010/3/15 Alexey Shvetsov <alexxyum at gmail.com>
>
> > Hi,
> >
> > It crashed with this error.
> > In md.log I can see:
> >
> > -----------------------------------------------------------
> > Restarting from checkpoint, appending to previous log file.
> >
> > Log file opened on Sat Mar 13 01:48:10 2010
> > Host: n1 pid: 5575 nodeid: 0 nnodes: 128
> > The Gromacs distribution was built Sun Feb 28 02:57:38 MSK 2010 by
> > root at n1 (Linux 2.6.31-gentoo-r6 x86_64)
> >
> >
> >
> > Initializing Domain Decomposition on 128 nodes
> > Dynamic load balancing: auto
> > Will sort the charge groups at every domain (re)decomposition
> >
> > Initial maximum inter charge-group distances:
> > two-body bonded interactions: 0.430 nm, LJ-14, atoms 6915 6922
> >
> > multi-body bonded interactions: 0.430 nm, Ryckaert-Bell., atoms 6915 6922
> >
> > Minimum cell size due to bonded interactions: 0.473 nm
> > Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.819 nm
> > Estimated maximum distance required for P-LINCS: 0.819 nm
> > This distance will limit the DD cell size, you can override this with -rcon
> > Domain decomposition grid 16 x 7 x 1, separate PME nodes 16
> > Interleaving PP and PME nodes
> > This is a particle-particle only node
> >
> > Domain decomposition nodeid 0, coordinates 0 0 0
> >
> > Using two step summing over 16 groups of on average 7.0 processes
> >
> > Table routines are used for coulomb: TRUE
> > Table routines are used for vdw: TRUE
> > Will do PME sum in reciprocal space.
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
> > A smooth particle mesh Ewald method
> > J. Chem. Phys. 103 (1995) pp. 8577-8592
> > -------- -------- --- Thank You --- -------- --------
> >
> > Using a Gaussian width (1/beta) of 0.480244 nm for Ewald
> > Using shifted Lennard-Jones, switch between 1.2 and 1.5 nm
> > Cut-off's: NS: 1.7 Coulomb: 1.5 LJ: 1.5
> > System total charge: 0.000
> > Generated table with 5400 data points for Ewald-Switch.
> > Tabscale = 2000 points/nm
> > Generated table with 5400 data points for LJ6Switch.
> > Tabscale = 2000 points/nm
> > Generated table with 5400 data points for LJ12Switch.
> > Tabscale = 2000 points/nm
> > Generated table with 5400 data points for 1-4 COUL.
> > Tabscale = 2000 points/nm
> > Generated table with 5400 data points for 1-4 LJ6.
> > Tabscale = 2000 points/nm
> > Generated table with 5400 data points for 1-4 LJ12.
> > Tabscale = 2000 points/nm
> >
> > Enabling SPC water optimization for 115416 molecules.
> >
> > Configuring nonbonded kernels...
> > Testing x86_64 SSE2 support... present.
> >
> >
> >
> > Initializing Parallel LINear Constraint Solver
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > B. Hess
> > P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
> > J. Chem. Theory Comput. 4 (2008) pp. 116-122
> > -------- -------- --- Thank You --- -------- --------
> >
> > The number of constraints is 43710
> > There are inter charge-group constraints,
> > will communicate selected coordinates each lincs iteration
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > S. Miyamoto and P. A. Kollman
> > SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
> > Rigid Water Models
> > J. Comp. Chem. 13 (1992) pp. 952-962
> > -------- -------- --- Thank You --- -------- --------
> >
> >
> > Linking all bonded interactions to atoms
> > There are 237282 inter charge-group exclusions,
> > will use an extra communication step for exclusion forces for PME-Switch
> >
> > The initial number of communication pulses is: X 2 Y 1
> > The initial domain decomposition cell size is: X 1.18 nm Y 2.54 nm
> >
> > The maximum allowed distance for charge groups involved in interactions is:
> > non-bonded interactions 1.700 nm
> >
> > (the following are initial values, they could change due to box
> > deformation)
> >
> > two-body bonded interactions (-rdd) 1.700 nm
> >
> > multi-body bonded interactions (-rdd) 1.178 nm
> >
> > atoms separated by up to 5 constraints (-rcon) 1.178 nm
> >
> > When dynamic load balancing gets turned on, these settings will change to:
> > The maximum number of communication pulses is: X 2 Y 2
> > The minimum size for domain decomposition cells is 0.850 nm
> > The requested allowed shrink of DD cells (option -dds) is: 0.80
> > The allowed shrink of domain decomposition cells is: X 0.72 Y 0.33
> >
> > The maximum allowed distance for charge groups involved in interactions is:
> > non-bonded interactions 1.700 nm
> >
> > two-body bonded interactions (-rdd) 1.700 nm
> >
> > multi-body bonded interactions (-rdd) 0.850 nm
> >
> > atoms separated by up to 5 constraints (-rcon) 0.850 nm
> >
> > Making 2D domain decomposition grid 16 x 7 x 1, home cell index 0 0 0
> >
> > Center of mass motion removal mode is Linear
> >
> > We have the following groups for center of mass motion removal:
> > 0: rest
> >
> > There are: 390476 Atoms
> > Charge group distribution at step 3377000: 1125 1142 1147 1166 1139 1158 1135
> > 1146 1216 1139 1298 1162 1139 1138 1182 1279 1151 1525 1334 1173 1162
> > 1364 1509 1368 1884 1513 1149 1151 1286 1358 1714 2023 1752 1191 1150
> > 1167 1480 2099 2239 2327 1411 1170 1132 1668 2235 1647 2174 1812 1195
> > 1258 1542 1919 1307 1861 1793 1141 1415 1590 1810 1435 2106 1755 1149
> > 1564 1711 2424 2156 2122 1457 1158 1278 1291 1862 2071 1775 1564 1118
> > 1144 1175 1938 2062 1858 1637 1136 1141 1326 1685 1438 1348 1277 1144
> > 1162 1159 1361 1142 1226 1184 1153 1142 1144 1195 1154 1144 1151 1124
> > 1149 1148 1171 1140 1127 1145 1158
> > Grid: 5 x 7 x 17 cells
> > Initial temperature: 309.173 K
> >
> > Started mdrun on node 0 Sat Mar 13 01:48:12 2010
> >
> > <====== ############### ==>
> > <==== A V E R A G E S ====>
> > <== ############### ======>
> >
> > Energies (kJ/mol)
> >
> >           Angle    Proper Dih. Ryckaert-Bell.          LJ-14     Coulomb-14
> >     3.11012e+11    1.88259e+10    3.84061e+11    1.60351e+11    1.48403e+12
> >         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >     2.00208e+12   -3.71589e+10   -1.93169e+13   -2.65842e+12   -1.76521e+13
> >     Kinetic En.   Total Energy    Temperature Pressure (bar)  Cons. rmsd ()
> >     3.40082e+12   -1.42513e+13    1.04681e+09    3.09179e+06    0.00000e+00
> >           Box-X          Box-Y          Box-Z         Volume   Density (SI)
> >     6.36496e+07    6.00618e+07    3.96178e+07    1.32807e+10    3.44326e+09
> >              pV
> >     7.21862e+08
> >
> > Total Virial (kJ/mol)
> >
> > 1.13360e+12 1.84057e+08 1.42298e+08
> > 1.84057e+08 1.13321e+12 1.72446e+08
> > 1.42298e+08 1.72446e+08 1.13293e+12
> >
> > Pressure (bar)
> >
> > 1.96772e+06 3.52902e+05 -4.76710e+05
> > 3.52902e+05 3.34068e+06 -5.56943e+05
> > -4.76710e+05 -5.56943e+05 3.96697e+06
> >
> > Total Dipole (Debye)
> >
> > 2.04296e+09 7.00647e+08 1.82307e+09
> >
> >            Epot (kJ/mol)        Coul-SR          LJ-SR        Coul-14          LJ-14
> >          Protein-Protein   -1.05061e+12   -2.88200e+11    1.48403e+12    1.60351e+11
> >      Protein-Non-Protein   -9.55694e+11   -7.40905e+10    0.00000e+00    0.00000e+00
> >  Non-Protein-Non-Protein   -1.73106e+13    2.36437e+12    0.00000e+00    0.00000e+00
> >
> > T-Protein T-Non-Protein
> >
> > 1.04653e+09 1.04684e+09
> >
> > <====== ############################### ==>
> > <==== R M S - F L U C T U A T I O N S ====>
> > <== ############################### ======>
> >
> > Energies (kJ/mol)
> >
> >           Angle    Proper Dih. Ryckaert-Bell.          LJ-14     Coulomb-14
> >     9.01957e+05    1.88394e+05    8.36483e+05    3.32678e+05    1.56961e+06
> >         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >     4.49504e+06    1.61531e+04    7.36949e+06    4.74527e+05    5.31921e+06
> >     Kinetic En.   Total Energy    Temperature Pressure (bar)  Cons. rmsd ()
> >     2.96016e+06    6.07351e+06    9.11168e+02    8.50221e+04    0.00000e+00
> >           Box-X          Box-Y          Box-Z         Volume   Density (SI)
> >     9.22282e+00    8.70294e+00    5.74061e+00    5.77310e+03    1.49680e+03
> >              pV
> >     2.01360e+07
> >
> > Total Virial (kJ/mol)
> >
> > 1.43640e+07 8.71197e+06 8.72558e+06
> > 8.71197e+06 1.43505e+07 8.72679e+06
> > 8.72558e+06 8.72679e+06 1.39238e+07
> >
> > Pressure (bar)
> >
> > 1.22151e+05 7.44143e+04 7.44974e+04
> > 7.44143e+04 1.22011e+05 7.44967e+04
> > 7.44974e+04 7.44967e+04 1.18187e+05
> >
> > Total Dipole (Debye)
> >
> > 6.14300e+06 5.87464e+06 5.12540e+06
> >
> >            Epot (kJ/mol)        Coul-SR          LJ-SR        Coul-14          LJ-14
> >          Protein-Protein    7.16780e+06    1.41743e+06    1.56961e+06    3.32678e+05
> >      Protein-Non-Protein    1.08780e+07    9.73569e+05    0.00000e+00    0.00000e+00
> >  Non-Protein-Non-Protein    9.00044e+06    4.32503e+06    0.00000e+00    0.00000e+00
> >
> > T-Protein T-Non-Protein
> >
> > 2.66136e+03 9.69716e+02
> >
> > M E G A - F L O P S A C C O U N T I N G
> >
> > RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
> > T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
> > NF=No Forces
> >
> > Computing: M-Number M-Flops % Flops
> >
> > -----------------------------------------------------------------------
> >
> > CG-CoM 0.390476 1.171 100.0
> >
> > -----------------------------------------------------------------------
> >
> > Total 1.171 100.0
> >
> > -----------------------------------------------------------------------
> >
> > D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
> >
> > av. #atoms communicated per step for force: 2 x 1120267.0
> > av. #atoms communicated per step for LINCS: 2 x 34671.0
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> > Computing: Nodes Number G-Cycles Seconds %
> >
> > -----------------------------------------------------------------------
> >
> > Rest 112 34860.808 0.0 100.0
> >
> > -----------------------------------------------------------------------
> >
> > Total 128 34860.808 0.0 100.0
> >
> > -----------------------------------------------------------------------
> >
> > nodetime = 0! Infinite Giga flopses!
> >
> > Parallel run - timing based on wallclock.
> >
> > Finished mdrun on node 0 Sat Mar 13 01:48:12 2010
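
One observation on the averages block above, offered as a guess rather than a diagnosis: the values are far too large to be per-step averages, but dividing them by the checkpoint step count (3377000) gives plausible numbers. The snippet below does that division for two entries copied from the log; the variable names are illustrative only.

```python
# Divide two "averages" from the quoted md.log by the checkpoint step count.
# Assumption (not confirmed in this thread): the printed values are sums
# accumulated over all steps that were never divided by the step count.
CHECKPOINT_STEP = 3_377_000   # "continuing from step 3377000"

temperature_sum = 1.04681e9   # "Temperature" entry in the averages block
box_x_sum = 6.36496e7         # "Box-X" entry in the averages block

temperature_per_step = temperature_sum / CHECKPOINT_STEP
box_x_per_step = box_x_sum / CHECKPOINT_STEP

print(f"Temperature / steps = {temperature_per_step:.2f} K")
print(f"Box-X / steps       = {box_x_per_step:.3f} nm")
```

The temperature comes out near 310 K, close to the 309.173 K initial temperature reported in the log, and Box-X comes out near 18.85 nm, roughly in line with 16 domain decomposition cells of about 1.18 nm each.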
> >
> > On Tuesday 16 March 2010 00:58:50 Roland Schulz wrote:
> > > Alexey,
> > >
> > > you're not giving enough information.
> > >
> > > What exactly is the error? What happens? Does it hang or does it crash with
> > > an error?
> > >
> > > Roland
> > >
> > > On Fri, Mar 12, 2010 at 7:01 PM, Alexey Shvetsov <alexxyum at gmail.com> wrote:
> > > > Hi all,
> > > > There seems to be a bug with continuation from a checkpoint in GROMACS
> > > > 4.0.7. Steps to reproduce:
> > > > 1. submit a parallel job to PBS
> > > > 2. kill the job
> > > > 3. try to resume from the checkpoint
> > > >
> > > > Relevant output from mdrun:
> > > > Reading checkpoint file md.cpt generated: Thu Mar 11 12:20:46 2010
> > > >
> > > > Loaded with Money
> > > >
> > > > Making 2D domain decomposition 16 x 7 x 1
> > > >
> > > > WARNING: This run will generate roughly 20607979313638129664 Mb of data
> > > >
> > > > starting mdrun 'Protein in water'
> > > > 500000 steps, 1000.0 ps (continuing from step 3377000, 6754.0 ps).
> > > >
> > > > nodetime = 0! Infinite Giga flopses!
> > > >
> > > > Parallel run - timing based on wallclock.
> > > >
> > > > --
> > > > Best Regards,
> > > > Alexey 'Alexxy' Shvetsov
> > > > Petersburg Nuclear Physics Institute, Russia
> > > > Department of Molecular and Radiation Biophysics
> > > > Gentoo Team Ru
> > > > Gentoo Linux Dev
> > > > mailto:alexxyum at gmail.com
> > > > mailto:alexxy at gentoo.org
> > > > mailto:alexxy at omrb.pnpi.spb.ru
> > > >
> > > > --
> > > > gmx-developers mailing list
> > > > gmx-developers at gromacs.org
> > > > http://lists.gromacs.org/mailman/listinfo/gmx-developers
> > > > Please don't post (un)subscribe requests to the list. Use the
> > > > www interface or send it to gmx-developers-request at gromacs.org.
--
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru