[gmx-users] problem with restarting from cpt and _prev.cpt file in REMD simulation

Mark Abraham mark.j.abraham at gmail.com
Tue Oct 13 12:43:51 CEST 2015


Hi,

Unfortunately, that
a) should never happen,
b) given a), should rarely happen identically for successive checkpoints,
c) is even stranger if your time step is a normal ~2 fs, and
d) is not easy to fix.

To address a-c), it would be good for you to get some feedback from the
PLUMED developers about whether others have reported such problems.
Offhand, I don't know of any plain GROMACS user who has managed to produce
such checkpoint files. The last time I looked at the PLUMED code I didn't see
anything that could do this, but that was version 1.x. It is also conceivable
that this is a bug in your MPI layer.

For d), you can try the following:
* back up your files
* use plain mdrun to advance all of the 194440.010 files by one step (e.g.
gmx mdrun -nsteps 1 -s old -cpi state)
* do the same for the 194440.002 files, by as many steps as they need to
catch up with the others
* restart with mdrun -multi -replex
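
The step counts for the procedure above can be worked out with a small
script. This is only a minimal sketch: it assumes a 2 fs time step
(0.002 ps) and that the times printed by gmxcheck are exact to three
decimals. The function name catch_up_steps is hypothetical, and "old" and
"state" are the same placeholders as in the example command above, to be
replaced by each replica's own files.

```python
from decimal import Decimal


def catch_up_steps(times_ps, dt_ps="0.002", extra_steps=1):
    """Return the number of steps each replica must run to end at the same time.

    times_ps    -- checkpoint times in ps, as strings copied from gmxcheck
    dt_ps       -- time step in ps (ASSUMPTION: 2 fs; use your own mdp value)
    extra_steps -- steps added for every replica, so that all of them run
                   one more part (1, as suggested in the list above)
    """
    dt = Decimal(dt_ps)
    times = [Decimal(t) for t in times_ps]
    latest = max(times)
    plan = []
    for t in times:
        lag = (latest - t) / dt
        # Refuse to guess if the lag is not a whole number of steps.
        if lag != lag.to_integral_value():
            raise ValueError(f"lag of {latest - t} ps is not a multiple of dt={dt}")
        plan.append(int(lag) + extra_steps)
    return plan


# Checkpoint times reported for replicas 1-6 in the original message.
times = ["194440.010", "194440.010", "194440.010",
         "194440.002", "194440.010", "194440.002"]

for replica, nsteps in enumerate(catch_up_steps(times), start=1):
    # "old" and "state" are placeholders, not real file names.
    print(f"replica {replica}: gmx mdrun -nsteps {nsteps} -s old -cpi state")
```

Under the 2 fs assumption, the lagging replicas are 0.008 ps (4 steps)
behind, so they would need 5 steps where the others need 1. If your time
step differs, pass it as dt_ps and check the result by hand before running
anything.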

Each simulation has an internal part number, and these are expected to match,
which is why you can't just run an extra part only for the simulations that
are lagging.

Mark

On Tue, Oct 13, 2015 at 12:05 PM Tomek Wlodarski <tomek.wlodarski at gmail.com>
wrote:

> Dear GROMACS users,
>
> I have been running a REMD simulation with GROMACS 5.0.4 and PLUMED 2.1
> for around 200 ns already, with many smooth restarts. Unfortunately,
> something has now gone wrong, because I got this error:
>
> The 6 subsystems are not compatible
>
> So I checked the cpt files with gmxcheck:
> replica 1: Last frame         -1 time 194440.010
> replica 2: Last frame         -1 time 194440.010
> replica 3: Last frame         -1 time 194440.010
> replica 4: Last frame         -1 time *194440.002*
> replica 5: Last frame         -1 time 194440.010
> replica 6: Last frame         -1 time *194440.002*
> The situation is the same in the _prev.cpt files:
> replica 1: Last frame         -1 time 194400.010
> replica 2: Last frame         -1 time 194400.010
> replica 3: Last frame         -1 time 194400.010
> replica 4: Last frame         -1 time *194400.002*
> replica 5: Last frame         -1 time 194400.010
> replica 6: Last frame         -1 time *194400.002*
>
> Any idea how to restart the simulation when both the cpt and _prev.cpt
> files seem to be broken?
> Why did this happen, and how can I prevent it?
> Thank you.
> All the best,
>
> tomek

