[gmx-users] Error on restart REMD simulations

Mark Abraham mark.j.abraham at gmail.com
Mon Jan 5 00:38:09 CET 2015


On Sun, Jan 4, 2015 at 9:22 PM, leila salimi <leilasalimi at gmail.com> wrote:

> Dear Gromacs users,
>
> When I tried to restart my REMD simulations, I got this error :
>
> Fatal error:
> The 16 subsystems are not compatible
>
> When I checked the cpt files I found that 2 replicas have a different time
> from the other replicas, and I realised that the error comes from this.
>
> For example, replica 2 and replica 7 are at time 124141.297 and the others
> are at time 124580.523.
>

Indeed, this is consistent with normal behaviour if the simulation is
stopped any time after the simulations signal to each other that it is time
to checkpoint and before the file system actually flushes the buffers to
disk. (Some parallel file systems seem to helpfully ignore the instruction
to flush to disk, which is less than useful.)
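
As a rough illustration of how to spot which replicas fell behind, here is
a minimal Python sketch. It assumes you have already read the time of each
checkpoint by whatever means you prefer and put it in a dict; the function
name find_lagging_replicas is made up for this example and is not part of
GROMACS.

```python
from collections import Counter


def find_lagging_replicas(times):
    """Split replicas into the majority time and the ones that differ.

    `times` maps replica index -> checkpoint time (hypothetical input,
    filled in by hand from inspecting each .cpt file).
    Returns (most_common_time, sorted list of replicas at another time).
    """
    counts = Counter(times.values())
    common_time, _ = counts.most_common(1)[0]
    lagging = sorted(r for r, t in times.items() if t != common_time)
    return common_time, lagging


# The situation from the report: 16 replicas, 2 and 7 at an earlier time.
times = {r: 124580.523 for r in range(16)}
times[2] = times[7] = 124141.297
common_time, lagging = find_lagging_replicas(times)
print(common_time, lagging)
```

The replicas returned in the second value are the ones whose current .cpt
is older than the rest.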


>
> I found that the *_prev.cpt files have the same information as the two cpt
> files (replicas 2 and 7).
> Can I rename these *_prev.cpt files to *.cpt, keep state2.cpt
> and state7.cpt, and run the simulation again?
>

Yes. Back up your files first, then rename the 14 *_prev.cpt files to *.cpt
(leaving state2.cpt and state7.cpt as they are), and you should be good.
This is one of the reasons we keep the *_prev.cpt files around.
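
The backup-and-rename step could be scripted roughly as follows. This is
only a sketch: it assumes the files are named state<N>.cpt and
state<N>_prev.cpt (inferred from the names mentioned in this thread, so
check your own directory), the function name and the cpt_backup directory
are invented for the example, and it copies rather than renames so that
the *_prev.cpt files stay in place.

```python
import shutil
from pathlib import Path


def roll_back_checkpoints(workdir, n_replicas, keep, backup_dirname="cpt_backup"):
    """Copy state<N>_prev.cpt over state<N>.cpt for every replica not in `keep`.

    File naming is an assumption; adjust the patterns to match your run.
    Returns the list of replica indices that were rolled back.
    """
    workdir = Path(workdir)
    to_roll = [i for i in range(n_replicas) if i not in keep]

    # Check everything we need exists before touching any file.
    for i in to_roll:
        prev = workdir / f"state{i}_prev.cpt"
        if not prev.exists():
            raise FileNotFoundError(prev)

    # Back up every current and previous checkpoint first.
    backup = workdir / backup_dirname
    backup.mkdir(exist_ok=True)
    for i in range(n_replicas):
        for name in (f"state{i}.cpt", f"state{i}_prev.cpt"):
            src = workdir / name
            if src.exists():
                shutil.copy2(src, backup / name)

    # Replace the newer checkpoint with the older, consistent one.
    for i in to_roll:
        shutil.copy2(workdir / f"state{i}_prev.cpt", workdir / f"state{i}.cpt")
    return to_roll
```

For the case in this thread the call would be something like
roll_back_checkpoints(".", 16, keep={2, 7}), which touches the other 14
replicas and leaves state2.cpt and state7.cpt alone.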


> I used this command "   mdrun_mpi_d -s topol_.tpr -multi 16 -replex 1000
> -append -cpi state.cpt "
>
> How can I fix this problem?


In general, neither you nor mdrun can. Writing the set of files to disk is
not transactional. It could almost be made so if mdrun wrote a *_next.cpt
file, waited for all other simulations to report that they had done the
same, then had every simulation rename its *_next.cpt file to *.cpt, and
proceeded with the simulation only once all simulations had reported that
the renaming was done. But the performance characteristics of such an
operation are unknown (and probably bad), and there's still no guarantee
that a clever file system is not doing the renaming in its in-memory buffer
rather than on disk, so you're still vulnerable to wrong behaviour if there
is some external problem.
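
To make the hypothetical scheme above concrete, here is a toy Python sketch
of it. This is not how mdrun works: threads and a threading.Barrier stand
in for the separate simulations and their communication, and the function
names and payloads are invented for the example.

```python
import os
import threading
from pathlib import Path


def write_checkpoint(path, payload, barrier):
    """Stage to *_next.cpt, wait for everyone, rename, wait again."""
    path = Path(path)
    staged = path.with_name(path.stem + "_next.cpt")
    with open(staged, "wb") as f:
        f.write(payload)
        f.flush()
        # Ask the OS to flush to disk; a file system may still ignore this.
        os.fsync(f.fileno())
    barrier.wait()            # phase 1: every replica has staged its file
    os.replace(staged, path)  # rename the staged file over the old one
    barrier.wait()            # phase 2: every replica has renamed


def run_replicas(workdir, n_replicas, step):
    """Run one synchronized checkpoint across n_replicas toy 'simulations'."""
    barrier = threading.Barrier(n_replicas)
    threads = [
        threading.Thread(
            target=write_checkpoint,
            args=(Path(workdir) / f"state{i}.cpt",
                  f"replica {i} step {step}".encode(),
                  barrier),
        )
        for i in range(n_replicas)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Even in this toy version, every checkpoint costs two rounds of global
synchronization, and a crash between the two barriers can still leave some
replicas renamed and others not, which is the residual vulnerability
described above.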

> Is it possible to have all the output files
> appended, or do I have to rename the outputs every time?
>

Having done the renaming of the .cpt files, you should be fine to append
for the restart.

Mark


> Thanks for your help.
>
> Regards,
> Leila

