[gmx-users] Pausing and resuming mdrun

Szilárd Páll pall.szilard at gmail.com
Tue Sep 29 22:27:36 CEST 2015


The combination of STOP and CONT signals should work just fine - the only
negative effect will be incorrect cycle counting output in the log file.
You can also adjust the checkpointing interval with -cpt (the value is in
minutes) if you want, e.g., a shorter interval than the default 15 minutes.
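
As a rough sketch of the checkpoint route, assuming the default checkpoint
file name state.cpt and a purely illustrative 5-minute interval (neither
value comes from the original question), the commands could look like this:

```shell
# Start the run, writing a checkpoint every 5 minutes (illustrative value)
nohup mdrun -s topol.tpr -cpt 5 &

# Later, after the run has been killed, continue from the last checkpoint.
# state.cpt is the default checkpoint name; adjust if -cpo was used.
nohup mdrun -s topol.tpr -cpi state.cpt -cpt 5 &
```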

In general, you can refer to a signal by its name, with or without the SIG
prefix, or by its numeric value (e.g. KILL, SIGKILL, or 9). For more I
suggest checking:
- man kill
- https://en.wikipedia.org/wiki/Unix_signal#POSIX_signals
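
To try the pause/resume mechanics safely before touching a real run, here is
a minimal sketch that uses a background "sleep" as a stand-in for mdrun (the
stand-in and the variable names are assumptions for illustration only). It
checks the process state with ps: a stopped process shows a state starting
with T.

```shell
# Stand-in for a long-running mdrun job (hypothetical; use the real PID instead)
sleep 300 &
pid=$!

kill -STOP "$pid"                         # pause; kill -SIGSTOP is equivalent
sleep 1
state_paused=$(ps -o stat= -p "$pid")     # state while stopped

kill -CONT "$pid"                         # resume; kill -SIGCONT is equivalent
sleep 1
state_resumed=$(ps -o stat= -p "$pid")    # state after resuming

echo "paused: $state_paused / resumed: $state_resumed"
kill "$pid"                               # clean up the stand-in process
```

With a real simulation, replace $pid with the mdrun process ID and leave out
the final kill.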

Cheers,
--
Szilárd

On Tue, Sep 29, 2015 at 7:28 PM, Andrew DeYoung <adeyoung at andrew.cmu.edu>
wrote:

> Hi,
>
> I'm running an MD simulation using mdrun (specifically version 4.5.5 -- an
> old version).  I would like to pause this simulation -- perhaps for days
> or weeks -- so that I can run a different one on this node.
>
> Of course, one way is to just kill the simulation altogether:
>
> kill $ProcessID
>
> where $ProcessID is the process ID of the simulation.  Then, when I want
> to resume the simulation, I can just pass the last checkpoint file to
> mdrun.  Checkpoint files have been written every 15 minutes (i.e., the
> default setting), so with this method I will lose at most 15 minutes of
> computation time.
>
> But, is there any way to literally _pause_ the simulation and resume it a
> few days or weeks later?
>
> A Unix/Linux question and answer site (
>
> http://unix.stackexchange.com/questions/2107/how-to-suspend-and-resume-processes
> ) says that one can pause/resume a process with this method:
>
> kill -SIGSTOP $ProcessID
> kill -SIGCONT $ProcessID
>
> or:
>
> kill -SIGTSTP $ProcessID
> kill -SIGCONT $ProcessID
>
> Another site ( http://www.cyberciti.biz/faq/unix-kill-command-examples/ )
> says to just use:
>
> kill -STOP $ProcessID
> kill -CONT $ProcessID
>
> My question is, do you think that these Linux methods will work with
> mdrun?  I did a test with mdrun on another node, and it seems to work, but
> I'm just wondering if there are any dangers in using these methods.
>
> (I am accessing the Linux machine remotely, by SSH.  Sometimes my SSH
> connection gives out, so when starting a simulation I always use "nohup
> mdrun -s topol.tpr" so that the mdrun process is not terminated when my
> SSH connection flakes out.  I'm not sure if this will affect the viability
> of the "kill -STOP/CONT" method...)
>
> Thanks so much,
>
> Andrew DeYoung
> Carnegie Mellon University

