[gmx-users] maxh mdrun option does not work with REMD simulation

Mark Abraham mark.j.abraham at gmail.com
Tue Apr 5 18:11:54 CEST 2016


Hi,

It should work, but apparently doesn't. Please open an issue at
redmine.gromacs.org and include a tarball of your tpr files so we can see
the problem happen and fix it.
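
One way to bundle the inputs for the report is sketched below. It assumes
the per-replica run inputs are named mdA_0.tpr ... mdA_7.tpr, as implied by
"-deffnm mdA_ -multi 8" in the script quoted further down; bundle_tprs and
the archive name are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sketch: pack all per-replica .tpr files sharing a prefix into one tarball.
# bundle_tprs is a hypothetical helper, not part of GROMACS.
bundle_tprs() {
    prefix=$1   # e.g. mdA_ (assumed from -deffnm mdA_)
    out=$2      # e.g. remd_tprs.tar.gz (assumed name)
    # Expand <prefix><index>.tpr; if nothing matches, the pattern stays literal.
    set -- "${prefix}"[0-9]*.tpr
    if [ ! -e "$1" ]; then
        echo "no files match ${prefix}[0-9]*.tpr" >&2
        return 1
    fi
    tar czf "$out" "$@"
}

# Example call, run from the simulation directory:
#   bundle_tprs mdA_ remd_tprs.tar.gz
```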

Meanwhile, I suggest using gmx grompp to construct a run that will complete
within the maximum time you can run, and then gmx convert-tpr (tpbconv in
4.6) to extend it suitably for the next phase.
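
The extension step could look roughly like this. It is a minimal sketch, not
a tested recipe: it uses gmx convert-tpr (the 5.x name for tpbconv) with its
-extend option, which takes a time in ps. The file names, the 8 replicas and
the 300000 ps chunk (the roughly 300 ns per 24 h quoted below) are
assumptions. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: extend every replica's run input by a fixed amount of time.
NREP=${NREP:-8}                   # assumed from "-multi 8"
EXTEND_PS=${EXTEND_PS:-300000}    # assumed chunk: ~300 ns per 24 h job
RUN=${RUN:-echo}                  # dry run by default; set RUN= to execute

extend_replicas() {
    i=0
    while [ "$i" -lt "$NREP" ]; do
        # Input mdA_<i>.tpr and output mdA_ext_<i>.tpr are assumed names.
        $RUN gmx convert-tpr -s "mdA_${i}.tpr" -extend "$EXTEND_PS" \
            -o "mdA_ext_${i}.tpr"
        i=$((i + 1))
    done
}

extend_replicas
```

The next job should then be able to pass "-s mdA_ext_.tpr" to mdrun together
with the existing -cpi checkpoint, since -multi appends the replica index
before the file extension.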

Mark

On Tue, 5 Apr 2016 16:51 Maud Jusot <Maud.Jusot at impmc.upmc.fr> wrote:

> Hello again
>
> I still can’t restart my REMD simulation correctly (see my previous
> message), but I can add some details. When it restarts, GROMACS doesn’t
> create new checkpoint files (whatever the value of the -cpt option) and
> doesn’t stop at the maxh time. I tried 3 different versions of GROMACS
> (4.6.5, 5.1.0 and 5.1.2) on two different clusters, so I am fairly sure
> the problem comes from neither the installation nor the version.
>
> It’s a big issue for me because I am trying to run a 2.4 microsecond
> simulation, and on the cluster I use I can’t run for more than 24 h
> (which corresponds to approximately 300 ns) without restarting. So
> without checkpoint files I am unable to relaunch my simulation.
>
> Is there something wrong in what I am doing?
> Does anybody have an idea, or do you think it's a bug and I should write
> to the developers mailing list?
>
> Thanks,
>
> Maud
>
> On 29/03/2016 11:33, Maud Jusot wrote:
> > Dear Gromacs users,
> >
> > I tried to run a REMD simulation with GROMACS 5.1 that is relaunched
> > every hour (in a queuing system) with the -maxh option.
> > The first time it was launched, it worked: the run stopped at the maxh
> > time, was relaunched with the checkpoint files and continued the
> > simulation. But during this second run, when the maxh time was
> > reached (step 1981062), GROMACS said that it was going to stop but it
> > did not stop until the system killed the job (step 2545600).
> >
> > I tried different maxh values (1, 0.95 and 0.20 hours) to be sure that
> > the margin between maxh and the cluster time limit was sufficient, but
> > in every case the second run continued until it reached the one-hour
> > limit and was killed by the system.
> >
> > I find it very strange that it works the first time and that the
> > second time GROMACS says that it has to stop but does not.
> > Moreover, I tried the same job with a classical simulation
> > (without REMD), and this time there was no problem.
> > Did I forget an option or something like that to make maxh
> > compatible with REMD?
> > I searched the web and the mailing list but I did not find any
> > reported problems between maxh and REMD.
> >
> > Do you have any idea what the problem is?
> >
> > Here are the command lines in my script myJob.slurm:
> > ---------------------
> > srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS mdrun_mpi -ntomp 1
> > -multi 8 -replex 500 -maxh 0.2 -deffnm mdA_ -cpi mdA_.cpt -cpo
> > mdA_.cpt -v 2>> remdA.log
> > # resubmit the same job at the end for a long run:
> > sbatch myJob.slurm
> > ---------------------
> >
> > Here is part of my remdA.log file:
> > ---------------------
> > starting mdrun 'myPeptide'
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > starting mdrun 'myPeptide'
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> > 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> >
> > Step 1981061: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > Step 1981062: Run time exceeded 0.198 hours, will terminate the run
> >
> > step 1981100, will finish Sat Mar 26 11:14:40 2016
> > step 1981200, will finish Sat Mar 26 11:14:40 2016
> > ...
> > step 2545600, will finish Sat Mar 26 11:15:49 2016
> > srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> >
> > Received the TERM signal, stopping at the next NS step
> > ---------------------
> >
> > Thanks a lot,
> >
> > Maud