[gmx-users] maxh mdrun option does not work with REMD simulation

Maud Jusot Maud.Jusot at impmc.upmc.fr
Tue Apr 5 16:51:18 CEST 2016


Hello again

I still cannot restart my REMD simulation correctly (see my previous
message), but I can add some details. After a restart, GROMACS does not
write new checkpoint files (whatever the value of the -cpt option) and
does not stop at the -maxh time. I tried three different versions of
GROMACS (4.6.5, 5.1.0 and 5.1.2) on two different clusters, so I am
fairly sure the problem comes neither from the installation nor from
the version.

This is a big issue for me because I am trying to run a 2.4 microsecond
simulation, and on the cluster I use a job cannot run for more than 24 h
(which corresponds to approximately 300 ns) without a restart. Without
checkpoint files I am therefore unable to relaunch my simulation.

Is there something wrong in what I am doing?
Does anybody have an idea, or do you think it is a bug and I should
write to the developers mailing list?

Thanks,

Maud

On 29/03/2016 11:33, Maud Jusot wrote:
> Dear Gromacs users,
>
> I am trying to run a REMD simulation with GROMACS 5.1 that is
> relaunched every hour (in a queuing system) using the -maxh option.
> The first time it was launched, it worked: the run stopped at the maxh
> time, was relaunched from the checkpoint files and continued the
> simulation. But during this second run, when the maxh time was reached
> (step 1981062), GROMACS said that it was going to stop, but it did not
> stop until the system killed the job (step 2545600).
>
> I tried different maxh values (1, 0.95 and 0.20 hours) to make sure
> that the margin between maxh and the cluster time limit was
> sufficient, but in every case the second run continued until it
> reached the one-hour limit and was killed by the system.
>
> I find it very strange that it works the first time and that the
> second time GROMACS says that it has to stop but does not.
> Moreover, I ran the same job as a classical simulation (without REMD),
> and that time there was no problem.
> Did I forget an option needed to make maxh compatible with REMD?
> I searched the web and the mailing list but did not find any reported
> problems between maxh and REMD.
>
> Do you have any idea what the problem is?
>
> Here are the command lines in my script myJob.slurm:
> ---------------------
> srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS mdrun_mpi -ntomp 1 
> -multi 8 -replex 500 -maxh 0.2 -deffnm mdA_ -cpi mdA_.cpt -cpo 
> mdA_.cpt -v 2>> remdA.log
> # resubmit the same job at the end for a long run:
> sbatch myJob.slurm
> ---------------------
>
> Here is part of my remdA.log file:
> ---------------------
> starting mdrun 'myPeptide'
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> starting mdrun 'myPeptide'
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
> 120000000 steps, 240000.0 ps (continuing from step 655701, 1311.4 ps).
>
> Step 1981061: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> Step 1981062: Run time exceeded 0.198 hours, will terminate the run
>
> step 1981100, will finish Sat Mar 26 11:14:40 2016
> step 1981200, will finish Sat Mar 26 11:14:40 2016
> ...
> step 2545600, will finish Sat Mar 26 11:15:49 2016srun: Job step 
> aborted: Waiting up to 32 seconds for job step to finish.
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
>
> Received the TERM signal, stopping at the next NS step
> ---------------------
>
> Thanks a lot,
>
> Maud


