[gmx-users] maxh mdrun option does not work with REMD simulation

Maud Jusot Maud.Jusot at impmc.upmc.fr
Tue Mar 29 11:34:13 CEST 2016


Dear Gromacs users,

I am running a REMD simulation with GROMACS 5.1 that is relaunched 
every hour (in a queuing system) using the -maxh option.
The first time it was launched, it worked: the run stopped at the maxh 
time, was relaunched from the checkpoint files, and continued the 
simulation. But during this second run, when the maxh time was reached 
(step 1981062), GROMACS said that it was going to stop, yet it did not 
stop until the system killed the job (step 2545600).

I tried different maxh values (1, 0.95, and 0.20 hours) to be sure that 
the time between maxh and the cluster's maximum run time was sufficient, 
but in every case the second run continued until it reached the one-hour 
limit and was killed by the system.
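
For reference, the margins involved can be worked out with a short shell
sketch. It assumes that mdrun starts its shutdown at 0.99 times the -maxh
value, which agrees with the "0.198 hours" reported in the log for
-maxh 0.2, and that the queue limit is the one hour mentioned above. The
function names maxh_stop_seconds and margin_seconds are hypothetical
helpers, not GROMACS tools.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of GROMACS or Slurm.
# Assumption: mdrun announces termination at 0.99 * maxh (in hours).

# Seconds of run time after which mdrun should announce termination.
# $1 = value passed to -maxh, in hours
maxh_stop_seconds() {
    LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h * 0.99 * 3600 }'
}

# Seconds left between that announcement and the queue wall-time limit.
# $1 = value passed to -maxh, in hours; $2 = wall-time limit, in hours
margin_seconds() {
    LC_ALL=C awk -v h="$1" -v w="$2" 'BEGIN { printf "%.1f\n", (w - h * 0.99) * 3600 }'
}

# The three maxh values tried, against a one-hour limit.
for h in 1 0.95 0.2; do
    echo "maxh=$h stop=$(maxh_stop_seconds "$h")s margin=$(margin_seconds "$h" 1)s"
done
```

Under these assumptions, -maxh 0.2 would leave about 2887 seconds before
the one-hour limit, whereas -maxh 1 would leave only 36 seconds.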

I find it very strange that it works the first time, but that the 
second time GROMACS says it has to stop and does not.
Moreover, I tried the same job with a classical simulation (without 
REMD), and in that case there was no problem.
Did I forget an option that is needed to make maxh compatible 
with REMD?
I searched the web and the mailing list, but I did not find any 
reported problems between maxh and REMD.

Do you have any idea what the problem is?

Here are the command lines in my script myJob.slurm:
---------------------
srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS mdrun_mpi -ntomp 1 \
    -multi 8 -replex 500 -maxh 0.2 -deffnm mdA_ -cpi mdA_.cpt -cpo mdA_.cpt \
    -v 2>> remdA.log
# resubmit the same job at the end for a long run:
sbatch myJob.slurm
---------------------

Here is part of my remdA.log file:
---------------------
starting mdrun 'myPeptide'
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).

Step 1981061: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

step 1981100, will finish Sat Mar 26 11:14:40 2016
step 1981200, will finish Sat Mar 26 11:14:40 2016
...
step 2545600, will finish Sat Mar 26 11:15:49 2016
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step
---------------------
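
One way to compare the log with the number of replicas is sketched
below. It counts the "Run time exceeded" notices in an mdrun stderr log
and computes how many steps were run between a notice and the last step
reported. The function names are hypothetical; the file name remdA.log
and the step numbers come from the script and the excerpt above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Number of "Run time exceeded" notices in a log file ($1).
# Note: the script appends with 2>>, so a full remdA.log would also
# contain the notices written by earlier runs.
count_maxh_notices() {
    grep -c 'Run time exceeded' "$1" || true
}

# Steps run after the maxh notice.
# $1 = step of the notice, $2 = last step reported before the job was killed
steps_past_notice() {
    echo $(( $2 - $1 ))
}

# Step numbers taken from the log excerpt.
steps_past_notice 1981062 2545600

# Count the notices only if the log file is present.
if [ -f remdA.log ]; then
    count_maxh_notices remdA.log
fi
```

Applied to the excerpt alone, the count is 8 (one notice at step
1981061 and seven at step 1981062), which matches -multi 8, and the run
continued for 564538 steps after the notice.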

Thanks a lot,

Maud