[gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

Mark Abraham mark.j.abraham at gmail.com
Mon Sep 9 17:38:10 CEST 2013


No obvious problems. Please open an issue at redmine.gromacs.org when you
have something reproducible, but don't hurry; nobody's likely to have time
to check it out for a week or two.

Cheers,

Mark
On Sep 9, 2013 5:11 PM, "Richard Broadbent" <richard.broadbent09 at imperial.ac.uk> wrote:

> Hi Mark,
>
> Thanks for the quick response,
>
> On 09/09/13 15:45, Mark Abraham wrote:
>
>> Sounds worrying :-( Thanks for the detailed report and
>> trouble-shooting! So far, I can't think of a reason for it.
>>
>> A couple of suggestions:
>> * try again with 4.6.3 (at least while trouble-shooting) in case it's an
>> already-fixed bug
>>
> I'll test that side by side with 4.6.1; that way we'll have both for
> comparison.
>
>> * post a representative .mdp file
>>
> It's below this message. The production run is built using tpbconv -extend
> on the .tpr generated from that .mdp.
>
>> * is there anything out of the ordinary in the topology?
>>
> I built the residues myself, but they're just standard polymer monomer
> units; nothing out of the ordinary.
>
>> * if the problem is restart-related and shows up in the drift quickly,
>> then you can probably find a reproducible case via a job that does
>> lots of short-interval restarts and saves all the intermediate files -
>> a (set of) inputs that can reproduce the problem sounds like what we'd
>> need to diagnose and/or fix anything
>>
> I'm already starting to build them and will be testing them tomorrow (a
> sketch of the kind of job I have in mind is below).
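>
> (A minimal restart-cycling sketch; the file name, the 1 ns chunk length,
> and the copies of the intermediate checkpoint/log files are placeholders
> of mine, while $NPROC and $NTASK come from the submission script further
> down.)
>
> FILE=restart_test
>
> for i in $(seq 1 20); do
>     # extend the run by 1000 ps with tpbconv, as in the production runs
>     tpbconv -s ${FILE}.tpr -extend 1000 -o ${FILE}_ext.tpr
>     mv ${FILE}_ext.tpr ${FILE}.tpr
>
>     # restart from the previous checkpoint, appending to the existing files
>     aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -append -cpi
>
>     # keep the intermediate checkpoint and log so a bad restart can be
>     # reproduced later
>     cp ${FILE}.cpt ${FILE}_cycle${i}.cpt
>     cp ${FILE}.log ${FILE}_cycle${i}.log
> done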
>
>> * does it happen in a non-multi simulation? (or more particularly,
>> what are you doing with -multi?)
>>
> The -multi was used to move the job into a faster queue; I've seen it in
> non-multi jobs as well.
>
>> * check .log files for warnings, and that there are none being
>> suppressed at the grompp stage
>>
> There are no errors at the grompp stage. I haven't identified any warnings
> in the mdrun logs, but I'll take another look before I'm 100% certain there
> aren't any; I couldn't see any on a first read-through.
>
>> * see if the group cut-off scheme in 4.6.x shows the same problem
>
> Will do
>
>> Mark
>
>
>
> Thanks,
>
> Richard
>
>
> integrator = md
> bd_fric     = 0
>
> dt = 0.002
>
> nsteps = 2500000
>
> comm_mode = linear
>
> nstxout = 100000
> nstvout = 100000
> nstfout = 0
>
> xtc_grps = P84
> nstxtcout = 50000
>
> nstlog = 100000
>
> nstenergy = 50000
>
> pbc = xyz
> periodic_molecules = no
>
> ns_type             = grid
> nstlist             = 10
>
> rlist = 1.25
> optimize_fft = yes
> fourier_nx = 128
> fourier_ny = 128
> fourier_nz = 128
>
> pme_order       = 4
> epsilon_r       = 1.0
>
> coulombtype = pme
> coulomb-modifier = Potential-shift-Verlet
> rcoulomb = 1.2
>
> vdwtype = cut-off
> vdw-modifier = Potential-shift-Verlet
>
> rvdw = 1.20
>
> DispCorr = EnerPres
>
> tcoupl = no
>
> nsttcouple = 5
>
> pcoupl = no
>
> constraints = h-bonds
>
> lincs_order = 6
> lincs_iter = 2
>
> cutoff-scheme = Verlet
> verlet-buffer-drift = -1
>
>
>
>>
>> On Mon, Sep 9, 2013 at 4:08 PM, Richard Broadbent
>> <richard.broadbent09 at imperial.ac.uk>
>> wrote:
>>
>>> Dear All,
>>>
>>> I've been analysing a series of long (200 ns) NVE simulations (md
>>> integrator) on ~93,000-atom systems. I ran the simulations in groups of 3
>>> using the -multi option in GROMACS v4.6.1, double precision.
>>>
>>> Simulations were run with 1 OpenMP thread per MPI process
>>>
>>> The simulations were restarted at regular intervals using the following
>>> submission script:
>>>
>>>
>>> FILE=4.6_P84_DIO_
>>>
>>> module load fftw xe-gromacs/4.6.1
>>>
>>> # Change to the directory that the job was submitted from
>>> cd $PBS_O_WORKDIR
>>>
>>> export NPROC=`qstat -f $PBS_JOBID | grep mppwidth | awk '{print $3}'`
>>> export NTASK=`qstat -f $PBS_JOBID | grep mppnppn  | awk '{print $3}'`
>>>
>>> aprun -n $NPROC -N $NTASK mdrun_mpi_d  -deffnm $FILE  -maxh 24 -multi 3
>>> -npme 64 -append -cpi
>>>
>>>
>>>
>>> ###
>>>
>>> The first simulation was run with the same script except the mdrun line
>>> was
>>>
>>> aprun -n $NPROC -N $NTASK mdrun_mpi_d  -deffnm $FILE  -maxh 24 -multi 3
>>> -npme 64
>>>
>>> ###
>>>
>>>
>>> The simulations generally ran and restarted without trouble; however, in
>>> several simulations the energy drift changed radically following the
>>> restart.
>>>
>>> In one simulation the run went for 50 ns (including one restart) with a
>>> drift of -141.6 +/- 0.1 kJ mol^-1 ns^-1. It was restarted and then had a
>>> drift of +104 +/- 1 kJ mol^-1 ns^-1 for 15 ns, before being restarted
>>> again and continuing with a drift of -138 +/- 0.1 kJ mol^-1 ns^-1 for a
>>> further 50 ns.
>>>
>>> The other 2 simulations running in parallel with this calculation through
>>> the -multi option did not experience a change in gradient.
>>>
>>> The drifts were calculated by a least-squares fit to the total energy
>>> data extracted with
>>>
>>> echo "total" | g_energy_d -f ${FILE}${i}.edr -o total_${FILE}${i}.xvg -xvg none
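>>>
>>> (The fit itself is just an unweighted linear least-squares slope over the
>>> two-column time/energy output, e.g. something along the lines of the awk
>>> sketch below; the factor of 1000 converts the slope from kJ mol^-1 ps^-1
>>> to kJ mol^-1 ns^-1, and the exact script is only illustrative.)
>>>
>>> awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
>>>      END { slope = (n*sxy - sx*sy)/(n*sxx - sx*sx);
>>>            print "drift =", 1000*slope, "kJ/mol/ns" }' total_${FILE}${i}.xvg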
>>>
>>>
>>> The simulation writes to the .edr file every 20 ps. The transition is
>>> masked by the expected oscillations in energy due to the integrator on a
>>> 2 ns interval, but the change in drift is clear when looking at a 4 ns
>>> range centred on the restart.
>>>
>>> The hardware was of the same specification for all jobs: 27 Cray XE6
>>> nodes (9 nodes per simulation), with 32 MPI processes per node.
>>>
>>> The simulations use the Verlet cut-off scheme, and H-bond constraints are
>>> enforced using LINCS (order 6, 2 iterations).
>>>
>>>
>>> I can't think what would cause this change in the drift during a restart.
>>> However, I have seen it in simulations run on both an AMD system (Cray
>>> XE6, AVX-FMA) and an Intel system (SGI ICE, SSE4.1).
>>>
>>>
>>> I have some data generated with the same procedure using v4.5.5 and
>>> v4.5.7 (which use a different cut-off scheme), and the restarts in those
>>> runs have not caused any appreciable changes in the simulation.
>>>
>>> Unfortunately I didn't save the checkpoint files used for the restart (I
>>> will in the future). I'm going to try building a new input file from just
>>> before the restart using the trr trajectory data.
>>>
>>>
>>> Does anyone have any ideas of what might have caused this?
>>>
>>> Has anyone seen similar effects?
>>>
>>> Thanks,
>>>
>>> Richard
>


