[gmx-users] intermittent changes in energy drift following simulation restarts in v4.6.1

Richard Broadbent richard.broadbent09 at imperial.ac.uk
Mon Sep 9 17:09:52 CEST 2013


Hi Mark,

Thanks for the quick response,

On 09/09/13 15:45, Mark Abraham wrote:
> Sounds worrying :-( Thanks for the detailed report and
> trouble-shooting! So far, I can't think of a reason for it.
>
> A couple of suggestions:
> * try again with 4.6.3 (at least while trouble-shooting) in case it's a fixed bug
I'll test that side by side with 4.6.1, so that we have both for
comparison.
> * post a representative .mdp file
It's below this message. The production run is built using tpbconv -extend
on the .tpr built from that .mdp.

> * is there anything out of the ordinary in the topology?
I built the residues myself, but they're just standard polymer monomer
units, nothing out of the ordinary.

> * if the problem is restart-related and shows up in the drift quickly,
> then you can probably find a reproducible case via a job that does
> lots of short-interval restarts and saves all the intermediate files -
> a (set of) inputs that can reproduce the problem sounds like what we'd
> need to diagnose and/or fix anything
I'm already starting to build them and will be testing them tomorrow.
> * does it happen in a non-multi simulation? (or more particularly,
> what are you doing with -multi?)
The -multi was used to move the job into a faster queue. I've also seen it
in non-multi jobs.
> * check .log files for warnings, and that there are none being
> suppressed at the grompp stage
There are no errors at the grompp stage.
I couldn't see any warnings in the mdrun logs on a first look through, but
I'm going to have another look before I'm 100% certain that there aren't
any in there.

> * see if the group cut-off scheme in 4.6.x shows the same problem
>
Will do

> Mark


Thanks,

Richard


integrator = md
bd_fric     = 0

dt = 0.002

nsteps = 2500000

comm_mode = linear

nstxout = 100000
nstvout = 100000
nstfout = 0

xtc_grps = P84
nstxtcout = 50000

nstlog = 100000

nstenergy = 50000

pbc = xyz
periodic_molecules = no

ns_type             = grid
nstlist             = 10

rlist = 1.25
optimize_fft = yes
fourier_nx = 128
fourier_ny = 128
fourier_nz = 128

pme_order       = 4
epsilon_r       = 1.0

coulombtype = pme
coulomb-modifier = Potential-shift-Verlet
rcoulomb = 1.2

vdwtype = cut-off
vdw-modifier = Potential-shift-Verlet

rvdw = 1.20

DispCorr = EnerPres

tcoupl = no

nsttcouple = 5

pcoupl = no

constraints = h-bonds

lincs_order = 6
lincs_iter = 2

cutoff-scheme = Verlet
verlet-buffer-drift = -1


>
>
> On Mon, Sep 9, 2013 at 4:08 PM, Richard Broadbent
> <richard.broadbent09 at imperial.ac.uk> wrote:
>> Dear All,
>>
>> I've been analysing a series of long (200 ns) NVE simulations (md
>> integrator) on ~93,000 atom systems. I ran the simulations in groups of 3
>> using the -multi option in gromacs v4.6.1 double precision.
>>
>> Simulations were run with 1 OpenMP thread per MPI process
>>
>> The simulations were restarted at regular intervals using the following
>> submission script:
>>
>>
>> FILE=4.6_P84_DIO_
>>
>> module load fftw xe-gromacs/4.6.1
>>
>> # Change to the directory that the job was submitted from
>> cd $PBS_O_WORKDIR
>>
>> export NPROC=`qstat -f $PBS_JOBID | grep mppwidth | awk '{print $3}'`
>> export NTASK=`qstat -f $PBS_JOBID | grep mppnppn  | awk '{print $3}'`
>>
>> aprun -n $NPROC -N $NTASK mdrun_mpi_d  -deffnm $FILE  -maxh 24 -multi 3
>> -npme 64 -append -cpi
>>
>>
>>
>> ###
>>
>> The first simulation was run with the same script except the mdrun line was
>>
>> aprun -n $NPROC -N $NTASK mdrun_mpi_d  -deffnm $FILE  -maxh 24 -multi 3
>> -npme 64
>>
>> ###
>>
>>
>> The simulations generally ran and restarted without trouble, however, in
>> several simulations the energy drift changed radically following the
>> restart.
>>
>> In one case the simulation ran for 50 ns (including one restart) with
>> a drift of -141.6 +/- 0.1 kJ mol^-1 ns^-1.
>> It was restarted and then had a drift of +104 +/- 1 kJ mol^-1 ns^-1 for
>> 15 ns, then was restarted again and continued with a drift of
>> -138 +/- 0.1 kJ mol^-1 ns^-1 for a further 50 ns.
>>
>> The other 2 simulations running in parallel with this calculation through
>> the -multi option did not experience a change in gradient.
>>
>> The drifts were calculated by a least-squares fit to the total energy
>> data produced by
>>
>> echo "total" | g_energy_d -f ${FILE}${i}.edr -o total_${FILE}${i}.xvg -xvg
>> none
>>
>>
>> The simulation writes to the edr every 20 ps. On a 2 ns interval the
>> transition is masked by the expected oscillations in energy due to the
>> integrator, but the change in drift is clear when looking at a 4 ns range
>> centred on the restart.
>>
>> The hardware used was of the same specification for all jobs: 27 Cray XE6
>> nodes (9 nodes per simulation) with 32 MPI processes per node.
>>
>> The simulations use the Verlet cut-off scheme, and H-bond constraints are
>> enforced using LINCS (order 6, 2 iterations).
>>
>>
>> I can't think what would cause this change in the drift during a restart.
>> However, I have seen it in simulations run on both an AMD system (Cray XE6,
>> AVX-FMA) and an Intel system (SGI-ice, SSE4.1).
>>
>>
>> I have some data generated using the same procedure using v4.5.5 and v4.5.7
>> (different cut-off scheme) and the restarts in that system have not caused
>> any appreciable changes in the simulation.
>>
>> Unfortunately I didn't save the checkpoint files used for the restart (I
>> will in the future). I'm going to try building a new input file from just
>> before the restart using the trr trajectory data.
>>
>>
>> Does anyone have any ideas of what might have caused this?
>>
>> Has anyone seen similar effects?
>>
>> Thanks,
>>
>> Richard
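
The drift estimate described in the quoted message (a least-squares fit to the
total energy written by g_energy_d, compared over a 4 ns range centred on a
restart) could be sketched roughly as below. This is not the script that was
actually used. The function names are made up for illustration, the fit is a
plain ordinary least-squares line, and the quoted uncertainty is the standard
error of the slope under the assumption of uncorrelated residuals. Times are
assumed to be in ps and energies in kJ mol^-1, as in g_energy output.

```python
import math


def read_xvg(path):
    """Read (time, energy) columns from a two-column .xvg file.

    Lines starting with '#' or '@' are skipped, so files written with or
    without '-xvg none' are both accepted.
    """
    times, energies = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line[0] in "#@":
                continue
            fields = line.split()
            times.append(float(fields[0]))
            energies.append(float(fields[1]))
    return times, energies


def fit_drift(times_ps, energies):
    """Return (slope, standard error) of an OLS line, in kJ mol^-1 ns^-1."""
    n = len(times_ps)
    if n < 3:
        raise ValueError("need at least 3 points to fit a drift")
    t_mean = sum(times_ps) / n
    e_mean = sum(energies) / n
    sxx = sum((t - t_mean) ** 2 for t in times_ps)
    sxy = sum((t - t_mean) * (e - e_mean)
              for t, e in zip(times_ps, energies))
    slope = sxy / sxx
    intercept = e_mean - slope * t_mean
    residual = sum((e - intercept - slope * t) ** 2
                   for t, e in zip(times_ps, energies))
    stderr = math.sqrt(residual / (n - 2) / sxx)
    # Convert from per-ps to per-ns.
    return slope * 1000.0, stderr * 1000.0


def drift_around_restart(times_ps, energies, restart_ps,
                         half_window_ps=2000.0):
    """Fit the drift separately just before and just after a restart."""
    before_t, before_e, after_t, after_e = [], [], [], []
    for t, e in zip(times_ps, energies):
        if restart_ps - half_window_ps <= t < restart_ps:
            before_t.append(t)
            before_e.append(e)
        elif restart_ps <= t <= restart_ps + half_window_ps:
            after_t.append(t)
            after_e.append(e)
    return fit_drift(before_t, before_e), fit_drift(after_t, after_e)
```

A sign change between the two fitted slopes, well outside their standard
errors, would correspond to the kind of change in drift reported above.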


