[gmx-users] Strong egative energy drift (losing energy) in explicit water AMBER protein simulation

Wed Jun 13 16:36:21 CEST 2012

On 6/13/12 10:20 AM, ms wrote:
> Hi,
>
> I am trying to prepare a simple system for tests with CUDA. My guinea pig is the
> lysozyme system from this tutorial:
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/01_pdb2gmx.html
>
>
> but I prepared it using the AMBER99sb-ildn force field and the SPCE water model.
>
> After extensive minimization steps, I started a (CPU) very short test run (3 ns)
> to check if everything was ok before beginning to deal with CUDA.
>
> However I found that there is a strong drift in energy and temperature:
>
>  >>>>>g_energy_d output:
>
>   Opened 1AKI_production_GPU.edr as double precision energy file
>
>   Select the terms you want from the following list by
>   selecting either (part of) the name or the number or a combination.
>   End your selection with an empty line or a zero.
>   -------------------------------------------------------------------
>     1  Bond             2  Angle            3  Proper-Dih.      4 Improper-Dih.
>     5  LJ-14            6  Coulomb-14       7  LJ-(SR)          8 Coulomb-(SR)
>     9  Coul.-recip.    10  Potential       11  Kinetic-En.     12 Total- Energy
>    13  Temperature     14  Pressure        15  Box-X           16 Box-Y
>    17  Box-Z           18  Volume          19  Density         20  pV
>    21  Enthalpy        22  Vir-XX          23  Vir-XY          24 Vir-XZ
>    25  Vir-YX          26  Vir-YY          27  Vir-YZ          28 Vir-ZX
>    29  Vir-ZY          30  Vir-ZZ          31  Pres-XX         32 Pres-XY
>    33  Pres-XZ         34  Pres-YX         35  Pres-YY         36 Pres-YZ
>    37  Pres-ZX         38  Pres-ZY         39  Pres-ZZ         40 #Surf*SurfTen
>    41  Box-Vel-XX      42  Box-Vel-YY      43  Box-Vel-ZZ      44  Mu-X
>    45  Mu-Y                                46  Mu-Z
>    47  Coul-SR:Protein-Protein             48  LJ-SR:Protein-Protein
>    49  Coul-14:Protein-Protein             50  LJ-14:Protein-Protein
>    51  Coul-SR:Protein-non-Protein         52  LJ-SR:Protein-non-Protein
>    53  Coul-14:Protein-non-Protein         54  LJ-14:Protein-non-Protein
>    55  Coul-SR:non-Protein-non-Protein     56 LJ-SR:non-Protein-non-Protein
>    57  Coul-14:non-Protein-non-Protein     58 LJ-14:non-Protein-non-Protein
>    59  T-System
>
> 10
> 11
> 12
> 13
> 14
>
>   Last energy frame read 30000 time 5000.000
>
>   Statistics over 3000001 steps [ 2000.0000 through 5000.0000 ps ], 5 data sets
>   All statistics are over 1500001 points
>
>   Energy                      Average   Err.Est.       RMSD  Tot-Drift
>
> -------------------------------------------------------------------------------
>   Potential                   -529618       1500    3004.68   -10322.5  (kJ/mol)
>   Kinetic En.                 86141.8        610     1295.7   -4294.67 (kJ/mol)
>   Total Energy                -443476       2100    4220.26   -14617.1 (kJ/mol)
>   Temperature                  292.49        2.1    4.39949   -14.5823   (K)
>   Pressure                    1.02119     0.0046    133.601 -0.0134421 (bar)
>
>   You may want to use the -driftcorr flag in order to correct
>   for spurious drift in the graphs. Note that this is not
>   a substitute for proper equilibration and sampling!
>
>   WARNING: nmol = 1, this may not be what you want.
>
> Temperature dependent fluctuation properties at T = 292.49.
>
>
> The MDP is:
>
>  >>>>>production_GPU.mdp
>
> ; PREPROCESSING OPTIONS
> title            = production run 1 for GPU usage
> include            =
> define            =
> ; RUN CONTROL PARAMETERS
> integrator        = md        ; for GPUs: "Option md is accepted but keep in
> mind that the actual algorithm is not leap-frog."
>
> tinit            = 2000
> dt            = 0.001        ; 1 fs
> nsteps            = 3000000    ; 3 ns
> init_step        = 0
> ; CENTER OF MASS MOTION REMOVAL
> nstcomm            = 1
> comm_mode        = linear
> comm_grps        = protein non-protein
> ; OUTPUT CONTROL
> nstxout            = 2000        ; Doubled these because disk output is a strong
> bottleneck apparently
>
> nstvout            = 2000
> nstfout            = 0
> nstlog            = 100
> nstenergy        = 100
> nstxtcout        = 100
> nstcalcenergy           = -1
> xtcprecision        = 1000
> energygrps        = protein non-protein
> ; NEIGHBOR SEARCHING PARAMETERS
> nstlist            = 2.
> ns_type            = grid
> pbc            = xyz
> rlist            = 1.0
> ; OPTIONS FOR ELECTROSTATICS AND VDW
> coulombtype        = PME        ; this is OK for GPUs
> rcoulomb_switch        = 0.
> rcoulomb        = 1.0
> epsilon_r        = 1
> vdwtype            = cut-off
> rvdw-switch        = 0.
> rvdw            = 1.0
> DispCorr        = no
> fourierspacing        = 0.12
> fourier_nx        = 0
> fourier_ny        = 0
> fourier_nz        = 0
> pme_order        = 4
> ewald_rtol        = 1e-05
> epsilon_surface        = 0
> optimize_fft        = no
> ; TEMPERATURE COUPLING
> tcoupl            = andersen    ; All values of this are equivalent to
> "andersen" in GPU mode
> tc_grps            = system
> tau_t            = 0.1
> ref_t            = 300.00
> ; PRESSURE COUPLING
> pcoupl            = parrinello-rahman    ; "OpenMM implements the Monte Carlo
> barostat. All values for Pcoupl are thus accepted."
>
> pcoupltype        = isotropic
> tau_p            = 1.0
> compressibility        = 4.5e-05
> ref_p            = 1.0
> ; VELOCITY GENERATION
> gen_vel            = no
>
> <<<<<<<<
>
> Command lines are:
>
>   grompp_d -f production_GPU.mdp -c 1AKI_em4sol.gro -p topol.top -o
> 1AKI_production_GPU.tpr
>

Here, you're not preserving any of the previous state information.  You're 
picking up from 2 ns, but not passing a .cpt file to grompp - the previous state 
is lost.  Is that what you want?  In conjunction with "gen_vel = no" I suspect 
you could see some instabilities.

>   mpirun -np 8 mdrun_d -v -deffn 1AKI_production_GPU -s 1AKI_production_GPU.tpr
> -g 1AKI_production_GPU.log -c 1AKI_production_GPU.gro -o 1AKI_production_GPU.trr
> -g 1AKI_production_GPU.log -e 1AKI_production_GPU.edr
>

As an aside, proper use of -deffnm (not -deffn) saves you all of this typing :)

mpirun -np 8 mdrun_d -v -deffnm 1AKI_production_GPU

That's all you need.

> I am using Gromacs 4.5.5 compiled in double precision.
>
> I am very rusty with Gromacs, since I last dealt molecular dynamics more than 1
> year ago :) , so probably I am missing something obvious. Any hint on where
> should I look for to solve the problem? (Also, advice on if the .mdp is indeed
> correct for CUDA simulations are welcome)
>

I see the same whenever I run on GPU, but my systems are always implicit 
solvent.  Do you get reasonable performance with an explicit solvent PME system 
on GPU?  I thought that was supposed to be really slow.

Do you observe similar effects on CPU?  My tests have always indicated that 
equivalent systems on CPU are far more stable (energetically and structurally) 
than on GPU.  I have never had any real luck on GPU.  I get great performance, 
and then crashes ;)

Maybe others can share their experiences.

-Justin

-- 
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================