[gmx-users] Possible free energy bug?
Matthew Zwier
mczwier at gmail.com
Fri Mar 11 02:44:34 CET 2011
Dear Justin,
We recently experienced a similar problem (LINCS errors, step*.pdb
files), and then GROMACS usually segfaulted. The cause was a
miscompiled copy of GROMACS. Another member of our group had compiled
GROMACS on an Intel Core2 quad (gcc -march=core2) and tried to run the
copy without modification on an AMD Magny Cours machine.
Recompilation with the correct subarchitecture type (-march=amdfam10)
fixed the problem. Don't really know why it didn't die with SIGILL or
SIGBUS instead of SIGSEGV, but that's probably a question for the
hardware gurus.
So...are you observing segfaults? What compiler are you using (and on
what OS)? What were the compilation parameters for 4.5.3? Also, are
you really running across nodes with MPI, or running on the same node
with MPI?
Cheers,
Matt Zwier
On Thu, Mar 10, 2011 at 1:55 PM, Justin A. Lemkul <jalemkul at vt.edu> wrote:
>
> Hi All,
>
> I've been troubleshooting a problem for some time now, and I wanted to report
> it here and solicit some feedback to see if there's anything else I can try
> before I submit a bug report.
>
> Here's the situation: I ran some free energy calculations (thermodynamic
> integration) a long time ago using version 3.3.3 to determine the hydration
> free energy of a series of small molecules. Results were good and they
> ended up as part of a paper, so I'm trying to reproduce the methodology with
> 4.5.3 (using BAR) to see if I understand the workflow completely. The
> problem is my systems are crashing. The runs simply stop randomly (usually
> within a few hundred ps) with lots of LINCS warnings and step*.pdb files
> being written.
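BAR, mentioned above as the 4.5.3 method, estimates the free-energy difference between two neighboring lambda states by solving one self-consistent equation that uses energy differences sampled in both directions (roughly the kind of quantity that foreign_lambda asks mdrun to write out). The following is a minimal textbook sketch in reduced units (energies divided by kT), not GROMACS's own implementation; `bar_delta_f` and the bisection solver are illustrative choices.

```python
import math


def _fermi(x: float) -> float:
    """Numerically stable 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_delta_f(w_forward, w_reverse, tol=1e-10):
    """Bennett acceptance ratio estimate of beta*DeltaF for state 0 -> state 1.

    w_forward: beta*(U1 - U0) evaluated on configurations sampled in state 0.
    w_reverse: beta*(U0 - U1) evaluated on configurations sampled in state 1.
    """
    m = math.log(len(w_forward) / len(w_reverse))

    def imbalance(df):
        # Zero at the BAR solution; increases monotonically with df.
        lhs = sum(_fermi(m + w - df) for w in w_forward)
        rhs = sum(_fermi(-m + w + df) for w in w_reverse)
        return lhs - rhs

    # Expand a bracket [lo, hi] until it contains the root, then bisect.
    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:
        lo *= 2.0
    while imbalance(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy data: two states whose energies differ by a constant 2 kT.
    print(bar_delta_f([2.0] * 10, [-2.0] * 10))
```

The toy inputs only exercise the solver; real inputs would come from simulations run at both lambda values.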
>
> I know the parameters are good and produce stable trajectories, since I
> spent months on them some years ago. The system prep is steepest-descent EM
> to Fmax < 100 kJ mol^-1 nm^-1 (always achieved), NVT at 298 K for 100 ps,
> NPT at 298 K/1 bar for 100 ps, then 5 ns of data collection under NPT
> conditions. Here's the rundown of what I'm seeing:
>
> 1. All LJ transformations work fine. The problem only occurs when I have a
> molecule with full LJ interactions and I am "charging" it (i.e., introducing
> charges to the partially interacting species).
>
> 2. Simulations at lambda=1 (full interaction) work fine.
>
> 3. Simulations with the free energy code off entirely work fine under all
> conditions.
>
> 4. I cannot run in serial due to http://redmine.gromacs.org/issues/715. The
> bug seems to affect other systems and is not specifically related to my free
> energy calculations.
>
> 5. Running with DD fails because my system is relatively small (more on this
> in a moment).
>
> 6. Running with mdrun -pd 2 works, but mdrun -pd 4 crashes for any value of
> lambda != 1.
>
> 7. I created a larger system (a 4x4x4-nm cube of water with my molecule
> instead of 3x3x3) and ran on 4 CPUs with DD (lambda = 0, i.e. full vdW, no
> intermolecular Coulombic interactions; the .mdp file is below). This run
> also crashed, with some warnings about DD cell size:
>
> DD load balancing is limited by minimum cell size in dimension X
> DD step 329999 vol min/aver 0.748! load imb.: force 31.5%
>
> ...and then the actual crash:
>
> -------------------------------------------------------
> Program mdrun_4.5.3_gcc_mpi, VERSION 4.5.3
> Source code file: domdec_con.c, line: 693
>
> Fatal error:
> DD cell 0 0 0 could only obtain 14 of the 15 atoms that are connected via
> constraints from the neighboring cells. This probably means your constraint
> lengths are too long compared to the domain decomposition cell size.
> Decrease the number of domain decomposition grid cells or lincs-order or use
> the -rcon option of mdrun.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Watching the trajectory doesn't seem to give any useful information. The
> small molecule of interest is at a periodic boundary when the crash happens,
> but it crosses the boundary several times before the crash without incident,
> so the issue does not appear to be related to PBC, though I can't rule it out.
>
> 8. I initially thought the problem might be related to the barostat, but
> switching from Parrinello-Rahman to Berendsen does not alleviate the problem,
> nor does increasing tau_p (I tested 0.5, 1.0, 2.0, and 5.0 ps; all crash).
> Longer tau_p simply delays the crash but does not prevent it.
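Lengthening tau_p only softens each rescaling step. The GROMACS manual writes the Berendsen scaling matrix as mu_ij = delta_ij - (n_PC dt / 3 tau_p) beta_ij (P0_ij - P_ij); the isotropic sketch below shows the per-step deviation of mu from 1 shrinking as 1/tau_p, which is consistent with a longer tau_p postponing, rather than removing, an underlying instability. It is an illustration only and does not diagnose the crash; the 100-bar excess pressure is a hypothetical value, and nstpcouple (n_PC) is set to 1 purely for illustration.

```python
# Sketch of the isotropic Berendsen scaling factor from the GROMACS manual:
#   mu = 1 - (n_PC * dt / (3 * tau_p)) * beta * (P0 - P)
# Units: pressures in bar, beta in 1/bar, dt and tau_p in ps.


def berendsen_mu(p_bar: float, p_ref_bar: float, beta_per_bar: float,
                 dt_ps: float, tau_p_ps: float, nstpcouple: int = 1) -> float:
    """Factor applied to coordinates and box vectors at one coupling step."""
    prefactor = nstpcouple * dt_ps / (3.0 * tau_p_ps)
    return 1.0 - prefactor * beta_per_bar * (p_ref_bar - p_bar)


if __name__ == "__main__":
    # dt, compressibility and ref_p follow the .mdp file below;
    # the instantaneous pressure of 101 bar is hypothetical.
    for tau_p in (0.5, 1.0, 2.0, 5.0):
        mu = berendsen_mu(p_bar=101.0, p_ref_bar=1.0, beta_per_bar=4.5e-5,
                          dt_ps=0.002, tau_p_ps=tau_p)
        print(f"tau_p = {tau_p} ps: mu - 1 = {mu - 1.0:.2e}")
```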
>
> So after all that, I'm wondering (1) whether anyone has seen the same, and
> (2) whether there's anything else (environment variables, hidden tricks,
> etc.) I can use to get to the bottom of this before I give up and file a bug
> report.
>
> If you made it this far, thanks for reading my novel; hopefully someone
> can give me some ideas. The .mdp file I'm using is below, but it is just
> one of many that I've tried. In theory it should work, since the parameters
> are the same as in my successful 3.3.3 runs, apart from the new free energy
> features in 4.5.3 and the obvious keyword changes between versions.
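As a quick consistency check on the timings quoted above, simulated time and step counts are related through dt. The sketch below assumes every dynamics stage uses dt = 0.002 ps, as in the production .mdp file (the message does not state dt for the equilibration runs); the helper names are hypothetical.

```python
# Sketch: relate simulated time to .mdp nsteps values and log step numbers.
# Assumption: every dynamics stage uses dt = 0.002 ps (EM has no time step).
DT_PS = 0.002

STAGES_PS = {
    "NVT equilibration": 100.0,
    "NPT equilibration": 100.0,
    "NPT data collection": 5000.0,  # 5 ns
}


def nsteps_for(duration_ps: float, dt_ps: float = DT_PS) -> int:
    """Steps needed to cover duration_ps; round() guards against float error."""
    return round(duration_ps / dt_ps)


def step_to_ps(step: int, dt_ps: float = DT_PS, tinit_ps: float = 0.0) -> float:
    """Simulation time (ps) at a given step, e.g. one read from a log line."""
    return tinit_ps + step * dt_ps


if __name__ == "__main__":
    for stage, length_ps in STAGES_PS.items():
        print(f"{stage}: nsteps = {nsteps_for(length_ps)}")
    # The "DD step 329999" line from the log excerpt in item 7:
    print(f"step 329999 is at {step_to_ps(329999):.1f} ps")
```

The production value agrees with nsteps = 2500000 in the .mdp file, and the DD log line falls at roughly 660 ps, in line with crashes "within a few hundred ps".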
>
> -Justin
>
> --- .mdp file ---
>
> ; Run control
> integrator = sd ; Langevin dynamics
> tinit = 0
> dt = 0.002
> nsteps = 2500000 ; 5 ns
> nstcomm = 100
> ; Output control
> nstxout = 500
> nstvout = 500
> nstfout = 0
> nstlog = 500
> nstenergy = 500
> nstxtcout = 0
> xtc-precision = 1000
> ; Neighborsearching and short-range nonbonded interactions
> nstlist = 5
> ns_type = grid
> pbc = xyz
> rlist = 0.9
> ; Electrostatics
> coulombtype = PME
> rcoulomb = 0.9
> ; van der Waals
> vdw-type = cutoff
> rvdw = 1.4
> ; Apply long range dispersion corrections for Energy and Pressure
> DispCorr = EnerPres
> ; Spacing for the PME/PPPM FFT grid
> fourierspacing = 0.12
> ; EWALD/PME/PPPM parameters
> pme_order = 4
> ewald_rtol = 1e-05
> epsilon_surface = 0
> optimize_fft = no
> ; Temperature coupling
> ; tcoupl is implicitly handled by the sd integrator
> tc_grps = system
> tau_t = 1.0
> ref_t = 298
> ; Pressure coupling is on for NPT
> Pcoupl = Berendsen
> tau_p = 2.0
> compressibility = 4.5e-05
> ref_p = 1.0
> ; Free energy control stuff
> free_energy = yes
> init_lambda = 0.00
> delta_lambda = 0
> foreign_lambda = 0.05
> sc-alpha = 0
> sc-power = 1.0
> sc-sigma = 0
> couple-moltype = MOR ; name of moleculetype to couple
> couple-lambda0 = vdw ; vdW interactions
> couple-lambda1 = vdw-q ; turn on everything
> couple-intramol = no
> dhdl_derivatives = yes ; this line (and the next two) are defaults
> separate_dhdl_file = yes ; included only for pedantry
> nstdhdl = 10
> ; Do not generate velocities
> gen_vel = no
> ; options for bonds
> constraints = all-bonds
> ; Type of constraint algorithm
> constraint-algorithm = lincs
> ; Do not constrain the starting configuration
> ; since we are continuing from NPT
> continuation = yes
> ; Highest order in the expansion of the constraint coupling matrix
> lincs-order = 4
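Because mail clients wrap long lines, a comment can end up on a line of its own when an .mdp file is pasted into a message and copied back out. A quick check before running grompp is to list every non-comment line that is not a key = value pair. The sketch below is hypothetical (parse_mdp is not part of GROMACS and is not grompp's parser), and its key normalization is this sketch's own convention.

```python
# Minimal sketch of an .mdp sanity check (not grompp's parser): collect
# "key = value" pairs, strip ";" comments, and report any remaining line
# without "=" (for example, a comment that an email client wrapped).


def parse_mdp(text: str):
    """Return (params, problems): params maps keys to value strings;
    problems lists (line_number, text) for lines that are not key = value."""
    params = {}
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            problems.append((lineno, raw.strip()))
            continue
        # Convention of this sketch: compare keys case-insensitively and
        # treat "-" and "_" as the same character.
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params, problems


if __name__ == "__main__":
    sample = (
        "free_energy = yes\n"
        "sc-alpha = 0 ; long comment that a mail client might\n"
        "wrap\n"
        "nstdhdl = 10\n"
    )
    params, problems = parse_mdp(sample)
    print(params)
    print(problems)
```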
>
>
> --
> ========================================
>
> Justin A. Lemkul
> Ph.D. Candidate
> ICTAS Doctoral Scholar
> MILES-IGERT Trainee
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>
> ========================================
> --
> gmx-users mailing list gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> Please don't post (un)subscribe requests to the list. Use the www interface
> or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>