[gmx-developers] Bug in nbnxn grid construction?
Justin Lemkul
jalemkul at vt.edu
Sun Feb 14 19:20:44 CET 2016
Hi Devs,
I have managed to trigger a bug when running some relative free energy
calculations, but it is not specific to the free energy code itself. The bug is
also not triggered consistently across different (yet similar) systems. We have
identical workflows for other protein-ligand complexes that work fine; only the
last two that we are doing have failed. The ligands have virtual sites (a new
approach we have developed related to halogen parameters); removing these
virtual sites or removing the ligand entirely allows for stable calculations.
But again, other systems with virtual sites work fine using the same workflow,
and the same ligands in water (doing the free energy transformation) run fine,
so the ligand topology is sane.
Here is the summary of what I have tried:
1. Free energy + Verlet = immediate seg fault in energy minimization
2. FE off + Verlet = seg fault
3. FE + group = seg fault
4. FE off + group = WORKS
5. Single-point energy (FE off + Verlet) = seg fault (so it is not specific to
the minimizer, as I use the md integrator in the .mdp file)
6. Generic kernels = seg fault
7. SIMD disabled (GMX_SIMD=None when running cmake) = seg fault
8. Absolute free energy calculation = WORKS
The -nan originates in Coul(SR): every other term is finite, apart from the
ones that depend on Coul(SR).
Steepest Descents:
   Tolerance (Fmax)   =  1.00000e+01
   Number of steps    =         5000
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    1.19653e+03    3.68663e+03    1.06325e+04    1.25430e+02   -1.45477e+02
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    2.68056e+03    3.50864e+04    2.24225e+05   -2.98888e+03           -nan
   Coul. recip.      Potential Pres. DC (bar) Pressure (bar)    dVremain/dl
    9.54634e+03           -nan    0.00000e+00           -nan           -nan
       dEkin/dl      dVcoul/dl       dVvdw/dl dVrestraint/dl   Constr. rmsd
    0.00000e+00           -nan   -2.57387e+01    0.00000e+00    1.57006e-06
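
To illustrate the reasoning about the energy table above, here is a minimal
Python sketch (not part of GROMACS; the dictionary and the helper name
non_finite_terms are made up for this example). It assumes that Potential is
the plain sum of the listed potential energy terms, in which case a single NaN
in Coulomb (SR) is enough to turn the total into NaN, while all other terms
remain finite.

```python
import math

# Potential energy terms copied from the step-0 log above (kJ/mol).
# The "-nan" entry for Coulomb (SR) is stored as a floating-point NaN.
energies = {
    "Bond": 1.19653e+03,
    "U-B": 3.68663e+03,
    "Proper Dih.": 1.06325e+04,
    "Improper Dih.": 1.25430e+02,
    "CMAP Dih.": -1.45477e+02,
    "LJ-14": 2.68056e+03,
    "Coulomb-14": 3.50864e+04,
    "LJ (SR)": 2.24225e+05,
    "Disper. corr.": -2.98888e+03,
    "Coulomb (SR)": math.nan,
    "Coul. recip.": 9.54634e+03,
}


def non_finite_terms(terms):
    """Return the names of all terms whose value is NaN or infinite."""
    return [name for name, value in terms.items() if not math.isfinite(value)]


bad = non_finite_terms(energies)

# NaN propagates through addition, so the full sum is NaN as well.
total = sum(energies.values())

# Summing only the finite terms shows that the rest of the table is usable.
finite_part = sum(v for v in energies.values() if math.isfinite(v))

print("non-finite terms:", bad)
print("sum of all terms is NaN:", math.isnan(total))
print("sum of finite terms (kJ/mol):", finite_part)
```

The same kind of scan could be applied to per-atom forces or coordinates, if
they were dumped, to narrow down which atoms first produce a non-finite value.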
The problem can be reproduced in version 5.1 (which we've been using for a while
now in this project) or in current git master. gmx dump indicates that
everything I have asked for (A and B states, etc.) in the topology is correct.
Interestingly, when I tried to zero out the ligand charges with convert-tpr
-zeroq as a test of the Coulomb calculation, I got another seg fault, but I
don't know how that would be related. Running convert-tpr -zeroq on the ligand
in water .tpr works fine.
Appended below is a gdb backtrace of the seg fault triggered with my desired run
settings.
Any help would be greatly appreciated! I'm pretty desperate at this point
because this is the last part of a long-awaited paper for new force field
parameters. I can share any necessary .tpr or other input files with anyone who
will take a look, but I was hesitant to post to Redmine as these are new force
field parameters that we are just about to publish after a few years of work.
I've tried debugging a bit myself but the parts of the code that are failing are
very cryptic to me, so I'm hitting a wall.
-Justin
Steepest Descents:
Tolerance (Fmax) = 1.00000e+01
Number of steps = 5000
Step= 0, Dmax= 1.0e-02 nm, Epot= -nan Fmax= 2.02165e+05, atom= 10613
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
(gdb) bt
#0 0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#1 0x00007ffff7901b75 in calc_cell_indices._omp_fn () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#2 0x00007ffff7904350 in nbnxn_put_on_grid () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#3 0x00007ffff79f4182 in do_force_cutsVERLET(_IO_FILE*, t_commrec*,
t_inputrec*, long, t_nrnb*, gmx_wallcycle*, gmx_localtop_t*, gmx_groups_t*,
float (*) [3], float (*) [3], history_t*, float (*) [3], float (*) [3],
t_mdatoms*, gmx_enerdata_t*, t_fcdata*, float*, t_graph*, t_forcerec*,
interaction_const_t*, gmx_vsite_t*, float*, double, _IO_FILE*, gmx_edsam*, int,
int) () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#4 0x00007ffff79fae22 in do_force () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#5 0x00007ffff79b9b0a in evaluate_energy(_IO_FILE*, t_commrec*, gmx_mtop_t*,
em_state_t*, gmx_localtop_t*, t_inputrec*, t_nrnb*, gmx_wallcycle*,
gmx_global_stat*, gmx_vsite_t*, gmx_constr*, t_fcdata*, t_graph*, t_mdatoms*,
t_forcerec*, float*, gmx_enerdata_t*, float (*) [3], float (*) [3], long, int) ()
from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#6 0x00007ffff79cc8ab in do_steep () from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#7 0x0000000000417148 in mdrunner ()
#8 0x000000000042caf0 in gmx_mdrun(int, char**) ()
#9 0x00007ffff6cc36bd in gmx::CommandLineModuleManager::run(int, char**) ()
from
/state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
#10 0x000000000040cdbc in main ()
--
==================================================
Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201
jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul
==================================================