[gmx-developers] Bug in nbnxn grid construction?

Justin Lemkul jalemkul at vt.edu
Sun Feb 14 19:20:44 CET 2016

Hi Devs,

I have managed to trigger a bug when running some relative free energy
calculations, but it is not specific to the free energy code itself.  The bug is
also not triggered consistently across different (yet similar) systems.  We have
used identical workflows for other protein-ligand complexes without problems;
only the last two systems we're working on have failed.  The ligands have virtual
sites (a new approach we have developed related to halogen parameters); removing
these virtual sites or removing the ligand entirely allows for stable
calculations.  But again, other systems with virtual sites work fine with the
same workflow, and the same ligands in water (undergoing the free energy
transformation) run fine, so the ligand topology is sane.

Here is a summary of what I have tried:

1. Free energy + Verlet = immediate seg fault in energy minimization
2. FE off + Verlet = seg fault
3. FE + group = seg fault
4. FE off + group = WORKS
5. Single-point energy (FE off + Verlet) = seg fault (so it is not specific to 
the minimizer, as I use the md integrator in the .mdp file)
6. Generic kernels = seg fault
7. Disabling SIMD (GMX_SIMD=None when running cmake) = seg fault
8. Absolute free energy calculation = runs fine

The -nan originates in Coulomb (SR); all other terms are OK except those that
depend on Coulomb (SR) (Potential, Pressure, dVremain/dl, and dVcoul/dl).

Steepest Descents:
    Tolerance (Fmax)   =  1.00000e+01
    Number of steps    =         5000
            Step           Time         Lambda
               0        0.00000        0.00000

    Energies (kJ/mol)
            Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
     1.19653e+03    3.68663e+03    1.06325e+04    1.25430e+02   -1.45477e+02
           LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
     2.68056e+03    3.50864e+04    2.24225e+05   -2.98888e+03           -nan
    Coul. recip.      Potential Pres. DC (bar) Pressure (bar)    dVremain/dl
     9.54634e+03           -nan    0.00000e+00           -nan           -nan
        dEkin/dl      dVcoul/dl       dVvdw/dl dVrestraint/dl   Constr. rmsd
     0.00000e+00           -nan   -2.57387e+01    0.00000e+00    1.57006e-06
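A minimal sketch (plain Python, not GROMACS code) of why one NaN term shows up in every quantity built from it: the values below are the step-0 terms copied from the output above, with Coulomb (SR) set to NaN. Treating Potential as the plain sum of these components is a simplifying assumption for illustration. Under IEEE 754 rules any sum containing a NaN is itself NaN, so Potential goes NaN while the independent terms stay finite; the leading minus in `-nan` is only the sign bit.

```python
import math

# Step-0 energy terms (kJ/mol) copied from the log; Coulomb (SR) is NaN.
terms = {
    "Bond": 1.19653e+03,
    "U-B": 3.68663e+03,
    "Proper Dih.": 1.06325e+04,
    "Improper Dih.": 1.25430e+02,
    "CMAP Dih.": -1.45477e+02,
    "LJ-14": 2.68056e+03,
    "Coulomb-14": 3.50864e+04,
    "LJ (SR)": 2.24225e+05,
    "Disper. corr.": -2.98888e+03,
    "Coulomb (SR)": float("nan"),
    "Coul. recip.": 9.54634e+03,
}


def potential(components):
    """Simplified total potential: the plain sum of all components.

    A single NaN anywhere in the sum makes the result NaN.
    """
    return sum(components.values())


if __name__ == "__main__":
    print("Potential with NaN Coulomb (SR):", potential(terms))
    # Dropping the NaN term leaves a finite total, matching the observation
    # that only Coulomb (SR) and the terms derived from it are affected.
    finite_only = {k: v for k, v in terms.items() if k != "Coulomb (SR)"}
    print("Potential without Coulomb (SR):", potential(finite_only))
```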

The problem can be reproduced in version 5.1 (which we've been using for a while 
now in this project) or in current git master.  gmx dump indicates that 
everything I have asked for (A and B states, etc.) in the topology is correct. 
Interestingly, when I tried to zero out the ligand charges with convert-tpr 
-zeroq as a test of the Coulomb calculation, I got another seg fault, but I 
don't know how that would be related.  Running convert-tpr -zeroq on the ligand 
in water .tpr works fine.

Appended below is a gdb backtrace of the seg fault triggered by my desired run:

Any help would be greatly appreciated!  I'm pretty desperate at this point 
because this is the last part of a long-awaited paper for new force field 
parameters.  I can share any necessary .tpr or other input files with anyone who 
will take a look, but I was hesitant to post to Redmine as these are new force 
field parameters that we are just about to publish after a few years of work. 
I've tried debugging a bit myself but the parts of the code that are failing are 
very cryptic to me, so I'm hitting a wall.


Steepest Descents:
    Tolerance (Fmax)   =  1.00000e+01
    Number of steps    =         5000
Step=    0, Dmax= 1.0e-02 nm, Epot=         -nan Fmax= 2.02165e+05, atom= 10613

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from 
(gdb) bt
#0  0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from 
#1  0x00007ffff7901b75 in calc_cell_indices._omp_fn () from 
#2  0x00007ffff7904350 in nbnxn_put_on_grid () from 
#3  0x00007ffff79f4182 in do_force_cutsVERLET(_IO_FILE*, t_commrec*, 
t_inputrec*, long, t_nrnb*, gmx_wallcycle*, gmx_localtop_t*, gmx_groups_t*, 
float (*) [3], float (*) [3], history_t*, float (*) [3], float (*) [3], 
t_mdatoms*, gmx_enerdata_t*, t_fcdata*, float*, t_graph*, t_forcerec*, 
interaction_const_t*, gmx_vsite_t*, float*, double, _IO_FILE*, gmx_edsam*, int, 
int) () from 
#4  0x00007ffff79fae22 in do_force () from 
#5  0x00007ffff79b9b0a in evaluate_energy(_IO_FILE*, t_commrec*, gmx_mtop_t*, 
em_state_t*, gmx_localtop_t*, t_inputrec*, t_nrnb*, gmx_wallcycle*, 
gmx_global_stat*, gmx_vsite_t*, gmx_constr*, t_fcdata*, t_graph*, t_mdatoms*, 
t_forcerec*, float*, gmx_enerdata_t*, float (*) [3], float (*) [3], long, int) ()
#6  0x00007ffff79cc8ab in do_steep () from 
#7  0x0000000000417148 in mdrunner ()
#8  0x000000000042caf0 in gmx_mdrun(int, char**) ()
#9  0x00007ffff6cc36bd in gmx::CommandLineModuleManager::run(int, char**) () 
#10 0x000000000040cdbc in main ()


Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441

