[gmx-developers] Bug in nbnxn grid construction?

Justin Lemkul jalemkul at vt.edu
Sun Feb 14 21:11:54 CET 2016


http://redmine.gromacs.org/issues/1902

Thanks, Berk.

-Justin

On 2/14/16 3:03 PM, Berk Hess wrote:
> Hi,
>
> Have you filed an issue on Redmine? If not, please do so and attach one or more .tpr files that cause the segfaults. If it segfaults, we should be able to find the source of this issue right away.
>
> Cheers,
>
> Berk
>
> On Feb 14, 2016 7:20 PM, Justin Lemkul <jalemkul at vt.edu> wrote:
>>
>>
>> Hi Devs,
>>
>> I have managed to trigger a bug when running some relative free energy
>> calculations, but it is not specific to the free energy code itself.  The bug is
>> also not always triggered in different (yet similar) systems.  We have identical
>> workflows in other protein-ligand complexes that work fine; it's just the last
>> two that we're doing that have failed.  The ligands have virtual sites (a new
>> approach we have developed related to halogen parameters); removing these
>> virtual sites or removing the ligand entirely allows for stable calculations.
>> But again, other systems with virtual sites work fine using the same workflow,
>> and the same ligands in water (doing the free energy transformation) run fine,
>> so the ligand topology is sane.
>>
>> Here is the summary of what I have tried:
>>
>> 1. Free energy + Verlet = immediate seg fault in energy minimization
>> 2. FE off + Verlet = seg fault
>> 3. FE + group = seg fault
>> 4. FE off + group = WORKS
>> 5. Single-point energy (FE off + Verlet) = seg fault (so it is not specific to
>> the minimizer, as I use the md integrator in the .mdp file)
>> 6. Generic kernels seg fault
>> 7. Disabling SIMD (GMX_SIMD=None when running cmake) seg faults
>> 8. Doing an absolute free energy calculation runs fine
>>
>> Only Coulomb (SR) shows -nan; the other terms, aside from the ones that depend
>> on Coulomb (SR), are OK.
>>
>> Steepest Descents:
>>      Tolerance (Fmax)   =  1.00000e+01
>>      Number of steps    =         5000
>>              Step           Time         Lambda
>>                 0        0.00000        0.00000
>>
>>      Energies (kJ/mol)
>>              Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
>>       1.19653e+03    3.68663e+03    1.06325e+04    1.25430e+02   -1.45477e+02
>>             LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
>>       2.68056e+03    3.50864e+04    2.24225e+05   -2.98888e+03           -nan
>>      Coul. recip.      Potential Pres. DC (bar) Pressure (bar)    dVremain/dl
>>       9.54634e+03           -nan    0.00000e+00           -nan           -nan
>>          dEkin/dl      dVcoul/dl       dVvdw/dl dVrestraint/dl   Constr. rmsd
>>       0.00000e+00           -nan   -2.57387e+01    0.00000e+00    1.57006e-06
>>
>> The problem can be reproduced in version 5.1 (which we've been using for a while
>> now in this project) or in current git master.  gmx dump indicates that
>> everything I have asked for (A and B states, etc) in the topology is correct.
>> Interestingly, when I tried to zero out the ligand charges with convert-tpr
>> -zeroq as a test of the Coulomb calculation, I got another seg fault, but I
>> don't know how that would be related.  Running convert-tpr -zeroq on the ligand
>> in water .tpr works fine.
>>
>> Appended below is a gdb backtrace of the seg fault triggered with my desired run
>> settings.
>>
>> Any help would be greatly appreciated!  I'm pretty desperate at this point
>> because this is the last part of a long-awaited paper for new force field
>> parameters.  I can share any necessary .tpr or other input files with anyone who
>> will take a look, but I was hesitant to post to Redmine as these are new force
>> field parameters that we are just about to publish after a few years of work.
>> I've tried debugging a bit myself but the parts of the code that are failing are
>> very cryptic to me, so I'm hitting a wall.
>>
>> -Justin
>>
>> Steepest Descents:
>>      Tolerance (Fmax)   =  1.00000e+01
>>      Number of steps    =         5000
>> Step=    0, Dmax= 1.0e-02 nm, Epot=         -nan Fmax= 2.02165e+05, atom= 10613
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> (gdb) bt
>> #0  0x00007ffff78f3cc7 in sort_atoms.isra.23.constprop () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #1  0x00007ffff7901b75 in calc_cell_indices._omp_fn () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #2  0x00007ffff7904350 in nbnxn_put_on_grid () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #3  0x00007ffff79f4182 in do_force_cutsVERLET(_IO_FILE*, t_commrec*,
>> t_inputrec*, long, t_nrnb*, gmx_wallcycle*, gmx_localtop_t*, gmx_groups_t*,
>> float (*) [3], float (*) [3], history_t*, float (*) [3], float (*) [3],
>> t_mdatoms*, gmx_enerdata_t*, t_fcdata*, float*, t_graph*, t_forcerec*,
>> interaction_const_t*, gmx_vsite_t*, float*, double, _IO_FILE*, gmx_edsam*, int,
>> int) () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #4  0x00007ffff79fae22 in do_force () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #5  0x00007ffff79b9b0a in evaluate_energy(_IO_FILE*, t_commrec*, gmx_mtop_t*,
>> em_state_t*, gmx_localtop_t*, t_inputrec*, t_nrnb*, gmx_wallcycle*,
>> gmx_global_stat*, gmx_vsite_t*, gmx_constr*, t_fcdata*, t_graph*, t_mdatoms*,
>> t_forcerec*, float*, gmx_enerdata_t*, float (*) [3], float (*) [3], long, int) ()
>>      from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #6  0x00007ffff79cc8ab in do_steep () from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #7  0x0000000000417148 in mdrunner ()
>> #8  0x000000000042caf0 in gmx_mdrun(int, char**) ()
>> #9  0x00007ffff6cc36bd in gmx::CommandLineModuleManager::run(int, char**) ()
>> from
>> /state/partition1/home/jalemkul/software/gromacs/5.1.0/bin/../lib/libgromacs.so.1
>> #10 0x000000000040cdbc in main ()
>>
>> --
>> ==================================================
>>
>> Justin A. Lemkul, Ph.D.
>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>
>> Department of Pharmaceutical Sciences
>> School of Pharmacy
>> Health Sciences Facility II, Room 629
>> University of Maryland, Baltimore
>> 20 Penn St.
>> Baltimore, MD 21201
>>
>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>> http://mackerell.umaryland.edu/~jalemkul
>>
>> ==================================================


