[gmx-users] Re: Segmentation fault, mdrun_mpi

Justin Lemkul jalemkul at vt.edu
Wed Oct 10 19:38:22 CEST 2012



On 10/10/12 1:33 PM, Ladasky wrote:
> Update:
>
>
> Ladasky wrote
>>
>> Justin Lemkul wrote
>>> Random segmentation faults are really hard to debug.  Can you resume the
>>> run
>>> using a checkpoint file?  That would suggest maybe an MPI problem or
>>> something
>>> else external to Gromacs.  Without a reproducible system and a debugging
>>> backtrace, it's going to be hard to figure out where the problem is
>>> coming from.
>> Thanks for that tip, Justin.  I tried to resume one run which failed at
>> 1.06 million cycles, and it WORKED.  It proceeded all the way to the 2.50
>> million cycles that I designated.  I now have two separate .trr files, but
>> I suppose they can be merged.
>>
>> I don't know whether my crashes are random yet.  I will try re-running
>> that simulation again from time zero, to see whether it segfaults at the
>> same place.  If it doesn't, then I have a problem which may have nothing
>> to do with GROMACS.
>
> I just tried exactly that, a re-run of the same structure.  This time, it
> ran without stopping, from time zero to 2.50 million cycles!  No crash at
> 1.06 million cycles this time.
>
> Unless GROMACS is using some random number generator which affects the
> outcome of repeated simulations (and I think that the only time that random
> number generation would be needed would be when initial velocities are
> generated, which was done during the earlier equilibration step), I will
> conclude that my simulation conditions are indeed acceptable, and that
> sometimes the software just behaves badly.
>

There are plenty of things that can differ between runs (unless you've turned
off optimizations and are using the -reprod option), but in any practical sense
they should not lead to random seg faults.

> Is that a common occurrence?
>

Based on the fact that very few people post seg fault problems that are not
precipitated by actual crashes (e.g. LINCS warnings), I would say no.  There is
no evidence yet to suggest what the real problem is, but until there is,
Gromacs is innocent until proven guilty ;)

> I could write a script which just automatically restarts my simulations
> provided that they 1) ran for a decent number of cycles and 2) exited with a
> segmentation fault error.  I could then have the script check in after a few
> minutes to make sure that they haven't crashed again, and soldier on.
>
>

That's an option.  If you're running in a queue system, there may be 
notification options if something goes wrong, as well.
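
For what it's worth, here is a minimal sketch of such a restart wrapper. It
is not a tested production script. The assumptions are: the run is launched
as "mpirun -np 4 mdrun_mpi -deffnm md" (the process count and the "md" file
prefix are hypothetical), the checkpoint is therefore written to md.cpt, and
a seg fault is reported to the caller as either -SIGSEGV or 128 + SIGSEGV.
MPI launchers differ in how they report a crashed rank, so check what yours
actually returns. The sketch does not check how many steps were completed;
it simply caps the number of restarts so a run that crashes immediately
cannot loop forever.

```python
import signal
import subprocess

# Assumption: a seg fault appears either as a negative signal number (child
# killed directly) or as 128 + SIGSEGV (as reported by a shell or launcher).
SEGV_CODES = {-signal.SIGSEGV, 128 + signal.SIGSEGV}

# Hypothetical command lines; -deffnm and -cpi are standard mdrun options.
FIRST_CMD = ["mpirun", "-np", "4", "mdrun_mpi", "-deffnm", "md"]
RESUME_CMD = FIRST_CMD + ["-cpi", "md.cpt"]


def is_segfault(returncode):
    """Return True if the exit status looks like a segmentation fault."""
    return returncode in SEGV_CODES


def run_with_restarts(first_cmd, resume_cmd, max_restarts=3,
                      runner=subprocess.call):
    """Run first_cmd; after each seg fault, run resume_cmd instead.

    Gives up after max_restarts restarts. Any other exit status (success
    or a different error) ends the loop immediately.
    Returns (final return code, number of restarts performed).
    """
    returncode = runner(first_cmd)
    restarts = 0
    while is_segfault(returncode) and restarts < max_restarts:
        restarts += 1
        returncode = runner(resume_cmd)
    return returncode, restarts


# Example call (not executed here):
#   code, n = run_with_restarts(FIRST_CMD, RESUME_CMD)
```

The runner argument is there only so the loop can be exercised with a fake
launcher before it is trusted with a real simulation.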

-Justin

-- 
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================


