[gmx-users] Re: Segmentation fault, mdrun_mpi
jalemkul at vt.edu
Wed Oct 10 19:38:22 CEST 2012
On 10/10/12 1:33 PM, Ladasky wrote:
> Ladasky wrote
>> Justin Lemkul wrote
>>> Random segmentation faults are really hard to debug. Can you resume the
>>> run using a checkpoint file? That would suggest maybe an MPI problem or
>>> else external to Gromacs. Without a reproducible system and a debugging
>>> backtrace, it's going to be hard to figure out where the problem is
>>> coming from.
>> Thanks for that tip, Justin. I tried to resume one run which failed at
>> 1.06 million cycles, and it WORKED. It proceeded all the way to the 2.50
>> million cycles that I designated. I now have two separate .trr files, but
>> I suppose they can be merged.
>> I don't know whether my crashes are random yet. I will try re-running
>> that simulation again from time zero, to see whether it segfaults at the
>> same place. If it doesn't, then I have a problem which may have nothing
>> to do with GROMACS.
> I just tried exactly that, a re-run of the same structure. This time, it
> ran without stopping, from time zero to 2.50 million cycles! No crash at
> 1.06 million cycles this time.
> Unless GROMACS is using some random number generator which affects the
> outcome of repeated simulations (and I think that the only time that random
> number generation would be needed would be when initial velocities are
> generated, which was done during the earlier equilibration step), I will
> conclude that my simulation conditions are indeed acceptable, and that
> sometimes the software just behaves badly.
There are plenty of things that can differ between runs (unless you've turned
off optimizations and are using the -reprod option), but in any practical sense,
they should not lead to random seg faults.
> Is that a common occurrence?
Based on the fact that very few people post seg fault problems that are not
precipitated by actual crashes (i.e. LINCS warnings), I would say no. There is
no evidence yet to suggest what the real problem is, but until there is,
Gromacs is innocent until proven guilty ;)
> I could write a script which just automatically restarts my simulations
> provided that they 1) ran for a decent number of cycles and 2) exited with a
> segmentation fault error. I could then have the script check in after a few
> minutes to make sure that they haven't crashed again, and soldier on.
That's an option. If you're running in a queue system, there may be
notification options if something goes wrong, as well.
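For what it's worth, a restart wrapper along those lines could look roughly like
the sketch below. It is only an illustration, not a tested production script.
The names is_segfault, build_commands and run_with_restarts are made up for this
example, as are the "mpirun -np" launcher, the file prefix and the thresholds.
Wall-clock run time stands in for "a decent number of cycles", which is a
simplification. The mdrun options used (-deffnm, -cpi) are the standard ones for
continuing from a checkpoint.

```python
import signal
import subprocess
import time


def is_segfault(returncode):
    """Return True if an exit status looks like death by SIGSEGV.

    subprocess reports a child killed by signal N as -N; shells and some
    MPI launchers report 128 + N instead. Which one shows up depends on
    how mdrun_mpi is launched, so both are checked (an assumption).
    """
    segv = int(signal.SIGSEGV)
    return returncode in (-segv, 128 + segv)


def build_commands(deffnm, nprocs):
    """Build the initial and the resume command lines (hypothetical setup)."""
    first = ["mpirun", "-np", str(nprocs), "mdrun_mpi", "-deffnm", deffnm]
    # -cpi tells mdrun to continue from the given checkpoint file.
    resume = first + ["-cpi", deffnm + ".cpt"]
    return first, resume


def run_with_restarts(first_cmd, resume_cmd, max_restarts=3,
                      min_runtime=600.0, run=subprocess.call,
                      clock=time.time):
    """Run first_cmd; after a segfault, run resume_cmd instead.

    A crash sooner than min_runtime seconds after launch is treated as a
    real problem and is not retried. At most max_restarts restarts are
    attempted. Returns (final_returncode, restarts_used).
    """
    cmd = first_cmd
    restarts = 0
    while True:
        started = clock()
        rc = run(cmd)
        elapsed = clock() - started
        if not is_segfault(rc):
            return rc, restarts   # clean exit, or an error that is not a segfault
        if elapsed < min_runtime or restarts >= max_restarts:
            return rc, restarts   # crashed too early or too often: give up
        restarts += 1
        cmd = resume_cmd


# Example use (not executed here):
#   first, resume = build_commands("md", 8)
#   rc, n = run_with_restarts(first, resume)
```

The cap on restarts and the minimum run time are there so that a genuinely
broken system (e.g. one that blows up right away) does not get resubmitted
forever.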
Justin A. Lemkul, Ph.D.
Department of Biochemistry
jalemkul[at]vt.edu | (540) 231-9080