[gmx-users] GPU job often stopped
szilard.pall at cbr.su.se
Mon Apr 29 15:31:01 CEST 2013
On Mon, Apr 29, 2013 at 2:41 PM, Albert <mailmd2011 at gmail.com> wrote:
> On 04/28/2013 05:45 PM, Justin Lemkul wrote:
>> Frequent failures suggest instability in the simulated system. Check your
>> .log file or stderr for informative Gromacs diagnostic information.
> My log file didn't contain any errors; the end of the stopped log file
> looks something like this:
> DD  step 22599999  vol min/aver 0.967  load imb.: force  0.8%
>
>            Step           Time         Lambda
>        22600000    45200.00000        0.00000
>
>    Energies (kJ/mol)
>           Angle            U-B    Proper Dih.  Improper Dih.          LJ-14
>     9.86437e+03    4.02406e+04    3.52809e+04    6.13542e+02    8.61815e+03
>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>     1.25055e+04    3.05477e+04   -9.05956e+03   -6.02400e+05    1.58357e+03
>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>     1.39149e+02   -4.72066e+05    1.37165e+05   -3.34901e+05    3.11958e+02
>  Pres. DC (bar) Pressure (bar)   Constr. rmsd
>    -2.94092e+02   -7.91535e+01    1.79812e-05
> Also, in the information file I only found the following:
> step 13300, will finish Tue Apr 30 14:41
> NOTE: Turning on dynamic load balancing
> Probably the machine was restarted from time to time?
The segv indicates that mdrun crashed and not that the machine was
restarted. The GPU detection output (both on stderr and log) should
show whether ECC is "on" (and so does the nvidia-smi tool).
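
For checking ECC from a script rather than by eye, something like the following might work. It is only a sketch: it assumes a driver whose nvidia-smi supports the --query-gpu interface with the ecc.mode.current field, and that the reported values look like "Enabled", "Disabled" or "[N/A]" (the latter on cards without ECC support). The function names parse_ecc_csv and query_ecc are made up for illustration and are not part of GROMACS or of any NVIDIA tool.

```python
import csv
import io
import subprocess


def parse_ecc_csv(text):
    """Turn 'index, ecc.mode.current' CSV text into {gpu_index: mode}.

    Header lines and blank lines are skipped, so the function accepts
    output produced with or without the 'noheader' format option.
    """
    modes = {}
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        index, mode = row[0].strip(), row[1].strip()
        if not index.isdigit():
            continue  # header line such as "index, ecc.mode.current"
        modes[int(index)] = mode
    return modes


def query_ecc():
    """Run nvidia-smi and return the ECC mode per GPU.

    Assumption: nvidia-smi is on PATH and understands these options.
    """
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,ecc.mode.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_ecc_csv(result.stdout)
```

Calling query_ecc() on the compute node should return a dictionary mapping each GPU index to its ECC mode. The parsing is kept separate from the nvidia-smi call so it can be tried on saved output from a machine where the crash occurred.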