[gmx-developers] segfault goes away when restarting from cpt

Aleksei Iupinov aleksei.iupinov at scilifelab.se
Fri May 5 13:13:17 CEST 2017


Hello Michael,

You are welcome to register and file a bug report on the issue tracker at
https://redmine.gromacs.org/
There you can also attach the input files and the logs (so that we know
the exact GROMACS version, etc.).
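
It may also help to include in the report where inside libgromacs.so the
fault occurred. The kernel line quoted below gives the instruction pointer
(ip) and, in brackets, the address at which the library was mapped plus the
size of that mapping; the difference is the offset of the faulting
instruction within the library. A minimal Python sketch of that arithmetic
(the helper name fault_offset is only illustrative and not part of GROMACS,
and it assumes the wrapped syslog line has been rejoined onto one line):

```python
import re

# Kernel segfault message as quoted from syslog, rejoined onto one line.
SYSLOG_LINE = (
    "gmx[2218]: segfault at ffffffff9d3ebea0 ip 00007f5b708be3a1 "
    "sp 00007f5b657f9dc0 error 7 in libgromacs.so.2.3.0[7f5b706d9000+1d1e000]"
)


def fault_offset(line):
    """Return (library name, offset of the faulting instruction in its mapping)."""
    m = re.search(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]", line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group(1), 16)    # instruction pointer at the time of the fault
    base = int(m.group(3), 16)  # start address of the library mapping
    size = int(m.group(4), 16)  # size of the library mapping
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("instruction pointer lies outside the reported mapping")
    return m.group(2), offset


lib, off = fault_offset(SYSLOG_LINE)
print(f"{lib} + {off:#x}")  # prints: libgromacs.so.2.3.0 + 0x1e53a1
```

With a build that has debug symbols, an offset like this can usually be
passed to a tool such as addr2line (for example: addr2line -f -C -e
<path to libgromacs.so.2.3.0> 0x1e53a1) to get a function name, though
whether it resolves depends on how the library was built and mapped.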

Best regards,
Aleksei

On Fri, May 5, 2017 at 10:23 AM, Michael Brunsteiner <mbx0009 at yahoo.com>
wrote:

>
> hi,
>
> I am posting this here as it might be a developer issue rather than a
> user issue ...
> I ran an NPT sim with simulated annealing of an amorphous solid sample with
> some organic molecules, as in:
>
> gmx grompp -f md-1bar-353-253.mdp -p sis3-7-simp.top -c up-nr2-3.gro -o do-nr2-3.tpr
> nohup gmx mdrun -v -deffnm do-nr2-3 > er 2>&1 &
>
> After around 30 nanoseconds the simulation stops without further notice.
> Neither the log file nor stdout or stderr contains any indication
> of what happened,
> but when I look into the relevant syslog file I find:
>
> May  5 03:24:38 rcpe-sbd-node03 kernel: [82541302.295784] gmx[2218]: segfault at ffffffff9d3ebea0 ip 00007f5b708be3a1 sp 00007f5b657f9dc0 error 7 in libgromacs.so.2.3.0[7f5b706d9000+1d1e000]
>
> When I restart the sim on the same computer from the last cpt file, as
> in:
>
> nohup gmx mdrun -v -deffnm do-nr2-3 -cpi do-nr2-3.cpt -noappend > er 2>&1 &
>
> the sim happily continues beyond the point where it previously seg-faulted
> without any further issues ...
>
> The tpr file is too large to attach (if anybody is interested I can upload
> it somewhere).
> Below I put the last 30 or so lines of both stderr+stdout and the log file.
> I believe the warning at the end of stderr is harmless, but even if it
> actually is the reason for the segfault, this still does not explain why
> nothing is written to stderr when it happens, or why the sim works when
> restarted from the cpt file ... can it be that this is a hardware issue?
>
> regards,
> Michael
>
>
> stderr+stdout:
> [..]
>     Brand:  Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
>     SIMD instructions most likely to fit this hardware: AVX_256
>     SIMD instructions selected at GROMACS compile time: AVX_256
>
>   Hardware topology: Full, with devices
>   GPU info:
>     Number of GPUs detected: 1
>     #0: NVIDIA GeForce GTX 780, compute cap.: 3.5, ECC:  no, stat: compatible
>
> Reading file do-nr2-3.tpr, VERSION 2016.3 (single precision)
> Changing nstlist from 20 to 40, rlist from 1.2 to 1.2
>
> Using 1 MPI thread
> Using 12 OpenMP threads
>
> 1 compatible GPU is present, with ID 0
> 1 GPU auto-selected for this run.
> Mapping of GPU ID to the 1 PP rank in this node: 0
>
> starting mdrun 'system'
> 110000000 steps, 110000.0 ps.
> step   80: timed with pme grid 40 40 24, coulomb cutoff 1.200: 81.2 M-cycles
> step   80: the box size limits the PME load balancing to a coulomb cut-off of 1.368
> step  160: timed with pme grid 32 36 24, coulomb cutoff 1.368: 72.9 M-cycles
> step  240: timed with pme grid 36 36 24, coulomb cutoff 1.264: 75.7 M-cycles
> step  320: timed with pme grid 36 40 24, coulomb cutoff 1.216: 78.6 M-cycles
> step  400: timed with pme grid 40 40 24, coulomb cutoff 1.200: 81.2 M-cycles
>               optimal pme grid 32 36 24, coulomb cutoff 1.368
> step 31031000, will finish Fri May  5 14:27:08 2017
> Step 31031061  Warning: pressure scaling more than 1%, mu: 0.999153 0.982333 0.997814
>
>
>
> log-file:
> [..]
>            Step           Time
>        31030000    31030.00000
>
> Current ref_t for group System:    327.9
>    Energies (kJ/mol)
>            Bond          Angle    Proper Dih.  Improper Dih.          LJ-14
>     8.65826e+03    1.38650e+04    1.08781e+04    4.26101e+02    6.01339e+03
>      Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
>    -2.98635e+04   -1.12638e+04    1.63267e+04    2.11350e+02    1.52514e+04
>     Kinetic En.   Total Energy    Temperature Pressure (bar)
>     2.22541e+04    3.75055e+04    3.28069e+02    6.14802e+02
>
>            Step           Time
>        31031000    31031.00000
>
> Current ref_t for group System:    327.8
>    Energies (kJ/mol)
>            Bond          Angle    Proper Dih.  Improper Dih.          LJ-14
>     8.41290e+03    1.38660e+04    1.08950e+04    3.51583e+02    5.79937e+03
>      Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
>    -2.99386e+04   -1.15255e+04    1.64549e+04    2.30994e+02    1.45468e+04
>     Kinetic En.   Total Energy    Temperature Pressure (bar)
>     2.27307e+04    3.72775e+04    3.35095e+02   -1.54620e+00
>
> ------------------------------
> --
> Gromacs Developers mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or
> send a mail to gmx-developers-request at gromacs.org.
>