[gmx-users] jwe1050i + jwe0019i errors = SIGSEGV (Fujitsu)

Mark Abraham mark.j.abraham at gmail.com
Tue Oct 15 12:42:59 CEST 2013


On Thu, Oct 10, 2013 at 2:34 PM, James <jamesresearching at gmail.com> wrote:

> Dear Mark,
>
> Thanks again for your response.
>
> Many of the regression tests seem to have passed:
>
> All 16 simple tests PASSED
> All 19 complex tests PASSED
> All 142 kernel tests PASSED
> All 9 freeenergy tests PASSED
> All 0 extra tests PASSED
> Error not all 42 pdb2gmx tests have been done successfully
> Only 0 energies in the log file
> pdb2gmx tests FAILED
>
> I'm not sure why pdb2gmx failed but I suppose it will not impact the
> crashing I'm experiencing.
>

No, that's fine. Probably the pdb2gmx tests just don't have sufficiently
explicit guards to stop people from running their energy minimization with
a more-than-useful number of OpenMP threads.
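
(If you did want to re-run that part, I would first cap the OpenMP thread
count in the shell before invoking the test script, e.g.

export OMP_NUM_THREADS=2

since mdrun picks that variable up from the environment. That's only a
guess at a workaround, though.)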


> Regarding the stack trace showing line numbers, what is the best way to go
> about this, in this context? I'm not really experienced in that aspect.
>

That's a matter of compiling in debug mode (add -DCMAKE_BUILD_TYPE=Debug to
your cmake invocation) and, with luck, observing the same crash with an
error message that carries more useful information. Debug mode annotates
the executable so that a finger can be pointed at the source line that
caused the segfault. Hopefully the compiler does this properly, but support
for it inside OpenMP regions is a corner compiler writers might cut ;-)
Depending on the details, loading a core dump in a debugger can also be
necessary, but your local sysadmins are the people to talk to about that.
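
In case it helps, here is a rough sketch of the kind of workflow I mean
(the build directory, install path and core-file name below are
placeholders, and your site may provide a debugger other than gdb):

mkdir build-debug && cd build-debug
cmake .. -DCMAKE_BUILD_TYPE=Debug -DGMX_MPI=ON -DGMX_DOUBLE=ON
make -j 8 && make install

Then reproduce the crash with the debug mdrun_mpi_d. If a core file gets
written, something like

gdb /path/to/mdrun_mpi_d /path/to/corefile

and typing "bt" at the (gdb) prompt should print a backtrace with file
names and line numbers. The -DGMX_MPI=ON and -DGMX_DOUBLE=ON flags are
only there to match the mdrun_mpi_d binary you are already running.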

Mark

> Thanks again for your help!
>
> Best regards,
>
> James
>
>
> On 21 September 2013 23:12, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
> > On Sat, Sep 21, 2013 at 2:45 PM, James <jamesresearching at gmail.com>
> > wrote:
> > > Dear Mark and the rest of the Gromacs team,
> > >
> > > Thanks a lot for your response. I have been trying to isolate the
> > > problem and have also been in discussion with the support staff. They
> > > suggested it may be a bug in the gromacs code, and I have tried to
> > > isolate the problem more precisely.
> >
> > First, do the GROMACS regression tests for Verlet kernels pass? (Run
> > them all, but those with nbnxn prefix are of interest here.) They
> > likely won't scale to 16 OMP threads, but you can vary the OMP_NUM_THREADS
> > environment variable to see what you can see.
> >
> > > Considering that the calculation is run under MPI with 16 OpenMP cores
> > > per MPI node, the error seems to occur under the following conditions:
> > >
> > > A few thousand atoms: 1 or 2 MPI nodes: OK
> > > Double the number of atoms (~15,000): 1 MPI node: OK, 2 MPI nodes:
> > > SIGSEGV error described below.
> > >
> > > So it seems that the error occurs for relatively large systems which
> > > use MPI.
> >
> > ~500 atoms per core (thread) is a system in the normal GROMACS scaling
> > regime. 16 OMP threads is more than is useful on other HPC systems,
> > but since we don't know what your hardware is, whether you are
> > investigating something useful is your decision.
> >
> > > The crash mentions the "calc_cell_indices" function (see below). Is this
> > > somehow a problem with memory not being sufficient at the MPI interface
> > > at this function? I'm not sure how to proceed further. Any help would be
> > > greatly appreciated.
> >
> > If there is a problem with GROMACS (which so far I doubt), we'd need a
> > stack trace that shows a line number (rather than addresses) in order
> > to start to locate it.
> >
> > Mark
> >
> > > Gromacs version is 4.6.3.
> > >
> > > Thank you very much for your time.
> > >
> > > James
> > >
> > >
> > > On 4 September 2013 16:05, Mark Abraham <mark.j.abraham at gmail.com>
> > > wrote:
> > >
> > >> On Sep 4, 2013 7:59 AM, "James" <jamesresearching at gmail.com> wrote:
> > >> >
> > >> > Dear all,
> > >> >
> > >> > I'm trying to run Gromacs on a Fujitsu supercomputer but the
> > >> > software is crashing.
> > >> >
> > >> > I run grompp:
> > >> >
> > >> > grompp_mpi_d -f parameters.mdp -c system.pdb -p overthe.top
> > >> >
> > >> > and it produces the error:
> > >> >
> > >> > jwe1050i-w The hardware barrier couldn't be used and continues
> > >> > processing using the software barrier.
> > >> > taken to (standard) corrective action, execution continuing.
> > >> > error summary (Fortran)
> > >> > error number error level error count
> > >> > jwe1050i w 1
> > >> > total error count = 1
> > >> >
> > >> > but still outputs topol.tpr so I can continue.
> > >>
> > >> There's no value in compiling grompp with MPI or in double precision.
> > >>
> > >> > I then run with
> > >> >
> > >> > export FLIB_FASTOMP=FALSE
> > >> > source /home/username/Gromacs463/bin/GMXRC.bash
> > >> > mpiexec mdrun_mpi_d -ntomp 16 -v
> > >> >
> > >> > but it crashes:
> > >> >
> > >> > starting mdrun 'testrun'
> > >> > 50000 steps, 100.0 ps.
> > >> > jwe0019i-u The program was terminated abnormally with signal number
> > >> > SIGSEGV.
> > >> > signal identifier = SEGV_MAPERR, address not mapped to object
> > >> > error occurs at calc_cell_indices._OMP_1 loc 0000000000233474 offset
> > >> > 00000000000003b4
> > >> > calc_cell_indices._OMP_1 at loc 00000000002330c0 called from loc
> > >> > ffffffff02088fa0 in start_thread
> > >> > start_thread at loc ffffffff02088e4c called from loc ffffffff029d19b4
> > >> > in __thread_start
> > >> > __thread_start at loc ffffffff029d1988 called from o.s.
> > >> > error summary (Fortran)
> > >> > error number error level error count
> > >> > jwe0019i u 1
> > >> > jwe1050i w 1
> > >> > total error count = 2
> > >> > [ERR.] PLE 0014 plexec The process terminated
> > >> > abnormally.(rank=1)(nid=0x03060006)(exitstatus=240)(CODE=2002,1966080,61440)
> > >> > [ERR.] PLE The program that the user specified may be illegal or
> > >> > inaccessible on the node.(nid=0x03060006)
> > >> >
> > >> > Any ideas what could be wrong? It works on my local intel machine.
> > >>
> > >> Looks like it wasn't compiled correctly for the target machine. What
> > >> was the cmake command, and what does mdrun -version output? Also, if
> > >> this is the K computer, we probably can't help, because the compiler
> > >> docs are officially unavailable to us. National secret, and all ;-)
> > >>
> > >> Mark
> > >>
> > >> >
> > >> > Thanks in advance,
> > >> >
> > >> > James


