[gmx-users] jwe1050i + jwe0019i errors = SIGSEGV (Fujitsu)

James jamesresearching at gmail.com
Thu Oct 10 14:34:46 CEST 2013


Dear Mark,

Thanks again for your response.

Many of the regression tests seem to have passed:

All 16 simple tests PASSED
All 19 complex tests PASSED
All 142 kernel tests PASSED
All 9 freeenergy tests PASSED
All 0 extra tests PASSED
Error not all 42 pdb2gmx tests have been done successfully
Only 0 energies in the log file
pdb2gmx tests FAILED
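
For reference, I ran them along these lines, varying OMP_NUM_THREADS as
suggested (gmxtest.pl is the driver script shipped with the regression test
package; the directory name and thread counts are just my own choices):

  cd regressiontests-4.6.3
  export OMP_NUM_THREADS=4       # repeated with 1, 2, 8 and 16
  ./gmxtest.pl all               # or a single set: simple, complex, kernel, pdb2gmx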

I'm not sure why the pdb2gmx tests failed, but I assume that is unrelated to
the crash I'm experiencing.

Regarding a stack trace that shows line numbers: what is the best way to
obtain one in this context? I'm not very experienced with that side of things.
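
My guess is that I need to rebuild with debug symbols so that the addresses
resolve to file and line, reproduce the crash, and then inspect the core file
(or run under a debugger). Something like the following is what I have in mind
(the cmake options are the standard 4.6 ones as far as I know, and the install
prefix is just a placeholder for my setup):

  # reconfigure with debug info (RelWithDebInfo keeps optimisation, adds -g)
  cmake .. -DGMX_MPI=ON -DGMX_DOUBLE=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo \
        -DCMAKE_INSTALL_PREFIX=$HOME/Gromacs463-debug
  make -j 8 && make install

  # allow core dumps, reproduce the crash, then read the backtrace
  ulimit -c unlimited
  mpiexec mdrun_mpi_d -ntomp 16 -v
  gdb $(which mdrun_mpi_d) core      # core file name may vary; then: bt

Is that the right direction, or is there a better way on this machine?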

Thanks again for your help!

Best regards,

James


On 21 September 2013 23:12, Mark Abraham <mark.j.abraham at gmail.com> wrote:

> On Sat, Sep 21, 2013 at 2:45 PM, James <jamesresearching at gmail.com> wrote:
> > Dear Mark and the rest of the Gromacs team,
> >
> > Thanks a lot for your response. I have been trying to isolate the problem
> > and have also been in discussion with the support staff. They suggested it
> > may be a bug in the gromacs code, and I have tried to isolate the problem
> > more precisely.
>
> First, do the GROMACS regression tests for Verlet kernels pass? (Run
> them all, but those with nbnxn prefix are of interest here.) They
> likely won't scale to 16 OMP threads, but you can vary the OMP_NUM_THREADS
> environment variable to see what you can see.
>
> > Considering that the calculation is run under MPI with 16 OpenMP cores per
> > MPI node, the error seems to occur under the following conditions:
> >
> > A few thousand atoms: 1 or 2 MPI nodes: OK
> > Double the number of atoms (~15,000): 1 MPI node: OK, 2 MPI nodes: the
> > SIGSEGV error described below.
> >
> > So it seems that the error occurs for relatively large systems which use
> > MPI.
>
> ~500 atoms per core (thread) is a system in the normal GROMACS scaling
> regime. 16 OMP threads is more than is useful on other HPC systems,
> but since we don't know what your hardware is, whether you are
> investigating something useful is your decision.
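
(For my failing case that works out to about 15,000 atoms / (2 MPI ranks x 16
OpenMP threads) = ~470 atoms per thread, so it sits right at that boundary.)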
>
> > The crash mentions the "calc_cell_indices" function (see below). Is this
> > somehow a problem with memory not being sufficient at the MPI interface at
> > this function? I'm not sure how to proceed further. Any help would be
> > greatly appreciated.
>
> If there is a problem with GROMACS (which so far I doubt), we'd need a
> stack trace that shows a line number (rather than addresses) in order
> to start to locate it.
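
(Would resolving the addresses already printed in the crash report be useful in
the meantime? I was thinking of something like the line below, assuming the
binary carries debug info and that binutils' addr2line understands it on this
architecture:

  addr2line -f -e $(which mdrun_mpi_d) 0x233474

Otherwise I'll rebuild with debug symbols and get a proper backtrace.)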
>
> Mark
>
> > Gromacs version is 4.6.3.
> >
> > Thank you very much for your time.
> >
> > James
> >
> >
> > On 4 September 2013 16:05, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> >
> >> On Sep 4, 2013 7:59 AM, "James" <jamesresearching at gmail.com> wrote:
> >> >
> >> > Dear all,
> >> >
> >> > I'm trying to run Gromacs on a Fujitsu supercomputer but the software is
> >> > crashing.
> >> >
> >> > I run grompp:
> >> >
> >> > grompp_mpi_d -f parameters.mdp -c system.pdb -p overthe.top
> >> >
> >> > and it produces the error:
> >> >
> >> > jwe1050i-w The hardware barrier couldn't be used and continues processing
> >> > using the software barrier.
> >> > taken to (standard) corrective action, execution continuing.
> >> > error summary (Fortran)
> >> >   error number   error level   error count
> >> >   jwe1050i       w             1
> >> > total error count = 1
> >> >
> >> > but still outputs topol.tpr so I can continue.
> >>
> >> There's no value in compiling grompp with MPI or in double precision.
> >>
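
(Noted; for pre-processing I'll switch to the plain single-precision, serial
grompp, e.g.

  grompp -f parameters.mdp -c system.pdb -p overthe.top -o topol.tpr

and keep the MPI/double build only for mdrun.)
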
> >> > I then run with
> >> >
> >> > export FLIB_FASTOMP=FALSE
> >> > source /home/username/Gromacs463/bin/GMXRC.bash
> >> > mpiexec mdrun_mpi_d -ntomp 16 -v
> >> >
> >> > but it crashes:
> >> >
> >> > starting mdrun 'testrun'
> >> > 50000 steps, 100.0 ps.
> >> > jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.
> >> > signal identifier = SEGV_MAPERR, address not mapped to object
> >> > error occurs at calc_cell_indices._OMP_1 loc 0000000000233474 offset
> >> > 00000000000003b4
> >> > calc_cell_indices._OMP_1 at loc 00000000002330c0 called from loc
> >> > ffffffff02088fa0 in start_thread
> >> > start_thread at loc ffffffff02088e4c called from loc ffffffff029d19b4 in
> >> > __thread_start
> >> > __thread_start at loc ffffffff029d1988 called from o.s.
> >> > error summary (Fortran)
> >> >   error number   error level   error count
> >> >   jwe0019i       u             1
> >> >   jwe1050i       w             1
> >> > total error count = 2
> >> > [ERR.] PLE 0014 plexec The process terminated
> >> > abnormally.(rank=1)(nid=0x03060006)(exitstatus=240)(CODE=2002,1966080,61440)
> >> > [ERR.] PLE The program that the user specified may be illegal or
> >> > inaccessible on the node.(nid=0x03060006)
> >> >
> >> > Any ideas what could be wrong? It works on my local intel machine.
> >>
> >> Looks like it wasn't compiled correctly for the target machine. What was
> >> the cmake command, what does mdrun -version output? Also, if this is the K
> >> computer, probably we can't help, because the compiler docs are officially
> >> unavailable to us. National secret, and all ;-)
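
(I'll dig out the exact configure command and post it together with the full
output of mdrun_mpi_d -version. It would have been of the general form below,
since GMX_MPI and GMX_DOUBLE are what an MPI double-precision 4.6 build needs,
but I'll confirm the exact compiler settings from my build logs:

  cmake .. -DGMX_MPI=ON -DGMX_DOUBLE=ON \
        -DCMAKE_INSTALL_PREFIX=/home/username/Gromacs463
  mdrun_mpi_d -version
)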
> >>
> >> Mark
> >>
> >> >
> >> > Thanks in advance,
> >> >
> >> > James