[gmx-users] jwe1050i + jwe0019i errors = SIGSEGV (Fujitsu)

Mark Abraham mark.j.abraham at gmail.com
Sat Sep 21 16:12:43 CEST 2013


On Sat, Sep 21, 2013 at 2:45 PM, James <jamesresearching at gmail.com> wrote:
> Dear Mark and the rest of the Gromacs team,
>
> Thanks a lot for your response. I have been trying to isolate the problem
> and have also been in discussion with the support staff. They suggested it
> may be a bug in the GROMACS code, so I have tried to pin the problem down
> more precisely.

First, do the GROMACS regression tests for the Verlet kernels pass? (Run
them all, but those with the nbnxn prefix are the ones of interest here.)
They likely won't scale to 16 OpenMP threads, but you can vary the
OMP_NUM_THREADS environment variable and see how far they get.
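For example, something along these lines (just a sketch; the location of the
regression test suite and the gmxtest.pl arguments are assumptions, so adapt
them to however your copy of the tests is laid out):

  cd regressiontests
  for n in 1 2 4 8 16; do
      export OMP_NUM_THREADS=$n
      perl gmxtest.pl complex    # nbnxn tests are under the complex suite here; check your copy
  done

If a failure shows up already on a single rank with few threads, the build
itself is suspect rather than the MPI layer.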

> The calculation is run under MPI with 16 OpenMP threads per MPI node, and
> the error seems to occur under the following conditions:
>
> A few thousand atoms: 1 or 2 MPI nodes: OK
> Double the number of atoms (~15,000): 1 MPI node: OK; 2 MPI nodes: the
> SIGSEGV error described below.
>
> So it seems that the error occurs for relatively large systems when more
> than one MPI node is used.

~500 atoms per core (thread) puts this system in the normal GROMACS scaling
regime. 16 OpenMP threads per rank is more than is useful on other HPC
systems, but since we don't know what your hardware is, only you can judge
whether you are investigating something useful.
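If you want to check whether the wide OpenMP decomposition itself is the
problem, compare runs that keep the total core count fixed but shift the
rank/thread split, e.g. (launcher syntax is only a sketch; use whatever your
scheduler expects):

  # current setup: 2 ranks x 16 threads
  export OMP_NUM_THREADS=16
  mpiexec -n 2 mdrun_mpi_d -ntomp 16 -v

  # same 32 cores as 8 ranks x 4 threads
  export OMP_NUM_THREADS=4
  mpiexec -n 8 mdrun_mpi_d -ntomp 4 -v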

> The crash mentions the "calc_cell_indices" function (see below). Could this
> be a problem with insufficient memory at the MPI interface in that function?
> I'm not sure how to proceed further. Any help would be greatly appreciated.

If there is a problem with GROMACS (which so far I doubt), we'd need a
stack trace that shows line numbers (rather than raw addresses) in order
to start locating it.
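To get such a trace you need a binary built with debug information, e.g.
something like the following (the double-precision and MPI cache variables
below just mirror your current build; keep whatever compiler settings you
already use and only change the build type):

  cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DGMX_MPI=ON -DGMX_DOUBLE=ON
  make && make install

and then reproduce the crash with that binary.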

Mark

> Gromacs version is 4.6.3.
>
> Thank you very much for your time.
>
> James
>
>
> On 4 September 2013 16:05, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
>> On Sep 4, 2013 7:59 AM, "James" <jamesresearching at gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > I'm trying to run Gromacs on a Fujitsu supercomputer but the software is
>> > crashing.
>> >
>> > I run grompp:
>> >
>> > grompp_mpi_d -f parameters.mdp -c system.pdb -p overthe.top
>> >
>> > and it produces the error:
>> >
>> > jwe1050i-w The hardware barrier couldn't be used and continues processing using the software barrier.
>> > taken to (standard) corrective action, execution continuing.
>> > error summary (Fortran)
>> >   error number   error level   error count
>> >   jwe1050i       w             1
>> > total error count = 1
>> >
>> > but still outputs topol.tpr so I can continue.
>>
>> There's no value in compiling grompp with MPI or in double precision.
>>
>> > I then run with
>> >
>> > export FLIB_FASTOMP=FALSE
>> > source /home/username/Gromacs463/bin/GMXRC.bash
>> > mpiexec mdrun_mpi_d -ntomp 16 -v
>> >
>> > but it crashes:
>> >
>> > starting mdrun 'testrun'
>> > 50000 steps, 100.0 ps.
>> > jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.
>> > signal identifier = SEGV_MAPERR, address not mapped to object
>> > error occurs at calc_cell_indices._OMP_1 loc 0000000000233474 offset 00000000000003b4
>> > calc_cell_indices._OMP_1 at loc 00000000002330c0 called from loc ffffffff02088fa0 in start_thread
>> > start_thread at loc ffffffff02088e4c called from loc ffffffff029d19b4 in __thread_start
>> > __thread_start at loc ffffffff029d1988 called from o.s.
>> > error summary (Fortran)
>> >   error number   error level   error count
>> >   jwe0019i       u             1
>> >   jwe1050i       w             1
>> > total error count = 2
>> > [ERR.] PLE 0014 plexec The process terminated abnormally.(rank=1)(nid=0x03060006)(exitstatus=240)(CODE=2002,1966080,61440)
>> > [ERR.] PLE The program that the user specified may be illegal or inaccessible on the node.(nid=0x03060006)
>> >
>> > Any ideas what could be wrong? It works on my local intel machine.
>>
>> Looks like it wasn't compiled correctly for the target machine. What was
>> the cmake command, and what does mdrun -version output? Also, if this is the
>> K computer, we probably can't help, because the compiler docs are officially
>> unavailable to us. National secret, and all ;-)
>>
>> Mark
>>
>> >
>> > Thanks in advance,
>> >
>> > James