[gmx-users] Gromacs 4.6 segmentation fault with mdrun

Raf Ponsaerts raf.ponsaerts at med.kuleuven.be
Thu Nov 15 23:13:45 CET 2012


Hi Szilárd,

I assume I get the same segmentation fault error as Sebastian (don't
shoot if not so). I have two NVIDIA GTX 580 cards (and four 12-core
AMD Opteron 6174 processors).

In brief:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc07f8700 (LWP 32035)]
0x00007ffff61de301 in nbnxn_make_pairlist.omp_fn.2 ()
from /usr/local/gromacs/bin/../lib/libmd.so.6

Running with -nb cpu under the Verlet cut-off scheme also results in
this error...

gcc 4.4.5 (Debian 4.4.5-8), Linux kernel 3.1.1
CMake 2.8.7

If I attach the mdrun.debug output file to this mail, the message gets
bounced by the list's mail server (because mdrun.debug is larger than
50 kB).

Hoping this might help,

regards,

raf
===========
Compiled code:
commit 20da7188b18722adcd53088ec30e5f256af62f20
Author: Szilard Pall <pszilard at cbr.su.se>
Date:   Tue Oct 2 00:29:33 2012 +0200

===========
(gdb) exec mdrun
(gdb) run -debug 1 -v -s test.tpr 

Reading file test.tpr, VERSION 4.6-dev-20121002-20da718 (single
precision)
[New Thread 0x7ffff3844700 (LWP 31986)]
[Thread 0x7ffff3844700 (LWP 31986) exited]
[New Thread 0x7ffff3844700 (LWP 31987)]
[Thread 0x7ffff3844700 (LWP 31987) exited]
Changing nstlist from 10 to 50, rlist from 2 to 2.156

Starting 2 tMPI threads
[New Thread 0x7ffff3844700 (LWP 31992)]
Using 2 MPI threads
Using 24 OpenMP threads per tMPI thread

2 GPUs detected:
  #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
compatible
  #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
compatible

2 GPUs auto-selected to be used for this run: #0, #1


Back Off! I just backed up ctab14.xvg to ./#ctab14.xvg.1#
Initialized GPU ID #1: GeForce GTX 580
[New Thread 0x7ffff3043700 (LWP 31993)]

Back Off! I just backed up dtab14.xvg to ./#dtab14.xvg.1#

Back Off! I just backed up rtab14.xvg to ./#rtab14.xvg.1#
[New Thread 0x7ffff1b3c700 (LWP 31995)]
[New Thread 0x7ffff133b700 (LWP 31996)]
[New Thread 0x7ffff0b3a700 (LWP 31997)]
[New Thread 0x7fffebfff700 (LWP 31998)]
[New Thread 0x7fffeb7fe700 (LWP 31999)]
[New Thread 0x7fffeaffd700 (LWP 32000)]
[New Thread 0x7fffea7fc700 (LWP 32001)]
[New Thread 0x7fffe9ffb700 (LWP 32002)]
[New Thread 0x7fffe97fa700 (LWP 32003)]
[New Thread 0x7fffe8ff9700 (LWP 32004)]
[New Thread 0x7fffe87f8700 (LWP 32005)]
[New Thread 0x7fffe7ff7700 (LWP 32006)]
[New Thread 0x7fffe77f6700 (LWP 32007)]
[New Thread 0x7fffe6ff5700 (LWP 32008)]
[New Thread 0x7fffe67f4700 (LWP 32009)]
[New Thread 0x7fffe5ff3700 (LWP 32010)]
[New Thread 0x7fffe57f2700 (LWP 32011)]
[New Thread 0x7fffe4ff1700 (LWP 32012)]
[New Thread 0x7fffe47f0700 (LWP 32013)]
[New Thread 0x7fffe3fef700 (LWP 32014)]
[New Thread 0x7fffe37ee700 (LWP 32015)]
[New Thread 0x7fffe2fed700 (LWP 32016)]
[New Thread 0x7fffe27ec700 (LWP 32017)]
Initialized GPU ID #0: GeForce GTX 580
Using CUDA 8x8x8 non-bonded kernels
[New Thread 0x7fffe1feb700 (LWP 32018)]
[New Thread 0x7fffe0ae4700 (LWP 32019)]
[New Thread 0x7fffcbfff700 (LWP 32020)]
[New Thread 0x7fffcb7fe700 (LWP 32021)]
[New Thread 0x7fffcaffd700 (LWP 32022)]
[New Thread 0x7fffca7fc700 (LWP 32023)]
[New Thread 0x7fffc9ffb700 (LWP 32024)]
[New Thread 0x7fffc97fa700 (LWP 32025)]
[New Thread 0x7fffc8ff9700 (LWP 32026)]
[New Thread 0x7fffc3fff700 (LWP 32027)]
[New Thread 0x7fffc37fe700 (LWP 32028)]
[New Thread 0x7fffc2ffd700 (LWP 32029)]
[New Thread 0x7fffc27fc700 (LWP 32031)]
[New Thread 0x7fffc1ffb700 (LWP 32032)]
[New Thread 0x7fffc17fa700 (LWP 32033)]
[New Thread 0x7fffc0ff9700 (LWP 32034)]
[New Thread 0x7fffc07f8700 (LWP 32035)]
[New Thread 0x7fffbfff7700 (LWP 32036)]
[New Thread 0x7fffbf7f6700 (LWP 32037)]
[New Thread 0x7fffbeff5700 (LWP 32038)]
[New Thread 0x7fffbe7f4700 (LWP 32039)]
[New Thread 0x7fffbdff3700 (LWP 32040)]
[New Thread 0x7fffbd7f2700 (LWP 32042)]
[New Thread 0x7fffbcff1700 (LWP 32043)]
Making 1D domain decomposition 2 x 1 x 1

* WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.


Back Off! I just backed up traj.trr to ./#traj.trr.1#

Back Off! I just backed up traj.xtc to ./#traj.xtc.1#

Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'Protein in water'
100000 steps,    200.0 ps.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc07f8700 (LWP 32035)]
0x00007ffff61de301 in nbnxn_make_pairlist.omp_fn.2 ()
from /usr/local/gromacs/bin/../lib/libmd.so.6
(gdb) 
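For reference, the backtrace Szilárd asked about could be pulled from this
same gdb session at the point of the crash; sketching the usual commands
from memory (worth double-checking against the gdb manual):

```
(gdb) bt full                  # backtrace of the crashing thread, with locals
(gdb) thread apply all bt      # backtraces of all 48+ threads
(gdb) info threads             # identify which thread is which
```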

============================================
Verlet scheme, nonbonded interactions on CPU only:

(gdb) run -debug 1 -nb cpu -v -s test.tpr

Reading file test.tpr, VERSION 4.6-dev-20121002-20da718 (single
precision)
[New Thread 0x7ffff3844700 (LWP 32050)]
[Thread 0x7ffff3844700 (LWP 32050) exited]
[New Thread 0x7ffff3844700 (LWP 32051)]
[Thread 0x7ffff3844700 (LWP 32051) exited]
Starting 48 tMPI threads
[New Thread 0x7ffff3844700 (LWP 32058)]
[New Thread 0x7ffff3043700 (LWP 32059)]
[New Thread 0x7ffff2842700 (LWP 32060)]
[New Thread 0x7ffff2041700 (LWP 32061)]
[New Thread 0x7ffff1840700 (LWP 32062)]
[New Thread 0x7ffff103f700 (LWP 32063)]
[New Thread 0x7ffff083e700 (LWP 32064)]
[New Thread 0x7fffe3fff700 (LWP 32065)]
[New Thread 0x7fffe37fe700 (LWP 32066)]
[New Thread 0x7fffe2ffd700 (LWP 32067)]
[New Thread 0x7fffe27fc700 (LWP 32068)]
[New Thread 0x7fffe1ffb700 (LWP 32069)]
[New Thread 0x7fffe17fa700 (LWP 32070)]
[New Thread 0x7fffe0ff9700 (LWP 32071)]
[New Thread 0x7fffdbfff700 (LWP 32072)]
[New Thread 0x7fffdb7fe700 (LWP 32073)]
[New Thread 0x7fffdaffd700 (LWP 32074)]
[New Thread 0x7fffda7fc700 (LWP 32075)]
[New Thread 0x7fffd9ffb700 (LWP 32076)]
[New Thread 0x7fffd97fa700 (LWP 32077)]
[New Thread 0x7fffd8ff9700 (LWP 32078)]
[New Thread 0x7fffd3fff700 (LWP 32079)]
[New Thread 0x7fffd37fe700 (LWP 32080)]
[New Thread 0x7fffd2ffd700 (LWP 32081)]
[New Thread 0x7fffd27fc700 (LWP 32082)]
[New Thread 0x7fffd1ffb700 (LWP 32083)]
[New Thread 0x7fffd17fa700 (LWP 32084)]
[New Thread 0x7fffd0ff9700 (LWP 32085)]
[New Thread 0x7fffd07f8700 (LWP 32086)]
[New Thread 0x7fffcfff7700 (LWP 32087)]
[New Thread 0x7fffcf7f6700 (LWP 32088)]
[New Thread 0x7fffceff5700 (LWP 32089)]
[New Thread 0x7fffce7f4700 (LWP 32090)]
[New Thread 0x7fffcdff3700 (LWP 32091)]
[New Thread 0x7fffcd7f2700 (LWP 32092)]
[New Thread 0x7fffccff1700 (LWP 32093)]
[New Thread 0x7fffcc7f0700 (LWP 32094)]
[New Thread 0x7fffcbfef700 (LWP 32095)]
[New Thread 0x7fffcb7ee700 (LWP 32096)]
[New Thread 0x7fffcafed700 (LWP 32097)]
[New Thread 0x7fffca7ec700 (LWP 32098)]
[New Thread 0x7fffc9feb700 (LWP 32099)]
[New Thread 0x7fffc97ea700 (LWP 32100)]
[New Thread 0x7fffc8fe9700 (LWP 32101)]
[New Thread 0x7fffc87e8700 (LWP 32102)]
[New Thread 0x7fffc7fe7700 (LWP 32103)]
[New Thread 0x7fffc77e6700 (LWP 32104)]

Will use 45 particle-particle and 3 PME only nodes
This is a guess, check the performance at the end of the log file
Using 48 MPI threads
Using 1 OpenMP thread per tMPI thread

2 GPUs detected:
  #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
compatible
  #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
compatible


Back Off! I just backed up ctab14.xvg to ./#ctab14.xvg.2#

Back Off! I just backed up dtab14.xvg to ./#dtab14.xvg.2#

Back Off! I just backed up rtab14.xvg to ./#rtab14.xvg.2#
Using SSE2 4x4 non-bonded kernels
Making 3D domain decomposition 3 x 5 x 3

* WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.


Back Off! I just backed up traj.trr to ./#traj.trr.2#

Back Off! I just backed up traj.xtc to ./#traj.xtc.2#

Back Off! I just backed up ener.edr to ./#ener.edr.2#
starting mdrun 'Protein in water'
100000 steps,    200.0 ps.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcd7f2700 (LWP 32092)]
0x00007ffff61db499 in nbnxn_make_pairlist.omp_fn.2 ()
from /usr/local/gromacs/bin/../lib/libmd.so.6
(gdb) 
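For completeness, the remaining configuration from Szilárd's list (quoted
below) has not been run here yet; assuming the same test.tpr, its
invocation would look like:

```
GMX_EMULATE_GPU=1 mdrun -nb gpu -debug 1 -v -s test.tpr
```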
=============================================


On Mon, 2012-11-12 at 19:37 +0100, Szilárd Páll wrote:
> Hi Sebastian,
> 
> That is very likely a bug, so I'd appreciate it if you could provide a
> bit more information, like:
> 
> - OS, compiler
> 
> - results of runs with the following configurations:
>   - "mdrun -nb cpu" (to run CPU-only with the Verlet scheme)
>   - "GMX_EMULATE_GPU=1 mdrun -nb gpu" (to run GPU emulation using
>     plain C kernels);
>   - "mdrun" without any arguments (which will use 2x(n/2 cores + 1
>     GPU))
>   - "mdrun -ntmpi 1" without any other arguments (which will use n
>     cores + the first GPU)
> 
> - please attach the log files of all failed and a successful run, as
>   well as the mdrun.debug file from a failed run, which you can obtain
>   with "mdrun -debug 1"
> 
> Note that a backtrace would be very useful and if you can get one I'd
> be grateful, but for now the above should be minimum effort and I'll
> provide simple instructions to get a backtrace later (if needed).
> 
> Thanks,
> 
> --
> Szilárd
> 
> 
> On Mon, Nov 12, 2012 at 6:22 PM, sebastian <
> sebastian.waltz at physik.uni-freiburg.de> wrote:
> 
> > On 11/12/2012 04:12 PM, sebastian wrote:
> > > Dear GROMACS user,
> > >
> > > I am running into major problems trying to use GROMACS 4.6 on my
> > > desktop with two GTX 670 GPUs and one i7 CPU. On the system I
> > > installed CUDA 4.2, which runs fine for many different test
> > > programs. Compiling the git version of GROMACS 4.6 with hybrid
> > > acceleration, I get one error message about a missing libxml2, but
> > > it compiles with no further complaints. The tools I tested (like
> > > g_rdf or grompp etc.) work fine as long as I generate the tpr files
> > > with the right GROMACS version. Now, if I try to use mdrun
> > > (GMX_GPU_ID=1 mdrun -nt 1 -v -deffnm ....) the preparation seems to
> > > work fine until it starts the actual run. It stops with a
> > > segmentation fault:
> > >
> > > Reading file pdz_cis_ex_200ns_test.tpr, VERSION
> > > 4.6-dev-20121002-20da718-dirty (single precision)
> > >
> > > Using 1 MPI thread
> > >
> > > Using 1 OpenMP thread
> > >
> > >
> > > 2 GPUs detected:
> > >
> > >   #0: NVIDIA GeForce GTX 670, compute cap.: 3.0, ECC:  no, stat:
> > > compatible
> > >
> > >   #1: NVIDIA GeForce GTX 670, compute cap.: 3.0, ECC:  no, stat:
> > > compatible
> > >
> > >
> > > 1 GPU user-selected to be used for this run: #1
> > >
> > >
> > > Using CUDA 8x8x8 non-bonded kernels
> > >
> > >
> > > * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> > >
> > > We have just committed the new CPU detection code in this branch,
> > >
> > > and will commit new SSE/AVX kernels in a few days. However, this
> > >
> > > means that currently only the NxN kernels are accelerated!
> > >
> >
> > Since it does run as a pure CPU run (without the Verlet cut-off
> > scheme), would it maybe help to change the NxN kernels manually in
> > the .mdp file (and how can I do so)? Or is there something wrong with
> > using the CUDA 4.2 version, or whatsoever? The libxml2 issue should
> > not be a problem, since the pure CPU run works.
> >
> > > In the mean time, you might want to avoid production runs in 4.6.
> > >
> > >
> > > Back Off! I just backed up pdz_cis_ex_200ns_test.trr to
> > > ./#pdz_cis_ex_200ns_test.trr.4#
> > >
> > >
> > > Back Off! I just backed up pdz_cis_ex_200ns_test.xtc to
> > > ./#pdz_cis_ex_200ns_test.xtc.4#
> > >
> > >
> > > Back Off! I just backed up pdz_cis_ex_200ns_test.edr to
> > > ./#pdz_cis_ex_200ns_test.edr.4#
> > >
> > > starting mdrun 'Protein in water'
> > >
> > > 3500000 steps,   7000.0 ps.
> > >
> > > Segmentation fault
> > >
> > >
> > > Since I have no idea whats going wrong any help is welcomed.
> > > Attached you find the log file.
> > >
> >
> > Help is really appreciated, since I want to use my new desktop,
> > including the GPUs.
> >
> > > Thanks a lot
> > >
> > > Sebastian
> > >
> > >
> > --
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > * Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >





