[gmx-users] Gromacs 4.6 segmentation fault with mdrun

Szilárd Páll szilard.pall at cbr.su.se
Thu Nov 22 00:23:53 CET 2012


Roland,

He explicitly stated that he is using 20da718, which is also from the
nbnxn_hybrid_acc branch.

Raf, as Roland said, get the release-4-6 branch and try again!


There's an important thing to mention: your hardware configuration is
probably quite imbalanced, and the default settings are certainly not the
best to run with: two MPI processes/threads with 24 OpenMP threads plus a
GPU each. GROMACS works best with a balanced hardware configuration, and
yours is certainly not balanced: the GPUs will not be able to keep up with
48 CPU cores.

Regarding the run configuration: most importantly, in most cases you should
avoid running a group of OpenMP threads across sockets (except on Intel with
<=12-16 threads). On these Opterons, running OpenMP on at most half a CPU is
recommended (each CPU is in reality two dies bolted together), and in fact
you might be better off with even fewer threads per MPI process/thread. This
means that multiple processes will have to share a GPU, which is not optimal
and works only with MPI in the current version.

So to conclude, to get the best performance you should try a few
combinations:

# process 0,1 will use GPU0, process 2,3 GPU1
# this avoids running across sockets, but for the aforementioned reasons it
# will still be suboptimal
mpirun -np 4 mdrun_mpi -gpu_id 0011

# process 0,1,2,3 will use GPU0, process 4,5,6,7 GPU1
# this config will probably still be slower than the next one
mpirun -np 8 mdrun_mpi -gpu_id 00001111

# process 0,1,2,3,4,5,6,7 will use GPU0, process 8,9,10,11,12,13,14,15 GPU1
# this config will probably still be slower than the larger process counts
# suggested below
mpirun -np 16 mdrun_mpi -gpu_id 0000000011111111

You should go ahead and try higher process counts as well; I suspect that 2
or 3 threads/process will be the fastest. Depending on what system you are
simulating, this could lead to load imbalance, but that you'll have to see.
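
For example, a minimal sketch of such a layout, assuming your mdrun_mpi
build supports the -ntomp option for setting the number of OpenMP threads
per MPI process:

# 24 processes x 2 OpenMP threads = 48 cores,
# processes 0-11 will use GPU0, processes 12-23 GPU1
mpirun -np 24 mdrun_mpi -ntomp 2 -gpu_id 000000000000111111111111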

If it turns out that the "Wait for GPU" time is more than a few percent
(which will probably be the case), it means that a GTX 580 is not fast
enough for two of these Opterons. In that case you can try running in the
"hybrid" mode with "-nb gpu_cpu", which might help.

Cheers,

--
Szilárd


On Sat, Nov 17, 2012 at 3:11 AM, Roland Schulz <roland at utk.edu> wrote:

> Hi Raf,
>
> which version of Gromacs did you use? If you used branch nbnxn_hybrid_acc
> please use branch release-4-6 instead and see whether that fixes your
> issue. If not please open a bug and upload your log file and your tpr.
>
> Roland
>
>
> On Thu, Nov 15, 2012 at 5:13 PM, Raf Ponsaerts <
> raf.ponsaerts at med.kuleuven.be> wrote:
>
> > Hi Szilárd,
> >
> > I assume I get the same segmentation fault error as Sebastian (don't
> > shoot if not so). I have 2 NVIDIA GTX 580 cards (and four 12-core AMD
> > Opteron 6174 CPUs).
> >
> > in brief :
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffc07f8700 (LWP 32035)]
> > 0x00007ffff61de301 in nbnxn_make_pairlist.omp_fn.2 ()
> > from /usr/local/gromacs/bin/../lib/libmd.so.6
> >
> > Also -nb cpu with the Verlet cutoff-scheme results in this error...
> >
> > gcc 4.4.5 (Debian 4.4.5-8), Linux kernel 3.1.1
> > CMake 2.8.7
> >
> > If I attach the mdrun.debug output file to this mail, the mail to the
> > list gets bounced by the mailserver (because mdrun.debug > 50 Kb).
> >
> > Hoping this might help,
> >
> > regards,
> >
> > raf
> > ===========
> > compiled code :
> > commit 20da7188b18722adcd53088ec30e5f256af62f20
> > Author: Szilard Pall <pszilard at cbr.su.se>
> > Date:   Tue Oct 2 00:29:33 2012 +0200
> >
> > ===========
> > (gdb) exec mdrun
> > (gdb) run -debug 1 -v -s test.tpr
> >
> > Reading file test.tpr, VERSION 4.6-dev-20121002-20da718 (single
> > precision)
> > [New Thread 0x7ffff3844700 (LWP 31986)]
> > [Thread 0x7ffff3844700 (LWP 31986) exited]
> > [New Thread 0x7ffff3844700 (LWP 31987)]
> > [Thread 0x7ffff3844700 (LWP 31987) exited]
> > Changing nstlist from 10 to 50, rlist from 2 to 2.156
> >
> > Starting 2 tMPI threads
> > [New Thread 0x7ffff3844700 (LWP 31992)]
> > Using 2 MPI threads
> > Using 24 OpenMP threads per tMPI thread
> >
> > 2 GPUs detected:
> >   #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
> > compatible
> >   #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
> > compatible
> >
> > 2 GPUs auto-selected to be used for this run: #0, #1
> >
> >
> > Back Off! I just backed up ctab14.xvg to ./#ctab14.xvg.1#
> > Initialized GPU ID #1: GeForce GTX 580
> > [New Thread 0x7ffff3043700 (LWP 31993)]
> >
> > Back Off! I just backed up dtab14.xvg to ./#dtab14.xvg.1#
> >
> > Back Off! I just backed up rtab14.xvg to ./#rtab14.xvg.1#
> > [New Thread 0x7ffff1b3c700 (LWP 31995)]
> > [New Thread 0x7ffff133b700 (LWP 31996)]
> > [New Thread 0x7ffff0b3a700 (LWP 31997)]
> > [New Thread 0x7fffebfff700 (LWP 31998)]
> > [New Thread 0x7fffeb7fe700 (LWP 31999)]
> > [New Thread 0x7fffeaffd700 (LWP 32000)]
> > [New Thread 0x7fffea7fc700 (LWP 32001)]
> > [New Thread 0x7fffe9ffb700 (LWP 32002)]
> > [New Thread 0x7fffe97fa700 (LWP 32003)]
> > [New Thread 0x7fffe8ff9700 (LWP 32004)]
> > [New Thread 0x7fffe87f8700 (LWP 32005)]
> > [New Thread 0x7fffe7ff7700 (LWP 32006)]
> > [New Thread 0x7fffe77f6700 (LWP 32007)]
> > [New Thread 0x7fffe6ff5700 (LWP 32008)]
> > [New Thread 0x7fffe67f4700 (LWP 32009)]
> > [New Thread 0x7fffe5ff3700 (LWP 32010)]
> > [New Thread 0x7fffe57f2700 (LWP 32011)]
> > [New Thread 0x7fffe4ff1700 (LWP 32012)]
> > [New Thread 0x7fffe47f0700 (LWP 32013)]
> > [New Thread 0x7fffe3fef700 (LWP 32014)]
> > [New Thread 0x7fffe37ee700 (LWP 32015)]
> > [New Thread 0x7fffe2fed700 (LWP 32016)]
> > [New Thread 0x7fffe27ec700 (LWP 32017)]
> > Initialized GPU ID #0: GeForce GTX 580
> > Using CUDA 8x8x8 non-bonded kernels
> > [New Thread 0x7fffe1feb700 (LWP 32018)]
> > [New Thread 0x7fffe0ae4700 (LWP 32019)]
> > [New Thread 0x7fffcbfff700 (LWP 32020)]
> > [New Thread 0x7fffcb7fe700 (LWP 32021)]
> > [New Thread 0x7fffcaffd700 (LWP 32022)]
> > [New Thread 0x7fffca7fc700 (LWP 32023)]
> > [New Thread 0x7fffc9ffb700 (LWP 32024)]
> > [New Thread 0x7fffc97fa700 (LWP 32025)]
> > [New Thread 0x7fffc8ff9700 (LWP 32026)]
> > [New Thread 0x7fffc3fff700 (LWP 32027)]
> > [New Thread 0x7fffc37fe700 (LWP 32028)]
> > [New Thread 0x7fffc2ffd700 (LWP 32029)]
> > [New Thread 0x7fffc27fc700 (LWP 32031)]
> > [New Thread 0x7fffc1ffb700 (LWP 32032)]
> > [New Thread 0x7fffc17fa700 (LWP 32033)]
> > [New Thread 0x7fffc0ff9700 (LWP 32034)]
> > [New Thread 0x7fffc07f8700 (LWP 32035)]
> > [New Thread 0x7fffbfff7700 (LWP 32036)]
> > [New Thread 0x7fffbf7f6700 (LWP 32037)]
> > [New Thread 0x7fffbeff5700 (LWP 32038)]
> > [New Thread 0x7fffbe7f4700 (LWP 32039)]
> > [New Thread 0x7fffbdff3700 (LWP 32040)]
> > [New Thread 0x7fffbd7f2700 (LWP 32042)]
> > [New Thread 0x7fffbcff1700 (LWP 32043)]
> > Making 1D domain decomposition 2 x 1 x 1
> >
> > * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> > We have just committed the new CPU detection code in this branch,
> > and will commit new SSE/AVX kernels in a few days. However, this
> > means that currently only the NxN kernels are accelerated!
> > In the mean time, you might want to avoid production runs in 4.6.
> >
> >
> > Back Off! I just backed up traj.trr to ./#traj.trr.1#
> >
> > Back Off! I just backed up traj.xtc to ./#traj.xtc.1#
> >
> > Back Off! I just backed up ener.edr to ./#ener.edr.1#
> > starting mdrun 'Protein in water'
> > 100000 steps,    200.0 ps.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffc07f8700 (LWP 32035)]
> > 0x00007ffff61de301 in nbnxn_make_pairlist.omp_fn.2 ()
> > from /usr/local/gromacs/bin/../lib/libmd.so.6
> > (gdb)
> >
> > ============================================
> > Verlet, nb by cpu only:
> >
> > (gdb) run -debug 1 -nb cpu -v -s test.tpr
> >
> > Reading file test.tpr, VERSION 4.6-dev-20121002-20da718 (single
> > precision)
> > [New Thread 0x7ffff3844700 (LWP 32050)]
> > [Thread 0x7ffff3844700 (LWP 32050) exited]
> > [New Thread 0x7ffff3844700 (LWP 32051)]
> > [Thread 0x7ffff3844700 (LWP 32051) exited]
> > Starting 48 tMPI threads
> > [New Thread 0x7ffff3844700 (LWP 32058)]
> > [New Thread 0x7ffff3043700 (LWP 32059)]
> > [New Thread 0x7ffff2842700 (LWP 32060)]
> > [New Thread 0x7ffff2041700 (LWP 32061)]
> > [New Thread 0x7ffff1840700 (LWP 32062)]
> > [New Thread 0x7ffff103f700 (LWP 32063)]
> > [New Thread 0x7ffff083e700 (LWP 32064)]
> > [New Thread 0x7fffe3fff700 (LWP 32065)]
> > [New Thread 0x7fffe37fe700 (LWP 32066)]
> > [New Thread 0x7fffe2ffd700 (LWP 32067)]
> > [New Thread 0x7fffe27fc700 (LWP 32068)]
> > [New Thread 0x7fffe1ffb700 (LWP 32069)]
> > [New Thread 0x7fffe17fa700 (LWP 32070)]
> > [New Thread 0x7fffe0ff9700 (LWP 32071)]
> > [New Thread 0x7fffdbfff700 (LWP 32072)]
> > [New Thread 0x7fffdb7fe700 (LWP 32073)]
> > [New Thread 0x7fffdaffd700 (LWP 32074)]
> > [New Thread 0x7fffda7fc700 (LWP 32075)]
> > [New Thread 0x7fffd9ffb700 (LWP 32076)]
> > [New Thread 0x7fffd97fa700 (LWP 32077)]
> > [New Thread 0x7fffd8ff9700 (LWP 32078)]
> > [New Thread 0x7fffd3fff700 (LWP 32079)]
> > [New Thread 0x7fffd37fe700 (LWP 32080)]
> > [New Thread 0x7fffd2ffd700 (LWP 32081)]
> > [New Thread 0x7fffd27fc700 (LWP 32082)]
> > [New Thread 0x7fffd1ffb700 (LWP 32083)]
> > [New Thread 0x7fffd17fa700 (LWP 32084)]
> > [New Thread 0x7fffd0ff9700 (LWP 32085)]
> > [New Thread 0x7fffd07f8700 (LWP 32086)]
> > [New Thread 0x7fffcfff7700 (LWP 32087)]
> > [New Thread 0x7fffcf7f6700 (LWP 32088)]
> > [New Thread 0x7fffceff5700 (LWP 32089)]
> > [New Thread 0x7fffce7f4700 (LWP 32090)]
> > [New Thread 0x7fffcdff3700 (LWP 32091)]
> > [New Thread 0x7fffcd7f2700 (LWP 32092)]
> > [New Thread 0x7fffccff1700 (LWP 32093)]
> > [New Thread 0x7fffcc7f0700 (LWP 32094)]
> > [New Thread 0x7fffcbfef700 (LWP 32095)]
> > [New Thread 0x7fffcb7ee700 (LWP 32096)]
> > [New Thread 0x7fffcafed700 (LWP 32097)]
> > [New Thread 0x7fffca7ec700 (LWP 32098)]
> > [New Thread 0x7fffc9feb700 (LWP 32099)]
> > [New Thread 0x7fffc97ea700 (LWP 32100)]
> > [New Thread 0x7fffc8fe9700 (LWP 32101)]
> > [New Thread 0x7fffc87e8700 (LWP 32102)]
> > [New Thread 0x7fffc7fe7700 (LWP 32103)]
> > [New Thread 0x7fffc77e6700 (LWP 32104)]
> >
> > Will use 45 particle-particle and 3 PME only nodes
> > This is a guess, check the performance at the end of the log file
> > Using 48 MPI threads
> > Using 1 OpenMP thread per tMPI thread
> >
> > 2 GPUs detected:
> >   #0: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
> > compatible
> >   #1: NVIDIA GeForce GTX 580, compute cap.: 2.0, ECC:  no, stat:
> > compatible
> >
> >
> > Back Off! I just backed up ctab14.xvg to ./#ctab14.xvg.2#
> >
> > Back Off! I just backed up dtab14.xvg to ./#dtab14.xvg.2#
> >
> > Back Off! I just backed up rtab14.xvg to ./#rtab14.xvg.2#
> > Using SSE2 4x4 non-bonded kernels
> > Making 3D domain decomposition 3 x 5 x 3
> >
> > * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> > We have just committed the new CPU detection code in this branch,
> > and will commit new SSE/AVX kernels in a few days. However, this
> > means that currently only the NxN kernels are accelerated!
> > In the mean time, you might want to avoid production runs in 4.6.
> >
> >
> > Back Off! I just backed up traj.trr to ./#traj.trr.2#
> >
> > Back Off! I just backed up traj.xtc to ./#traj.xtc.2#
> >
> > Back Off! I just backed up ener.edr to ./#ener.edr.2#
> > starting mdrun 'Protein in water'
> > 100000 steps,    200.0 ps.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffcd7f2700 (LWP 32092)]
> > 0x00007ffff61db499 in nbnxn_make_pairlist.omp_fn.2 ()
> > from /usr/local/gromacs/bin/../lib/libmd.so.6
> > (gdb)
> > =============================================
> >
> >
> > On Mon, 2012-11-12 at 19:37 +0100, Szilárd Páll wrote:
> > > Hi Sebastian,
> > >
> > > That is very likely a bug, so I'd appreciate it if you could provide a
> > > bit more information, like:
> > >
> > > - OS, compiler
> > >
> > > - results of runs with the following configurations:
> > >   - "mdrun -nb cpu" (to run CPU-only with Verlet scheme)
> > >  - "GMX_EMULATE_GPU=1 mdrun -nb gpu" (to run GPU emulation using plain
> > C
> > > kernels);
> > >   - "mdrun" without any arguments (which will use 2x(n/2 cores + 1
> > GPU))
> > >   - "mdrun -ntmpi 1" without any other arguments (which will use n
> > cores +
> > > the first GPU)
> > >
> > > - please attach the log files of all failed and a successful run, as
> > > well as the mdrun.debug file from a failed run, which you can obtain
> > > with "mdrun -debug 1"
> > >
> > > Note that a backtrace would be very useful and if you can get one I'd
> > > be grateful, but for now the above should be minimal effort and I'll
> > > provide simple instructions for getting a backtrace later (if needed).
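> > >
> > > (For reference, a minimal sketch of getting a backtrace with gdb,
> > > assuming a build with debug symbols: start "gdb --args mdrun <your
> > > usual mdrun arguments>", type "run", wait for the SIGSEGV, then "bt"
> > > prints the backtrace.)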
> > >
> > > Thanks,
> > >
> > > --
> > > Szilárd
> > >
> > >
> > > On Mon, Nov 12, 2012 at 6:22 PM, sebastian <
> > > sebastian.waltz at physik.uni-freiburg.de> wrote:
> > >
> > > > On 11/12/2012 04:12 PM, sebastian wrote:
> > > > > Dear GROMACS user,
> > > > >
> > > > > I am running into major problems trying to use GROMACS 4.6 on my
> > > > > desktop with two GTX 670 GPUs and one i7 CPU. On the system I
> > > > > installed CUDA 4.2, which runs fine for many different test
> > > > > programs.
> > > > > Compiling the git version of GROMACS 4.6 with hybrid acceleration I
> > > > > get one error message about a missing libxml2, but it compiles with
> > > > > no further complaints. The tools I tested (like g_rdf or grompp
> > > > > etc.) work fine as long as I generate the tpr files with the right
> > > > > GROMACS version.
> > > > > Now, if I try to use mdrun (GMX_GPU_ID=1 mdrun -nt 1 -v
> > > > > -deffnm ....) the preparation seems to work fine until it starts
> > > > > the actual run. It stops with a segmentation fault:
> > > > >
> > > > > Reading file pdz_cis_ex_200ns_test.tpr, VERSION
> > > > > 4.6-dev-20121002-20da718-dirty (single precision)
> > > > >
> > > > > Using 1 MPI thread
> > > > >
> > > > > Using 1 OpenMP thread
> > > > >
> > > > >
> > > > > 2 GPUs detected:
> > > > >
> > > > >   #0: NVIDIA GeForce GTX 670, compute cap.: 3.0, ECC:  no, stat:
> > > > compatible
> > > > >
> > > > >   #1: NVIDIA GeForce GTX 670, compute cap.: 3.0, ECC:  no, stat:
> > > > compatible
> > > > >
> > > > >
> > > > > 1 GPU user-selected to be used for this run: #1
> > > > >
> > > > >
> > > > > Using CUDA 8x8x8 non-bonded kernels
> > > > >
> > > > >
> > > > > * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> > > > >
> > > > > We have just committed the new CPU detection code in this branch,
> > > > >
> > > > > and will commit new SSE/AVX kernels in a few days. However, this
> > > > >
> > > > > means that currently only the NxN kernels are accelerated!
> > > > >
> > > >
> > > > Since it does run as a pure CPU run (without the Verlet cut-off
> > > > scheme), would it maybe help to change the NxN kernels manually in
> > > > the .mdp file (and how can I do so)? Or is there something wrong with
> > > > using the CUDA 4.2 version, or something else? The missing libxml2
> > > > should not be a problem since the pure CPU run works.
> > > >
> > > > > In the mean time, you might want to avoid production runs in 4.6.
> > > > >
> > > > >
> > > > > Back Off! I just backed up pdz_cis_ex_200ns_test.trr to
> > > > > ./#pdz_cis_ex_200ns_test.trr.4#
> > > > >
> > > > >
> > > > > Back Off! I just backed up pdz_cis_ex_200ns_test.xtc to
> > > > > ./#pdz_cis_ex_200ns_test.xtc.4#
> > > > >
> > > > >
> > > > > Back Off! I just backed up pdz_cis_ex_200ns_test.edr to
> > > > > ./#pdz_cis_ex_200ns_test.edr.4#
> > > > >
> > > > > starting mdrun 'Protein in water'
> > > > >
> > > > > 3500000 steps,   7000.0 ps.
> > > > >
> > > > > Segmentation fault
> > > > >
> > > > >
> > > > > Since I have no idea whats going wrong any help is welcomed.
> > > > > Attached you find the log file.
> > > > >
> > > >
> > > > Help is really appreciated since I want to use my new desktop,
> > > > including the GPUs.
> > > >
> > > > > Thanks a lot
> > > > >
> > > > > Sebastian
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> >
> >
> >
> >
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309


