[gmx-users] error in the middle of running mdrun_mpi

Mark Abraham mark.j.abraham at gmail.com
Fri Oct 24 00:04:17 CEST 2014


Hi,

The warning message told you not to increase the table distance unless you
were sure the table distance was the problem. Why were you sure the table
distance was the problem, rather than some form of general instability of
your system? In addition to all the usual reasons for
http://www.gromacs.org/Documentation/Terminology/Blowing_Up, the GB kernels
are completely untested, so you might try running with 4.5.7 (last version
known to be probably-good for GB) to see whether the problem is in the code
or your setup.
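
As a rough illustration of the first point, here is a minimal Python sketch
(not part of GROMACS; the function name parse_table_warning is hypothetical)
that pulls the particle indices, the pair distance and the table limit out of
the warning quoted below. A 1,4 pair is only three bonds apart, so a distance
of many nanometres points to diverged coordinates rather than a table that is
too short.

```python
import re

# Pattern for the mdrun "table limit" warning quoted in this thread.
# \s* / \s+ tolerate the line wrapping (or lost line breaks) in pasted logs.
WARNING_RE = re.compile(
    r"between particles (\d+) and (\d+)\s*at distance ([\d.]+) which\s+"
    r"is larger than the table limit ([\d.]+) nm"
)


def parse_table_warning(log_text):
    """Return (i, j, distance_nm, limit_nm) from the warning, or None."""
    # Collapse all whitespace so wrapped log output matches the pattern.
    flat = " ".join(log_text.split())
    match = WARNING_RE.search(flat)
    if match is None:
        return None
    i, j, dist, limit = match.groups()
    return int(i), int(j), float(dist), float(limit)


# Warning text as reported in the original post.
sample = """WARNING: Listed nonbonded interaction between particles 192 and 197
at distance 16.773 which is larger than the table limit 10.500 nm."""

result = parse_table_warning(sample)
if result is not None:
    i, j, dist, limit = result
    print(f"particles {i}-{j}: {dist:.3f} nm apart, "
          f"{dist - limit:.3f} nm beyond the table limit")
```

Looking at those two particles in the trajectory just before the reported step
is usually more informative than raising table-extension.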

Mark

On Thu, Oct 23, 2014 at 10:38 PM, Nizar Masbukhin <nizar.fkub08 at gmail.com>
wrote:

> Dear gromacs users,
>
> I am trying to simulate protein folding using the REMD sampling method in
> implicit solvent. I run my simulation with MPI-compiled GROMACS 5.0.2 on a
> single node. I have successfully minimized and equilibrated (NVT-constrained
> and NPT-constrained) my system. However, in the middle of the mdrun_mpi
> run, the following warning messages appear:
>
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> 500000000 steps, 500000.0 ps.
> step 2873100, will finish Sat Nov  1 10:03:07 2014
> WARNING: Listed nonbonded interaction between particles 192 and 197
> at distance 16.773 which is larger than the table limit 10.500 nm.
>
> This is likely either a 1,4 interaction, or a listed interaction inside
> a smaller molecule you are decoupling during a free energy calculation.
> Since interactions at distances beyond the table cannot be computed,
> they are skipped until they are inside the table limit again. You will
> only see this message once, even if it occurs for several interactions.
>
> IMPORTANT: This should not happen in a stable simulation, so there is
> probably something wrong with your system. Only change the table-extension
> distance in the mdp file if you are really sure that is the reason.
>
> [nizarPC:07548] *** Process received signal ***
> [nizarPC:07548] Signal: Segmentation fault (11)
> [nizarPC:07548] Signal code: Address not mapped (1)
> [nizarPC:07548] Failing at address: 0x1ef8d90
> [nizarPC:07548] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f610bc9fc30]
> [nizarPC:07548] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x836) [0x7f610d3a2466]
> [nizarPC:07548] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f610d235a30]
> [nizarPC:07548] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f610d97bebe]
> [nizarPC:07548] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f610d91bbe0]
> [nizarPC:07548] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
> [nizarPC:07548] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
> [nizarPC:07548] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
> [nizarPC:07548] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f610ce15a42]
> [nizarPC:07548] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
> [nizarPC:07548] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f610bc8aec5]
> [nizarPC:07548] [11] mdrun_mpi() [0x40ccce]
> [nizarPC:07548] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 5 with PID 7548 on node nizarPC exited on
> signal 11 (Segmentation fault).
>
> I then increased table-extension to 500.00 (what should this value be?)
> and ran grompp and mdrun again. The table-extension warning no longer
> appeared. However, the following error messages showed:
>
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps.
> 500000000 steps, 500000.0 ps.
> step 4142800, will finish Sat Nov  1 10:35:55 2014
> [nizarPC:09984] *** Process received signal ***
> [nizarPC:09984] Signal: Segmentation fault (11)
> [nizarPC:09984] Signal code: Address not mapped (1)
> [nizarPC:09984] Failing at address: 0x1464040
> [nizarPC:09984] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7fa764b65c30]
> [nizarPC:09984] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x85f) [0x7fa76626848f]
> [nizarPC:09984] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7fa7660fba30]
> [nizarPC:09984] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7fa766841ebe]
> [nizarPC:09984] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7fa7667e1be0]
> [nizarPC:09984] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
> [nizarPC:09984] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
> [nizarPC:09984] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
> [nizarPC:09984] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7fa765cdba42]
> [nizarPC:09984] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
> [nizarPC:09984] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fa764b50ec5]
> [nizarPC:09984] [11] mdrun_mpi() [0x40ccce]
> [nizarPC:09984] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 6 with PID 9984 on node nizarPC exited on
> signal 11 (Segmentation fault).
>
> Then I just continued mdrun_mpi (using the .cpt file). The simulation ran
> fine, but 1 ps after this the same error messages appeared:
>
> starting mdrun 'Protein'
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> 500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
> step 4790900, will finish Sun Nov  2 03:03:06 2014
> [nizarPC:11170] *** Process received signal ***
> [nizarPC:11170] Signal: Segmentation fault (11)
> [nizarPC:11170] Signal code: Address not mapped (1)
> [nizarPC:11170] Failing at address: 0x29a0260
> [nizarPC:11170] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f8b07ba0c30]
> [nizarPC:11170] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x836) [0x7f8b092a3466]
> [nizarPC:11170] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f8b09136a30]
> [nizarPC:11170] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f8b0987cebe]
> [nizarPC:11170] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f8b0981cbe0]
> [nizarPC:11170] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
> [nizarPC:11170] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
> [nizarPC:11170] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
> [nizarPC:11170] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f8b08d16a42]
> [nizarPC:11170] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
> [nizarPC:11170] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f8b07b8bec5]
> [nizarPC:11170] [11] mdrun_mpi() [0x40ccce]
> [nizarPC:11170] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 11170 on node nizarPC exited on
> signal 11 (Segmentation fault).
>
> I did that (continuing the simulation) several times, until the last error
> messages showed:
>
> starting mdrun 'Protein'
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> starting mdrun 'Protein'
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> starting mdrun 'Protein'
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> 500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
> step 6286100, will finish Sun Nov  2 15:09:42 2014
> [nizarPC:11605] *** Process received signal ***
> [nizarPC:11605] Signal: Segmentation fault (11)
> [nizarPC:11605] Signal code: Address not mapped (1)
> [nizarPC:11605] Failing at address: 0x4769060
> [nizarPC:11605] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f5931c8bc30]
> [nizarPC:11605] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x1153) [0x7f593338ed83]
> [nizarPC:11605] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f5933221a30]
> [nizarPC:11605] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f5933967ebe]
> [nizarPC:11605] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f5933907be0]
> [nizarPC:11605] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
> [nizarPC:11605] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
> [nizarPC:11605] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
> [nizarPC:11605] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f5932e01a42]
> [nizarPC:11605] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
> [nizarPC:11605] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f5931c76ec5]
> [nizarPC:11605] [11] mdrun_mpi() [0x40ccce]
> [nizarPC:11605] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 4 with PID 11605 on node nizarPC exited on
> signal 11 (Segmentation fault).
>
> Why did these error messages appear? I thought that my mdp file was OK.
> Could it possibly be because I changed the CPU frequency during the
> simulation?
>
>
>
> --
> Thanks
> My Best Regards, Nizar
> Medical Faculty of Brawijaya University

