[gmx-users] error in the middle of running mdrun_mpi

Nizar Masbukhin nizar.fkub08 at gmail.com
Thu Oct 23 23:38:58 CEST 2014


Dear gromacs users,

I am trying to simulate protein folding using the REMD sampling method in
implicit solvent. I run my simulation with MPI-compiled GROMACS 5.0.2 on a
single node. I have successfully minimized and equilibrated (NVT-constrained
and NPT-constrained) my system. However, in the middle of the mdrun_mpi run,
the following warning messages appear:

starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
500000000 steps, 500000.0 ps.
step 2873100, will finish Sat Nov  1 10:03:07 2014
WARNING: Listed nonbonded interaction between particles 192 and 197
at distance 16.773 which is larger than the table limit 10.500 nm.

This is likely either a 1,4 interaction, or a listed interaction inside
a smaller molecule you are decoupling during a free energy calculation.
Since interactions at distances beyond the table cannot be computed,
they are skipped until they are inside the table limit again. You will
only see this message once, even if it occurs for several interactions.

IMPORTANT: This should not happen in a stable simulation, so there is
probably something wrong with your system. Only change the table-extension
distance in the mdp file if you are really sure that is the reason.

[nizarPC:07548] *** Process received signal ***
[nizarPC:07548] Signal: Segmentation fault (11)
[nizarPC:07548] Signal code: Address not mapped (1)
[nizarPC:07548] Failing at address: 0x1ef8d90
[nizarPC:07548] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f610bc9fc30]
[nizarPC:07548] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x836) [0x7f610d3a2466]
[nizarPC:07548] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f610d235a30]
[nizarPC:07548] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f610d97bebe]
[nizarPC:07548] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f610d91bbe0]
[nizarPC:07548] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
[nizarPC:07548] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
[nizarPC:07548] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
[nizarPC:07548] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f610ce15a42]
[nizarPC:07548] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
[nizarPC:07548] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f610bc8aec5]
[nizarPC:07548] [11] mdrun_mpi() [0x40ccce]
[nizarPC:07548] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 7548 on node nizarPC exited on signal 11 (Segmentation fault).

I then increased table-extension to 500.00 (what should this value be?),
ran grompp again, and reran mdrun. There was no longer any warning about
table-extension. However, these error messages showed up:

starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
starting mdrun 'Protein'
500000000 steps, 500000.0 ps.
500000000 steps, 500000.0 ps.
step 4142800, will finish Sat Nov  1 10:35:55 2014
[nizarPC:09984] *** Process received signal ***
[nizarPC:09984] Signal: Segmentation fault (11)
[nizarPC:09984] Signal code: Address not mapped (1)
[nizarPC:09984] Failing at address: 0x1464040
[nizarPC:09984] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7fa764b65c30]
[nizarPC:09984] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x85f) [0x7fa76626848f]
[nizarPC:09984] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7fa7660fba30]
[nizarPC:09984] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7fa766841ebe]
[nizarPC:09984] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7fa7667e1be0]
[nizarPC:09984] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
[nizarPC:09984] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
[nizarPC:09984] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
[nizarPC:09984] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7fa765cdba42]
[nizarPC:09984] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
[nizarPC:09984] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fa764b50ec5]
[nizarPC:09984] [11] mdrun_mpi() [0x40ccce]
[nizarPC:09984] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 9984 on node nizarPC exited on signal 11 (Segmentation fault).

Then I simply continued the mdrun_mpi run (using the .cpt file). The
simulation ran fine for about 1 ps, and then the same error messages appeared:

starting mdrun 'Protein'
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
500000000 steps, 500000.0 ps (continuing from step 3961630,   3961.6 ps).
step 4790900, will finish Sun Nov  2 03:03:06 2014
[nizarPC:11170] *** Process received signal ***
[nizarPC:11170] Signal: Segmentation fault (11)
[nizarPC:11170] Signal code: Address not mapped (1)
[nizarPC:11170] Failing at address: 0x29a0260
[nizarPC:11170] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f8b07ba0c30]
[nizarPC:11170] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x836) [0x7f8b092a3466]
[nizarPC:11170] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f8b09136a30]
[nizarPC:11170] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f8b0987cebe]
[nizarPC:11170] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f8b0981cbe0]
[nizarPC:11170] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
[nizarPC:11170] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
[nizarPC:11170] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
[nizarPC:11170] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f8b08d16a42]
[nizarPC:11170] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
[nizarPC:11170] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f8b07b8bec5]
[nizarPC:11170] [11] mdrun_mpi() [0x40ccce]
[nizarPC:11170] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 11170 on node nizarPC exited on signal 11 (Segmentation fault).

I did that (continuing the simulation) several times, until the last error
messages showed:

starting mdrun 'Protein'
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
starting mdrun 'Protein'
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
starting mdrun 'Protein'
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
500000000 steps, 500000.0 ps (continuing from step 6071150,   6071.2 ps).
step 6286100, will finish Sun Nov  2 15:09:42 2014
[nizarPC:11605] *** Process received signal ***
[nizarPC:11605] Signal: Segmentation fault (11)
[nizarPC:11605] Signal code: Address not mapped (1)
[nizarPC:11605] Failing at address: 0x4769060
[nizarPC:11605] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7f5931c8bc30]
[nizarPC:11605] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x1153) [0x7f593338ed83]
[nizarPC:11605] [ 2] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_nonbonded+0x240) [0x7f5933221a30]
[nizarPC:11605] [ 3] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_lowlevel+0x1d3e) [0x7f5933967ebe]
[nizarPC:11605] [ 4] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(do_force_cutsGROUP+0x1510) [0x7f5933907be0]
[nizarPC:11605] [ 5] mdrun_mpi(do_md+0x57c1) [0x42e5e1]
[nizarPC:11605] [ 6] mdrun_mpi(mdrunner+0x12a1) [0x413af1]
[nizarPC:11605] [ 7] mdrun_mpi(_Z9gmx_mdruniPPc+0x18e5) [0x4337b5]
[nizarPC:11605] [ 8] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x92) [0x7f5932e01a42]
[nizarPC:11605] [ 9] mdrun_mpi(main+0x7c) [0x40cb8c]
[nizarPC:11605] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f5931c76ec5]
[nizarPC:11605] [11] mdrun_mpi() [0x40ccce]
[nizarPC:11605] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 4 with PID 11605 on node nizarPC exited on signal 11 (Segmentation fault).

Why did these error messages appear? I thought that my mdp file was OK.
Could it be because I changed the CPU frequency during the simulation?
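
The backtraces above can also be compared mechanically. What follows is only
a minimal Python sketch, not part of GROMACS or Open MPI: the names FRAME1,
RANK and summarize are hypothetical, and the sample text is a short excerpt
copied from two of the traces in this message. It collects the function named
in frame [ 1] (the first frame after the libc signal handler) and the rank
that mpirun reports as having exited:

```python
import re
from collections import Counter

# Frame [ 1] of an Open MPI backtrace: "[ 1] <library>(<function>+0x<offset>)".
# \s* also tolerates traces whose line breaks were lost or moved in an email.
FRAME1 = re.compile(r"\[\s*1\]\s*\S+?\((\w+)\+0x[0-9a-fA-F]+\)")
# The closing mpirun line: "process rank <n> with PID <pid>".
RANK = re.compile(r"process\s+rank\s+(\d+)\s+with\s+PID\s+(\d+)")


def summarize(log_text):
    """Return (Counter of frame-1 function names, list of (rank, pid))."""
    kernels = Counter(FRAME1.findall(log_text))
    ranks = [(int(r), int(p)) for r, p in RANK.findall(log_text)]
    return kernels, ranks


# Excerpt copied from the first two traces in this message.
sample = """
[nizarPC:07548] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x836) [0x7f610d3a2466]
[nizarPC:07548] [11] mdrun_mpi() [0x40ccce]
mpirun noticed that process rank 5 with PID 7548 on node nizarPC exited on signal 11 (Segmentation fault).
[nizarPC:09984] [ 1] /usr/local/gromacs/bin/../lib/libgromacs_mpi.so.0(nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single+0x85f) [0x7fa76626848f]
mpirun noticed that process rank 6 with PID 9984 on node nizarPC exited on signal 11 (Segmentation fault).
"""

if __name__ == "__main__":
    kernels, ranks = summarize(sample)
    for name, count in kernels.items():
        print(count, "crash(es) in", name)
    print("failing (rank, PID):", ranks)
```

In the four traces in this message, frame [ 1] is always
nb_kernel_ElecGB_VdwLJ_GeomP1P1_F_avx_256_single, while the failing rank
differs from run to run (5, 6, 1 and 4).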



-- 
Thanks
My Best Regards, Nizar
Medical Faculty of Brawijaya University

