[gmx-users] Random segmentation faults

santhu kumar mesanthu at gmail.com
Tue Feb 26 03:12:21 CET 2013


Hello all,

My GROMACS run on a multi-node cluster works most of the time, but the
simulation occasionally crashes at random.
I understand that random segfaults are hard to trace and solve, but are
there any "recommended" versions of the MPI library (MPICH, Open MPI)
or of other software it depends on?

[node4:25538] *** Process received signal ***
[node4:25538] Signal: Segmentation fault (11)
[node4:25538] Signal code: Address not mapped (1)
[node4:25538] Failing at address: 0x8f6a5
[node4:25538] [ 0] /lib64/libc.so.6 [0x3e9ee30280]
[node4:25538] [ 1] /usr/lib64/openmpi/1.2.7-gcc/lib/libopen-pal.so.0(_int_malloc+0x2a5) [0x39d7a2b0d5]
[node4:25538] [ 2] /usr/lib64/openmpi/1.2.7-gcc/lib/libopen-pal.so.0(malloc+0x93) [0x39d7a2ce03]
[node4:25538] [ 3] /lib64/libc.so.6 [0x3e9ee616ba]
[node4:25538] [ 4] /export/gmx/bin/mdrun_mpi(do_md+0x2d48) [0x41f6b8]
[node4:25538] [ 5] /export/gmx/bin/mdrun_mpi(mdrunner+0xa19) [0x40d329]
[node4:25538] [ 6] /export/gmx/bin/mdrun_mpi(main+0x1332) [0x425b82]
[node4:25538] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e9ee1d974]
[node4:25538] [ 8] /export/gmx/bin/mdrun_mpi(do_nm+0x519) [0x405dc9]
[node4:25538] *** End of error message ***
[node3][0,1,0][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
mpirun noticed that job rank 1 with PID 25538 on node node4 exited on signal 11 (Segmentation fault).

The GROMACS code I am using has been tweaked to add a custom force to
the system, but none of the frames in the stack trace point to that
function.
My mpirun is Open MPI version 1.2.7.
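
A crash inside _int_malloc is commonly a symptom of heap corruption that
happened earlier, so a modified force routine can still be the cause even
though it never appears in the trace. A contrived sketch (the array size,
loop bound, and function name below are hypothetical, not from GROMACS):

    /* heap_overrun.c -- illustrates how an out-of-bounds write in one
     * function can make an unrelated, later malloc() call segfault. */
    #include <stdlib.h>

    #define NATOMS 1000

    static void custom_force(double *f)      /* stand-in for a tweaked routine */
    {
        int i;
        for (i = 0; i <= NATOMS; i++) {      /* off-by-one: writes f[NATOMS] */
            f[i] = 0.0;                      /* silently corrupts heap metadata */
        }
    }

    int main(void)
    {
        double *f = malloc(NATOMS * sizeof(*f));
        custom_force(f);
        /* If the overrun clobbered allocator metadata, the failure shows up
         * here or later, inside malloc's internals, far from the real bug. */
        double *g = malloc(64);
        free(g);
        free(f);
        return 0;
    }

Running such a build under valgrind (e.g. mpirun -np 2 valgrind
./mdrun_mpi ...) is a common way to catch overruns of this kind at the
point where they happen rather than where they crash.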

Thanks
Santhosh


