[gmx-users] Problems with simulation on multi-nodes cluster

Mark Abraham Mark.Abraham at anu.edu.au
Tue Mar 20 15:02:38 CET 2012


On 20/03/2012 10:35 PM, James Starlight wrote:
> Could someone tell me what the error below means?
>
> Getting Loaded...
> Reading file MD_100.tpr, VERSION 4.5.4 (single precision)
> Loaded with Money
>
>
> Will use 30 particle-particle and 18 PME only nodes
> This is a guess, check the performance at the end of the log file
> [ib02:22825] *** Process received signal ***
> [ib02:22825] Signal: Segmentation fault (11)
> [ib02:22825] Signal code: Address not mapped (1)
> [ib02:22825] Failing at address: 0x10
> [ib02:22825] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030) [0x7f535903e03$
> [ib02:22825] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x7e23) [0x7f535$
> [ib02:22825] [ 2] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8601) [0x7f535$
> [ib02:22825] [ 3] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8bab) [0x7f535$
> [ib02:22825] [ 4] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x42af) [0x7f5353$
> [ib02:22825] [ 5] /usr/lib/libopen-pal.so.0(opal_progress+0x5b) [0x7f535790506b]
> [ib02:22825] [ 6] /usr/lib/libmpi.so.0(+0x37755) [0x7f5359282755]
> [ib02:22825] [ 7] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x1c3a) [0x7f$
> [ib02:22825] [ 8] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x7fae) [0x7f$
> [ib02:22825] [ 9] /usr/lib/libmpi.so.0(ompi_comm_split+0xbf) [0x7f535926de8f]
> [ib02:22825] [10] /usr/lib/libmpi.so.0(MPI_Comm_split+0xdb) [0x7f535929dc2b]
> [ib02:22825] [11] /usr/lib/libgmx_mpi_d.openmpi.so.6(gmx_setup_nodecomm+0x19b) $
> [ib02:22825] [12] mdrun_mpi_d.openmpi(mdrunner+0x46a) [0x40be7a]
> [ib02:22825] [13] mdrun_mpi_d.openmpi(main+0x1256) [0x407206]
> [ib02:22825] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f$
> [ib02:22825] [15] mdrun_mpi_d.openmpi() [0x407479]
> [ib02:22825] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 36 with PID 22825 on node ib02 exited on sign$
> --------------------------------------------------------------------------
>
>
> I got this error when I tried to run my system on a multi-node cluster
> (there is no problem on a single node). Is this a problem with the
> cluster setup, or is something wrong with my simulation parameters?

The traceback shows the crash occurring inside OpenMPI during 
MPI_Comm_split (called from gmx_setup_nodecomm), which suggests your MPI 
installation is not configured correctly for your hardware, rather than a 
problem with your simulation parameters.
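
One way to narrow this down is to take GROMACS out of the picture: since 
the crash happens inside MPI_Comm_split, a minimal MPI program that 
exercises the same call across the same nodes should also fail if the 
OpenMPI installation itself is broken. Below is a sketch of such a test; 
the file names, the process count, and the rule used to split the ranks 
are arbitrary choices for illustration, not anything GROMACS-specific.

/* comm_split_test.c - minimal check that MPI_Comm_split works across nodes.
 * Build and run with the same OpenMPI that mdrun_mpi_d.openmpi was built
 * against, for example:
 *   mpicc comm_split_test.c -o comm_split_test
 *   mpiexec -np 48 -hostfile hosts ./comm_split_test
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen, color, local_rank;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm split_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    /* Split the ranks into two groups, loosely like mdrun's PP/PME split;
     * the exact colouring does not matter, it only exercises the call
     * that crashes in the traceback above. */
    color = (rank < 2 * size / 3) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &split_comm);
    MPI_Comm_rank(split_comm, &local_rank);

    printf("world rank %d on %s -> group %d, local rank %d\n",
           rank, name, color, local_rank);

    MPI_Comm_free(&split_comm);
    MPI_Finalize();
    return 0;
}

If this test also segfaults when run across two nodes but works on one, 
the problem lies in the MPI installation or the interconnect 
configuration, not in your .tpr settings; if it runs cleanly everywhere, 
then it is worth looking more closely at the GROMACS build.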

Mark
