[gmx-users] Problems with simulation on multi-nodes cluster

James Starlight jmsstarlight at gmail.com
Mon Apr 2 11:13:52 CEST 2012


Mark,

As I mentioned previously, I have problems running the simulation in
multi-node mode.

I checked the logs of such simulations and found lines like this:

Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
Using 6 separate PME nodes

This simulation was run on 2 nodes (2*8 CPUs), and I have never seen
these messages about PME nodes when I launch my systems on a single
node. Could it be that some special options for the PME nodes need to
be defined in the mdp file?
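
If I read the mdrun documentation correctly, the split between
particle-particle and PME nodes is not an mdp setting at all: mdrun
guesses it at run time (hence the "This is a guess" line), and it can
be overridden on the command line. Something like the following is what
I have in mind; the -np and -npme values below are only placeholders
for my 2*8 CPU case:

# let mdrun guess the particle-particle / PME split (the default)
mpiexec -np 16 mdrun_mpi_d.openmpi -s MD_100.tpr

# or request an explicit number of separate PME nodes, e.g. 4 of 16
mpiexec -np 16 mdrun_mpi_d.openmpi -s MD_100.tpr -npme 4

# g_tune_pme (if installed) can suggest a good -npme value
g_tune_pme -np 16 -s MD_100.tpr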

James

On 20 March 2012 18:02, Mark Abraham <Mark.Abraham at anu.edu.au> wrote:

> On 20/03/2012 10:35 PM, James Starlight wrote:
>
>> Could someone tell me what the error below means?
>>
>> Getting Loaded...
>> Reading file MD_100.tpr, VERSION 4.5.4 (single precision)
>> Loaded with Money
>>
>>
>> Will use 30 particle-particle and 18 PME only nodes
>> This is a guess, check the performance at the end of the log file
>> [ib02:22825] *** Process received signal ***
>> [ib02:22825] Signal: Segmentation fault (11)
>> [ib02:22825] Signal code: Address not mapped (1)
>> [ib02:22825] Failing at address: 0x10
>> [ib02:22825] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030) [0x7f535903e03$
>> [ib02:22825] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x7e23) [0x7f535$
>> [ib02:22825] [ 2] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8601) [0x7f535$
>> [ib02:22825] [ 3] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8bab) [0x7f535$
>> [ib02:22825] [ 4] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x42af) [0x7f5353$
>> [ib02:22825] [ 5] /usr/lib/libopen-pal.so.0(opal_progress+0x5b) [0x7f535790506b]
>> [ib02:22825] [ 6] /usr/lib/libmpi.so.0(+0x37755) [0x7f5359282755]
>> [ib02:22825] [ 7] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x1c3a) [0x7f$
>> [ib02:22825] [ 8] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x7fae) [0x7f$
>> [ib02:22825] [ 9] /usr/lib/libmpi.so.0(ompi_comm_split+0xbf) [0x7f535926de8f]
>> [ib02:22825] [10] /usr/lib/libmpi.so.0(MPI_Comm_split+0xdb) [0x7f535929dc2b]
>> [ib02:22825] [11] /usr/lib/libgmx_mpi_d.openmpi.so.6(gmx_setup_nodecomm+0x19b) $
>> [ib02:22825] [12] mdrun_mpi_d.openmpi(mdrunner+0x46a) [0x40be7a]
>> [ib02:22825] [13] mdrun_mpi_d.openmpi(main+0x1256) [0x407206]
>> [ib02:22825] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f$
>> [ib02:22825] [15] mdrun_mpi_d.openmpi() [0x407479]
>> [ib02:22825] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 36 with PID 22825 on node ib02 exited on sign$
>> --------------------------------------------------------------------------
>>
>>
>> I obtained it when I tried to run my system on a multi-node machine
>> (there is no problem on a single node). Is this a problem with the
>> cluster setup, or is something wrong with the parameters of my simulation?
>>
>
> The traceback suggests your MPI system is not configured correctly for
> your hardware.
>
> Mark
>
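
P.S. To rule out GROMACS itself, I suppose the first thing to check is
whether Open MPI alone can start processes on both nodes. A trivial
test along these lines (the hostfile name and its contents are only an
example for our two 8-core nodes):

# hosts (hypothetical hostfile):
#   ib01 slots=8
#   ib02 slots=8
mpiexec -np 16 -hostfile hosts hostname

If the 16 ranks do not each print a node name, the problem is
presumably in the MPI/cluster configuration rather than in the
simulation parameters.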