[gmx-users] problem with replica exchange

Mark Abraham Mark.Abraham at anu.edu.au
Thu May 26 05:12:07 CEST 2011


On 26/05/2011 12:25 PM, jagannath mondal wrote:
> Hi,
>   I am having a problem in running replica exchange simulation over 
> multiple nodes.
> To run the simulation with 16 replicas across two 8-core machines, I 
> generated a hostfile as follows:
> yethiraj30 slots=8 max_slots=8
> yethiraj31 slots=8 max_slots=8
>
> These two machines are interconnected and I have installed Open MPI.
> I then try to run the replica exchange simulation using the 
> following command:
> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 
> -replex 100 >& log_replica_test
>
> But I get the following errors and mdrun does not proceed at all:
>
> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
> connect() to 192.168.0.31 failed: No route to host (113)
> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>
> I am not sure how to resolve this issue. I can ssh from one machine 
> to the other without any problem, but when I try to run Open MPI 
> across both machines, I get this error. Any help will be appreciated.
>

Sorry, if six of your processes cannot connect to the other machine, this 
is a problem of MPI and network configuration, not GROMACS. You'll need to 
read the Open MPI documentation and/or talk with your sysadmins.
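For what it's worth, a `connect() ... No route to host (113)` from Open MPI's TCP BTL usually means either a firewall on the second machine is dropping the TCP connections Open MPI opens between ranks, or Open MPI is choosing a network interface that cannot reach the other host. A rough diagnostic sketch follows; the iptables-based firewall and the subnet (read off the addresses in the error log) are assumptions about your particular setup, not something GROMACS or Open MPI dictates:

```shell
# Check basic reachability of the second node's private address
# (address taken from the error messages above)
ping -c 3 192.168.0.31

# On yethiraj31, check whether a firewall is dropping incoming TCP
# connections (assumes an iptables-based firewall)
sudo iptables -L -n

# Restrict Open MPI's TCP transport to the private subnet so it does not
# try an unreachable interface; btl_tcp_if_include is a standard Open MPI
# MCA parameter (older Open MPI versions may require an interface name
# such as eth1 instead of a CIDR subnet)
mpirun -np 16 --hostfile hostfile \
    --mca btl_tcp_if_include 192.168.0.0/24 \
    mdrun_4mpi -s topol_.tpr -multi 16 -replex 100
```

If the firewall turns out to be the culprit, allowing TCP traffic between the two hosts on the private subnet typically makes error 113 go away; ssh working is no guarantee here, since sshd listens on a single well-known port while Open MPI uses ephemeral ports.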

Mark
