[gmx-users] Strange MPI problems. . .

Mark Abraham Mark.Abraham at anu.edu.au
Mon Aug 10 06:48:37 CEST 2009


Marc Charendoff wrote:
> Hello,
> 
> 
> I am trying to perform a position-restrained MD run on our school cluster, on a system I have run successfully before. The only difference is that this new run uses the 43a1 force field (the original used gmx). All my preprocessing has gone well, but when I submit to the cluster, my log file shows the following:
> 
> [node126:13068] Error in mx_open_endpoint (error Failure querying MX driver(wrong driver?))
> [node126:13068] mca_btl_mx_init: mx_open_endpoint() failed with status=1
> warning:regcache incompatible with malloc
> NNODES=2, MYRANK=0, HOSTNAME=node126
> NNODES=2, MYRANK=1, HOSTNAME=node056
> NODEID=0 argc=12
> NODEID=1 argc=12
> [node126][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
> 
> I immediately reran the old calculation (gmx), which worked just fine. I should also note that my energy minimization run (43a1) on a single processor ran OK. Any guidance anyone could provide would be appreciated.

Try some more tests on these files. My guess is that your machine's 
network topology is such that some nodes are "closer" than others, and 
that your successful runs got a better combination of two nodes. If 
using PBS, consult its documentation for how to require a suitably 
constrained combination of nodes.
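
One way to test the node-pairing guess is to decode the errno in the log
and then probe plain TCP reachability between the two nodes. On Linux,
errno 113 is EHOSTUNREACH ("No route to host"), which is consistent with
node126 being unable to reach node056 over TCP once the MX endpoint had
failed to open. The following is only a minimal sketch: the helper names
are hypothetical, the errno-to-name mapping is whatever the local
platform reports, and the port to probe is an assumption about your
cluster.

```python
import errno
import os
import re
import socket

# Log line copied from the report above.
LOG_LINE = (
    "[node126][0,1,0][btl_tcp_endpoint.c:572:"
    "mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113"
)


def extract_errno(line):
    """Return the integer after 'errno=' in a log line, or None if absent."""
    match = re.search(r"errno=(\d+)", line)
    return int(match.group(1)) if match else None


def describe_errno(code):
    """Map an errno value to (symbolic name, message) on the local platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; return (True, None) or (False, errno)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # exc.errno may be None for timeouts; name-lookup failures
        # report a resolver error code instead.
        return False, exc.errno


if __name__ == "__main__":
    code = extract_errno(LOG_LINE)
    print(code, *describe_errno(code))
```

Run from node126, a call such as can_connect("node056", 22) would show
whether that node is reachable over TCP at all; port 22 assumes sshd is
listening there. Repeating the probe for several node pairs would show
which combinations can talk to each other.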

Mark


