[gmx-users] Strange MPI problems. . .
Mark.Abraham at anu.edu.au
Mon Aug 10 06:48:37 CEST 2009
Marc Charendoff wrote:
> I am trying to perform a position-restrained MD run on our school cluster, on a system I have run successfully before. The only difference is that this new run uses the 43a1 force field (the original used gmx). All my preprocessing has gone well, but when I submit to the cluster, my log file shows the following:
> [node126:13068] Error in mx_open_endpoint (error Failure querying MX driver(wrong driver?))
> [node126:13068] mca_btl_mx_init: mx_open_endpoint() failed with status=1
> warning:regcache incompatible with malloc
> NNODES=2, MYRANK=0, HOSTNAME=node126
> NNODES=2, MYRANK=1, HOSTNAME=node056
> NODEID=0 argc=12
> NODEID=1 argc=12
> [node126][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
> I immediately reran the old calculation (gmx), which worked just fine. I should also note that my energy minimization run (43a1) on a single processor ran OK. Any guidance anyone could provide would be appreciated.
Try some more test runs with these files. My guess is that your machine's
network topology is such that some nodes are "closer" than others, and
that your successful runs were allocated a better combination of two
nodes. If you are using PBS, consult its documentation for how to request
a suitably constrained combination of nodes.
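
When comparing failed and successful runs, it can help to decode the errno reported in the log. The following is a minimal Python sketch, not part of GROMACS or Open MPI. The helper names extract_errnos and describe are hypothetical, and the sample line is copied from the log quoted above. On Linux, errno 113 normally maps to EHOSTUNREACH ("No route to host"), which would be consistent with the two nodes being unable to reach each other over TCP. The numeric mapping is platform-dependent, so run the script on the cluster itself.

```python
import errno
import os
import re

# Matches the "errno=<number>" fragments that Open MPI's TCP BTL writes to the log.
ERRNO_PATTERN = re.compile(r"errno=(\d+)")


def extract_errnos(log_text):
    """Return every errno value that appears as 'errno=<n>' in the log text."""
    return [int(value) for value in ERRNO_PATTERN.findall(log_text)]


def describe(code):
    """Format an errno as '<number> <symbolic name>: <system message>'.

    The symbolic name and message come from the local platform, so the
    result is only meaningful on the same kind of system that wrote the log.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# Sample line taken from the log in the original message.
log = (
    "[node126][0,1,0][btl_tcp_endpoint.c:572:"
    "mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113"
)

if __name__ == "__main__":
    for code in extract_errnos(log):
        print(describe(code))
```

Running this over the log of each failed job, together with the HOSTNAME lines, gives a quick record of which node pairs fail to connect and with what error.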