[gmx-developers] MPICH2 and parallel Gromacs errors

Casey,Richard Richard.Casey at ColoState.EDU
Fri Jun 20 18:56:15 CEST 2008


This issue appears to have been encountered by many people.  We've searched all the discussion archives and tried every recommended solution, but with no luck.

We have MPICH2 v1.0.7 installed on an Apple G5 cluster (64 CPUs), and Gromacs v3.3.3 installed with the --enable-mpi option.

Single-CPU jobs run OK; parallel jobs always fail.  For parallel jobs we preprocess with:

grompp -v -np 2 -p topol.top (or other values of -np for more CPUs)

We launch MPD with:

mpdboot -n 2 -f /common/mpich2/mpd.hosts

We run jobs with:

/common/mpich2/bin/mpiexec -l -n 2 \
/common/gromacs/bin/mdrun_mpi -v -np 2 \
  -s /Users/richardcasey/topol.tpr \
  -g /Users/richardcasey/md.log \
  -e /Users/richardcasey/ener.edr \
  -o /Users/richardcasey/traj.trr \
  -x /Users/richardcasey/traj.xtc \
  -c /Users/richardcasey/confout.gro
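
The three steps above (grompp, mpdboot, mpiexec/mdrun_mpi) each take a count, and in this setup they are all meant to agree. As a minimal sketch, assuming a hypothetical NP variable and print_cmds helper (neither is part of the setup described here), the commands can be generated from a single value and printed as a dry run, so the counts can be compared before anything is launched. The mdrun_mpi line is shortened to the -s argument for brevity.

```shell
#!/bin/sh
# Dry run only: prints the commands from this post, nothing is executed.
# NP and print_cmds are illustrative names, not part of Gromacs or MPICH2.
NP=${NP:-2}

print_cmds() {
    np=$1
    # Preprocess the topology for $np processes (Gromacs 3.3.x grompp takes -np).
    echo "grompp -v -np $np -p topol.top"
    # Start the MPD ring; -n is the number of mpd daemons, assumed here to
    # equal the process count, as in the post.
    echo "mpdboot -n $np -f /common/mpich2/mpd.hosts"
    # Launch mdrun_mpi with the same count for mpiexec -n and mdrun -np.
    echo "/common/mpich2/bin/mpiexec -l -n $np /common/gromacs/bin/mdrun_mpi -v -np $np -s /Users/richardcasey/topol.tpr"
}

print_cmds "$NP"
```

Running it with NP=4 in the environment prints the same three commands with 4 in every position.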

The output from mdrun_mpi always says:

1: Program mdrun_mpi, VERSION 3.3.3
1: Source code file: init.c, line: 69
1: Fatal error:
1: run input file /Users/richardcasey/topol.tpr was made for 2 nodes,
1: p0_29762:  p4_error: : -1
1:              while mdrun_mpi expected it to be for 1 nodes.
1: -------------------------------------------------------

We've tried many variations on the above, along with the recommendations from the discussion list, but for some reason mdrun_mpi insists on a single-CPU version of the run input file (topol.tpr).  We've checked the environment variables and they appear to point to the right directories. /common is NFS-mounted on all nodes.

Completely stumped - no idea what is wrong here.  Any suggestions?

Richard Casey
