[gmx-users] MPICH2 and parallel Gromacs errors

Dr. Bernd Rupp rupp at fmp-berlin.de
Mon Jun 23 17:05:15 CEST 2008


Hello,
here are our scripts for running mdrun with MPICH2:

mpdboot.sh :

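#!/bin/sh

# mpdboot is asked for NODES + 1 = 8 mpds, one per host in mpd.hosts
NODES=7

# host list read by mpdboot
HFILE=./mpd.hosts

# ------------------------------------------

# MPICH2 bin directory
MPI=/opt/MPICH2/bin

# -n: total number of mpds, -f: host file,
# -r: remote shell used to start them, -m: path to the mpd binary
"$MPI/mpdboot" -v -n $((NODES + 1)) -f "$HFILE" -r /usr/bin/rsh -m "$MPI/mpd"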

eof

mpd.hosts:
node1
node2
node3
node4
node5
node6
node7
node8


eof
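To keep the mpdboot `-n` value tied to the host list rather than to a hand-maintained NODES, the count can be taken from mpd.hosts itself. A minimal sketch, assuming blank lines and `#` lines in the file are not hosts; `count_hosts` is a hypothetical helper, not an MPICH2 tool:

```sh
#!/bin/sh
# Sketch: derive the mpdboot -n value from mpd.hosts instead of
# hard-coding NODES. count_hosts is a hypothetical helper, not an MPICH2 tool.
# Assumption: blank lines and lines starting with '#' are not hosts.

count_hosts() {
    # $1: path to the hosts file; prints the number of host lines
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# Example use in mpdboot.sh (not executed here):
#   "$MPI/mpdboot" -v -n "$(count_hosts "$HFILE")" -f "$HFILE" \
#       -r /usr/bin/rsh -m "$MPI/mpd"
```

For the eight hosts listed above it prints 8, the same value that NODES=7 plus one gives.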


We run jobs with:
mpiexec -l -machinefile host.file -n 8 mdrun_mpi -nice 0 -np 8 \
	-s topol.tpr [options] < /dev/null
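In the command above the process count appears twice (`-n 8` for mpiexec and `-np 8` for mdrun_mpi), and with GROMACS 3.3 it must also equal the `-np` given to grompp, otherwise mdrun_mpi stops with the "made for N nodes" error quoted below. A hedged sketch of a wrapper that takes the count only once; `run_mdrun` and the `DRY_RUN` variable are hypothetical names used for illustration:

```sh
#!/bin/sh
# Sketch: pass the process count once so mpiexec -n and mdrun_mpi -np agree.
# run_mdrun and DRY_RUN are hypothetical names. The count must also match the
# -np that grompp used when topol.tpr was written (GROMACS 3.3 behaviour).

run_mdrun() (
    # $1: process count, $2: run input file (.tpr), $3: machinefile;
    # any further arguments are passed on to mdrun_mpi.
    np=$1 tpr=$2 hosts=$3
    shift 3
    # With DRY_RUN set to a non-empty value the command is only printed.
    ${DRY_RUN:+echo} mpiexec -l -machinefile "$hosts" -n "$np" \
        mdrun_mpi -nice 0 -np "$np" -s "$tpr" "$@" < /dev/null
)

# Example: run_mdrun 8 topol.tpr host.file -v
```

The function body runs in a subshell, so its variables do not leak into the calling script.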

host.file:
#node1:2
node2:2
node3:2
node4:2
node5:2
#node6:2
#node7:2
#node8:2
eof

Format:

# comment line
hostname:number of cores per host

In this case you ask for 8 cores on 4 hosts (node2 to node5, 2 cores each).
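The core count in host.file (4 hosts × 2 cores = 8) is the number that has to match `mpiexec -n`, `mdrun_mpi -np` and the `-np` used with grompp. A small sketch that sums it, assuming a bare hostname without `:n` counts as one process; `total_cores` is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: sum the process counts in an MPICH2 machinefile such as host.file.
# total_cores is a hypothetical helper. Assumptions: each line is
# "hostname:count" or a bare "hostname" (counted as 1), and lines starting
# with '#' or containing only whitespace are skipped.

total_cores() {
    # $1: path to the machinefile; prints the summed count
    awk -F: '
        /^[ \t]*#/ { next }
        /^[ \t]*$/ { next }
        {
            n = (NF >= 2 && $2 != "") ? $2 + 0 : 1
            total += n
        }
        END { print total + 0 }
    ' "$1"
}
```

Run on the host.file above it prints 8, since node1 and node6 to node8 are commented out.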

On our machine this setup runs fine.

I think you should also define a host.file; our jobs used to crash with the
same error. Now we boot a relatively large MPICH2 ring (via mpdboot) per user
and start several jobs (mpiexec calls) on each ring.
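A sketch of the per-user setup described above: one ring booted once with mpdboot.sh, then several independent mpiexec calls on it, each in its own directory with its own host.file so the jobs use different nodes. The directory layout, the log name mdrun.out, the helper name `run_jobs_on_ring`, and the assumption that `mpdtrace` exits non-zero when no ring is running are all illustrative assumptions:

```sh
#!/bin/sh
# Sketch: several independent mdrun jobs on one already booted MPD ring.
# run_jobs_on_ring is a hypothetical helper. Assumptions: every job directory
# holds its own topol.tpr and host.file, and mpdtrace exits non-zero when no
# ring is reachable.

run_jobs_on_ring() {
    # $1: process count per job; remaining arguments: job directories
    np=$1
    shift
    mpdtrace > /dev/null 2>&1 || { echo "no MPD ring running" >&2; return 1; }
    for dir in "$@"; do
        # each job runs in the background with its own log file
        ( cd "$dir" && mpiexec -l -machinefile host.file -n "$np" \
            mdrun_mpi -nice 0 -np "$np" -s topol.tpr < /dev/null > mdrun.out 2>&1 ) &
    done
    wait    # return once all jobs have finished
}
```

When the ring is no longer needed, `mpdallexit` shuts it down.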

On Friday, 20 June 2008, Casey, Richard wrote:
> Hello,
>
> This issue appears to have been encountered by many people.  We've searched
> the discussion archives and tried every recommended solution but no luck.
>
> We have MPICH2 v1.0.7 installed on an Apple G5 cluster (64 CPUs), and
> installed GROMACS v3.3.3 with the --enable-mpi option.
>
> Single CPU jobs run OK; parallel jobs always fail.  For parallel jobs we
> use:
>
> grompp -v -np 2 -p topol.top (or other values of -np for more CPUs)
>
> We launch MPD with:
>
> mpdboot -n 2 -f /common/mpich2/mpd.hosts
>
> We run jobs with:
>
> /common/mpich2/bin/mpiexec -l -n 2 \
> /common/gromacs/bin/mdrun_mpi -v -np 2 \
>   -s /Users/richardcasey/topol.tpr \
>   -g /Users/richardcasey/md.log \
>   -e /Users/richardcasey/ener.edr \
>   -o /Users/richardcasey/traj.trr \
>   -x /Users/richardcasey/traj.xtc \
>   -c /Users/richardcasey/confout.gro
>
>
> The output always says:
>
> -------------------------------------------------------
> 1: Program mdrun_mpi, VERSION 3.3.3
> 1: Source code file: init.c, line: 69
> 1:
> 1: Fatal error:
> 1: run input file /Users/richardcasey/topol.tpr was made for 2 nodes,
> 1: p0_29762:  p4_error: : -1
> 1:              while mdrun_mpi expected it to be for 1 nodes.
> 1: -------------------------------------------------------
>
>
> We've tried everything (many variations on the above and recommendations
> from the discussion list) but for some reason mdrun_mpi insists that it use
> a single-CPU version of the topology file.  We've checked environment
> variables and they appear to point to the right directories. /common is NFS
> mounted on all nodes.
>
> Completely stumped - no idea what is wrong here.  Any suggestions?
>
>
>
> --------------------------------------------
> Richard Casey



-- 
Dr. Bernd F. Rupp

Leibniz-Institut für Molekulare Pharmakologie (FMP)
Abt. NMR-unterstützte Strukturforschung
AG   Molecular Modeling/ Drug Design
Robert-Roessle-Str. 10
13125 Berlin
Germany

Tel.    +49/0-30-94793-279
FAX     +49/0-30-94793-169
Web     www.fmp-berlin.info/drug_design.html
E-Mail  rupp at fmp-berlin.de 

