[gmx-users] MPICH2 and parallel Gromacs errors
Dr. Bernd Rupp
rupp at fmp-berlin.de
Mon Jun 23 17:05:15 CEST 2008
Hello,
here are our scripts for running mdrun with MPICH2.
mpdboot.sh:
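#!/bin/sh
# NODES is the number of hosts besides the first one; the ring gets NODES + 1 daemons.
NODES=7
HFILE=./mpd.hosts
# ------------------------------------------
MPI=/opt/MPICH2/bin
# Boot the mpd ring on the hosts listed in $HFILE, using rsh to reach them.
$MPI/mpdboot -v -n `expr $NODES + 1` -f $HFILE -r /usr/bin/rsh -m $MPI/mpd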
eof
mpd.hosts:
node1
node2
node3
node4
node5
node6
node7
node8
eof
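The value passed to mpdboot -n has to fit the number of hosts in mpd.hosts (8 in the file above). A minimal sketch that derives the count from the file instead of hard-coding NODES could look like this. count_hosts is a hypothetical helper, and the handling of comment and blank lines is an assumption about how the file is written, not something stated in this post. The script only prints the mpdboot command so it can be checked before use.

```sh
#!/bin/sh
# Hypothetical helper: count the hosts listed in an mpd.hosts-style file.
# Assumption: lines starting with '#' and blank lines are not hosts.
count_hosts() {
    awk '!/^[ \t]*#/ && !/^[ \t]*$/ { n++ } END { print n + 0 }' "$1"
}

HFILE=./mpd.hosts
MPI=/opt/MPICH2/bin

# Print the command instead of running it, and only if the hosts file exists.
if [ -r "$HFILE" ]; then
    echo "$MPI/mpdboot -v -n $(count_hosts "$HFILE") -f $HFILE -r /usr/bin/rsh -m $MPI/mpd"
fi
```

For the eight-line mpd.hosts above this prints a command with -n 8, the same value the original script computes with expr.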
We run jobs with:
mpiexec -l -machinefile host.file -n 8 mdrun_mpi -nice 0 -np 8 \
-s topol.tpr [options] < /dev/null
host.file:
#node1:2
node2:2
node3:2
node4:2
node5:2
#node6:2
#node7:2
#node8:2
eof
Format:
# comment line
hostname:number of cores per host
In this case you request 8 cores on 4 hosts.
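The process count given to mpiexec -n and mdrun_mpi -np has to agree with the cores enabled in host.file (and with the -np value used for grompp). A small sketch that sums the active entries is shown here. count_cores is a hypothetical helper, and counting a bare hostname as one core is an assumption. The example writes the host.file from this post to a temporary file and prints the resulting command rather than running it.

```sh
#!/bin/sh
# Hypothetical helper: sum the cores of all active "hostname:cores" lines.
count_cores() {
    awk -F: '
        /^[ \t]*#/ { next }                 # skip commented-out hosts
        /^[ \t]*$/ { next }                 # skip blank lines
        { total += ($2 == "" ? 1 : $2) }    # assumption: bare hostname = 1 core
        END { print total + 0 }
    ' "$1"
}

# Example using the host.file contents from this post.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#node1:2
node2:2
node3:2
node4:2
node5:2
#node6:2
#node7:2
#node8:2
EOF

NP=$(count_cores "$tmp")
# Prints the mpiexec line with -n 8 and -np 8 (4 active hosts x 2 cores).
echo "mpiexec -l -machinefile host.file -n $NP mdrun_mpi -nice 0 -np $NP -s topol.tpr"
rm -f "$tmp"
```

Using one variable for both -n and -np keeps the two values from drifting apart when hosts are commented in or out.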
On our machines this setup runs fine.
I think you should also define a host.file.
Our jobs used to crash with the same error.
We now set up relatively large MPICH2 rings (via mpdboot), one per user, and start
several jobs per ring.
On Friday, 20 June 2008, Casey, Richard wrote:
> Hello,
>
> This issue appears to have been encountered by many people. We've searched
> the discussion archives and tried every recommended solution but no luck.
>
> We have MPICH2 v.1.0.7 installed on an Apple G5 cluster (64 CPU's). And
> installed Gromacs v.3.3.3 with --enable-mpi option.
>
> Single CPU jobs run OK; parallel jobs always fail. For parallel jobs we
> use:
>
> grompp -v -np 2 -p topol.top (or other values for np for more cpu's)
>
> We launch MPD with:
>
> mpdboot -n 2 -f /common/mpich2/mpd.hosts
>
> We run jobs with:
>
> /common/mpich2/bin/mpiexec -l -n 2 \
> /common/gromacs/bin/mdrun_mpi -v -np 2 \
> -s /Users/richardcasey/topol.tpr \
> -g /Users/richardcasey/md.log \
> -e /Users/richardcasey/ener.edr \
> -o /Users/richardcasey/traj.trr \
> -x /Users/richardcasey/traj.xtc \
> -c /Users/richardcasey/confout.gro
>
>
> The output always says:
>
> -------------------------------------------------------
> 1: Program mdrun_mpi, VERSION 3.3.3
> 1: Source code file: init.c, line: 69
> 1:
> 1: Fatal error:
> 1: run input file /Users/richardcasey/topol.tpr was made for 2 nodes,
> 1: p0_29762: p4_error: : -1
> 1: while mdrun_mpi expected it to be for 1 nodes.
> 1: -------------------------------------------------------
>
>
> We've tried everything (many variations on the above and recommendations
> from the discussion list) but for some reason mdrun_mpi insists that it use
> a single-cpu version of the topology file. We've checked environment
> variables and they appear to point to the right directories. /common is NFS
> mounted on all nodes.
>
> Completely stumped - no idea what is wrong here. Any suggestions?
>
>
>
> --------------------------------------------
> Richard Casey
>
--
Dr. Bernd F. Rupp
Leibniz-Institut für Molekulare Pharmakologie (FMP)
Abt. NMR-unterstützte Strukturforschung
AG Molecular Modeling/ Drug Design
Robert-Roessle-Str. 10
13125 Berlin
Germany
Tel. +49/0-30-94793-279
FAX +49/0-30-94793-169
Web www.fmp-berlin.info/drug_design.html
E-Mail rupp at fmp-berlin.de