[gmx-users] Running with > 5 NODES
Choon Peng Chng
cpchng at bii-sg.org
Mon Dec 31 06:52:05 CET 2001
Hi,
After getting the parallel version of GROMACS up and running, I went on
to run the benchmarks. Things worked fine with 2 nodes, or with both
CPUs in a single node, on our 8-node dual-PIII 800MHz cluster.
However, when I try to run mdrun with 6 'nodes' (1 process per node on 6
nodes, or 2 processes per node on 3 nodes), the last process fails to
complete after all the others have finished.
A timeout error is issued:
              (Mnbf/s)   (MFlops)   (ps/NODE hour)   (NODE hour/ns)
Performance:    26.922    844.116          204.545            4.889
gcq#207: "Just Give Me a Blip" (F. Black)
p5_15985: p4_error: Timeout in establishing connection to remote process: 0
p0_7598: (479.797050) net_recv failed for fd = 14
p4_18538: (479.148076) net_recv failed for fd = 8
p4_18538: p4_error: net_recv read, errno = : 104
p2_24766: p4_error: net_recv read: probable EOF on socket: 1
p0_7598: p4_error: net_recv read, errno = : 104
Connection failed for reason: : Connection refused
........
Connection failed for reason: : Connection refused
/usr/bin/mpirun: line 1: 7598 Broken pipe
/usr/local/gromacs/i686-pc-linux-gnu/bin/mdrun "-s" "topol.tpr" "-o"
"traj.trr" "-c" "confout.gro" "-e" "ener.edr" "-g" "md.log" "-table"
"table.xvg" "-np" "6" -p4pg
/home/cpchng/GROMACS/benchmarking/d.villin/PI7414 -p4wd
/home/cpchng/GROMACS/benchmarking/d.villin
Connection failed for reason: : Connection refused
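For reference, the net_recv lines above print the errno value (104) without its message. On Linux, 104 is ECONNRESET ("Connection reset by peer"). The following is a minimal sketch for decoding such a value on the machine where the error occurred; the helper name describe_errno is only illustrative, and errno numbering is platform-specific, so the result is meaningful only on the same kind of system that produced the log.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'ECONNRESET'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the human-readable message for the same value.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported by net_recv in the log above
    # (assumed here to come from a Linux host).
    name, message = describe_errno(104)
    print(f"errno 104: {name} ({message})")
```

A connection reset reported by the surviving processes is consistent with a peer process exiting or never completing its connection, which matches the timeout reported by p5.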
Command used (for the villin benchmark):
mpirun -np 6 /usr/local/gromacs/i686-pc-linux-gnu/bin/mdrun -s
topol.tpr -o traj.trr -c confout.gro -e ener.edr -g md.log -table
table.xvg -np 6
Can anyone help?
By the way, other MPI programs work fine, using up to all 16 CPUs on the
system.
Thanks and regards,
choon peng
--
Choon-Peng Chng
Scientific Programmer
BioInformatics Institute (BII)
30 Medical Drive, Level 1 IMCB Building, Singapore 117609
Tel: (65)8746173