[gmx-users] Running with > 5 NODES

Choon Peng Chng cpchng at bii-sg.org
Mon Dec 31 06:52:05 CET 2001


Hi,

  After getting the parallel version of GROMACS up and running, I went on
to run the benchmarks. Things worked fine with 2 nodes, or with both CPUs
in a single node, on our 8-node dual-PIII 800 MHz cluster. However, when I
run mdrun with 6 'nodes' (1 process per node on 6 nodes, or 2 processes per
node on 3 nodes), the last process fails to complete after all the others
have finished.

A timeout error is issued:

               (Mnbf/s)   (MFlops) (ps/NODE hour) (NODE hour/ns)
Performance:     26.922    844.116    204.545      4.889

gcq#207: "Just Give Me a Blip" (F. Black)

p5_15985:  p4_error: Timeout in establishing connection to remote process: 0
p0_7598: (479.797050) net_recv failed for fd = 14
p4_18538: (479.148076) net_recv failed for fd = 8
p4_18538:  p4_error: net_recv read, errno = : 104
p2_24766:  p4_error: net_recv read:  probable EOF on socket: 1
p0_7598:  p4_error: net_recv read, errno = : 104
Connection failed for reason: : Connection refused
........
Connection failed for reason: : Connection refused
/usr/bin/mpirun: line 1:  7598 Broken pipe            /usr/local/gromacs/i686-pc-linux-gnu/bin/mdrun "-s" "topol.tpr" "-o" "traj.trr" "-c" "confout.gro" "-e" "ener.edr" "-g" "md.log" "-table" "table.xvg" "-np" "6" -p4pg /home/cpchng/GROMACS/benchmarking/d.villin/PI7414 -p4wd /home/cpchng/GROMACS/benchmarking/d.villin
Connection failed for reason: : Connection refused

Command used (for the villin benchmark):
mpirun -np 6 /usr/local/gromacs/i686-pc-linux-gnu/bin/mdrun -s topol.tpr \
    -o traj.trr -c confout.gro -e ener.edr -g md.log -table table.xvg -np 6
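
The process count appears twice in that command, once for mpirun and once
for mdrun. A minimal sketch of a wrapper that prints the command for a given
count, assuming the two values are meant to agree; build_cmd and MDRUN are
hypothetical names for illustration, not part of GROMACS or MPICH, and the
script only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS or MPICH): prints the command line
# for a given process count so both -np options always carry the same value.
MDRUN=/usr/local/gromacs/i686-pc-linux-gnu/bin/mdrun

build_cmd() {
    nprocs="$1"
    # Reject an empty, non-numeric, or zero process count.
    case "$nprocs" in
        ''|*[!0-9]*|0) echo "usage: build_cmd <positive integer>" >&2; return 1 ;;
    esac
    printf '%s\n' "mpirun -np $nprocs $MDRUN -s topol.tpr -o traj.trr -c confout.gro -e ener.edr -g md.log -table table.xvg -np $nprocs"
}

# Print (do not execute) the 6-process command from this post.
build_cmd 6
```

Changing the single argument (for example build_cmd 4) would give the
matching command for a smaller run, which may help narrow down the process
count at which the timeout starts to appear.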

Can anyone help?

By the way, other MPI programs work fine, using up to all 16 CPUs on the
system.

Thanks &
regards,
choon peng

-- 
Choon-Peng Chng
Scientific Programmer
BioInformatics Institute (BII)
30 Medical Drive, Level 1 IMCB Building, Singapore 117609
Tel: (65)8746173