[gmx-users] Parallel problems...
Jorge Hernandez Fernandez
jorgehf at cbi.cnptia.embrapa.br
Thu Jun 1 21:43:49 CEST 2006
Dear GMXers:
We need some help running GROMACS 3.2.1 in parallel on 5 SunFire V20v nodes
(dual-core Opteron 2.2 GHz, 4 GB DDR2 RAM) connected by Gigabit Ethernet,
using MPICH 1.2.7 and Sun Grid Engine 6.0.
A 2-processor job (single node) runs fine, but with 4, 6 or 8 processors
(2, 3 or 4 nodes) the machine log files show the following:
With “-np 4”, the errors were:
p4_5991: p4_error: Timeout in establishing connection to remote process: 0
p0_8456: p4_error: net_recv read: probable EOF on socket: 1
p5_5996: p4_error: Timeout in establishing connection to remote process: 0
p0_8456: (333.761719) net_send: could not write to fd=4, errno = 32
With “-np 8”, the error was:
p0_16398: p4_error: interrupt SIGSEGV: 11
In all cases, the GROMACS log ends with “Killed by signal 2” and the
processes stop.
grompp was run with the -np option, and our final job script was:
#$ -v MPIRUN
#$ -v MPICH_PROCESS_GROUP
$MPIRUN -np $NSLOTS -machinefile $TMPDIR/machinefile \
  /nfs/gromacs-3.2.1_paralelo/x86_64-unknown-linux-gnu/bin/mdrun -np 6 \
  -s ABint/ABint_md.tpr -o ABint/ABint_md.trr -c ABint/ABint_md.gro \
  -e ABint/ABintener.edr -g ABint/ABintmd.log -nice 0
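For comparison, here is a minimal sketch of the same mdrun step in which a single process count, taken from SGE's $NSLOTS, is passed to both mpirun -np and mdrun -np. The script above passes $NSLOTS to mpirun but a fixed 6 to mdrun. The sketch assumes that the .tpr file was produced by grompp with the same -np value, and that MPIRUN is either exported by SGE or mpirun is on the PATH. The function name run_mdrun and the DRY_RUN switch are hypothetical. This sketch is not a confirmed fix for the p4 errors.

```shell
#!/bin/sh
#$ -v MPIRUN
#$ -v MPICH_PROCESS_GROUP
# Sketch only: paths and file names come from the script above;
# run_mdrun and DRY_RUN are hypothetical additions.

GMXBIN=/nfs/gromacs-3.2.1_paralelo/x86_64-unknown-linux-gnu/bin

# Launch mdrun on $1 processes, giving the SAME count to mpirun -np and
# mdrun -np (assumes the .tpr was made with grompp -np equal to $1).
# With DRY_RUN=1 the command line is printed instead of executed.
run_mdrun() {
  np=$1
  # Fall back to a plain "mpirun" on the PATH if MPIRUN is unset or empty.
  set -- "${MPIRUN:-mpirun}" -np "$np" -machinefile "$TMPDIR/machinefile" \
    "$GMXBIN/mdrun" -np "$np" \
    -s ABint/ABint_md.tpr -o ABint/ABint_md.trr -c ABint/ABint_md.gro \
    -e ABint/ABintener.edr -g ABint/ABintmd.log -nice 0
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Under SGE, NSLOTS holds the number of slots granted to the job;
# outside SGE (NSLOTS unset) nothing is launched.
if [ -n "${NSLOTS:-}" ]; then
  run_mdrun "$NSLOTS"
fi
```

Running the job once with DRY_RUN=1 exported (for example via an extra "#$ -v DRY_RUN" line) prints the exact command line, so you can check that both -np values agree before running a real simulation.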
Any help will be greatly appreciated.
Jorge H.F.
--
== Dr. JORGE HERNANDEZ FERNANDEZ ==
====== Center of Applied Toxinology ======
===== CAT-CEPID - Instituto Butantan =====
====== Ave Vital Brasil, 1500 S.P. ======
Tel: 055 11 3726 7222 r.2042 Fax: 055 11 3721 6605
==== S.B.I.- EMBRAPA - BioInformatica ====
C.p. 6041 Cidade Universitária "Zeferino Vaz"
===== Barão Geraldo Campinas S.P. ========
=========== 13080-970 ====================
Tel: 055 19-37895828 Cell: 055 11-97126104