[gmx-users] Re: parallel simulation crash on 6 processors

servaas michielssens servaas.michielssens at student.kuleuven.be
Thu Nov 29 17:25:36 CET 2007


> Message: 5
> Date: Wed, 28 Nov 2007 14:39:29 +0100
> From: David van der Spoel <spoel at xray.bmc.uu.se>
> Subject: Re: [gmx-users] parallel simulation crash on 6 processors
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> Message-ID: <474D6F91.3040203 at xray.bmc.uu.se>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> servaas michielssens wrote:
> > I tried to run a gromacs simulation (gromacs 3.3.1, MD, 18000 atoms) on 
> > 2 systems:
> >  
> > Intel(R) Pentium(R) CPU 2.40GHz with 100Mbit network
> > and
> > AMD Opteron(tm) Processor 250 with 1Gbit network
> > On both systems I had a crash when I tried to run with more than 5 
> > processors. From 1-5 there was no problem.
> >  
> more details please.
> >  

I ran the same simulation on 1, 2, 3, 4 and 5 processors without any
problem, so I think there is no problem with the system that I'm using. But
from the moment I tried to use 6 processors of the same cluster, the
simulation crashed. This is the error:

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 6


On AMD:
[0] MPI Abort by user Aborting program !
[0] Aborting program!
    p4_error: latest msg from perror: No such file or directory
p0_3303:  p4_error: : -1
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_3303: (1.088153) net_send: could not write to fd=4, errno = 32
error while executing run nb 1


On Intel:
p4_1781:  p4_error: Timeout in establishing connection to remote process: 0
rm_l_4_1786: (318.577125) net_send: could not write to fd=5, errno = 32
p4_1781: (318.580132) net_send: could not write to fd=5, errno = 32
p0_26458: (319.239545) net_recv failed for fd = 8
p0_26458:  p4_error: net_recv read, errno = : 104
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_26458: (325.249810) net_send: could not write to fd=4, errno = 32
error while executing run nb 1
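
For anyone reading these logs, the numeric codes can be decoded with a small script. This is only a sketch, and it assumes both clusters run Linux, where errno 32 is EPIPE and errno 104 is ECONNRESET; signal 2 is SIGINT on POSIX systems. The table and the annotate() helper are hypothetical names made up for this illustration and are not part of GROMACS or MPICH. Both errno values usually mean that the process at the other end of the socket had already gone away, so they are probably secondary symptoms and not the original cause of the crash.

```python
import re

# Linux errno values that appear in the logs above.
# Assumption: both clusters run Linux; other systems number errno differently.
LINUX_ERRNO = {
    32: ("EPIPE", "Broken pipe"),
    104: ("ECONNRESET", "Connection reset by peer"),
}

# POSIX signal number that appears in the logs above.
SIGNALS = {2: "SIGINT"}

# Matches both "errno = 32" and "errno = : 104" as printed by the p4 layer.
ERRNO_RE = re.compile(r"errno\s*=\s*:?\s*(\d+)")
SIGNAL_RE = re.compile(r"Killed by signal (\d+)")


def annotate(line):
    """Return a short explanation of the errno or signal in a log line, or None."""
    match = ERRNO_RE.search(line)
    if match:
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        return f"errno {code}: {name} ({text})"
    match = SIGNAL_RE.search(line)
    if match:
        number = int(match.group(1))
        return f"signal {number}: {SIGNALS.get(number, 'UNKNOWN')}"
    return None


if __name__ == "__main__":
    sample = [
        "p0_3303: (1.088153) net_send: could not write to fd=4, errno = 32",
        "p0_26458:  p4_error: net_recv read, errno = : 104",
        "Killed by signal 2.",
    ]
    for log_line in sample:
        print(annotate(log_line))
```

Lines without an errno or signal number, such as "error while executing run nb 1", give None and can be skipped.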



hope this is the information you need,

greets,

servaas

> > kind regards,
> >  
> > servaas michielssens
> 
> 
> -- 
> David.
> ________________________________________________________________________
> David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
> Dept. of Cell and Molecular Biology, Uppsala University.
> Husargatan 3, Box 596,  	75124 Uppsala, Sweden
> phone:	46 18 471 4205		fax: 46 18 511 755
> spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 




