[gmx-developers] MPI stall?

Roland Schulz roland at utk.edu
Sun Dec 13 01:13:11 CET 2009


Hi,

print_time writes to stderr, so if it really gets stuck in there I would think
it has to do with a wrong stderr redirection. Could you verify that it really
is stuck on the head node by stepping through it in the debugger? Also try
changing where stderr is written to.
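
To illustrate the kind of redirection problem meant here: if stderr is
connected to a pipe whose reader has stopped draining it (an assumption about
the setup, not something the backtrace proves), the write under fputc blocks
once the pipe buffer is full, and the other ranks then wait in MPI_Sendrecv
for the head node. Below is a minimal sketch in plain POSIX C, not taken from
GROMACS. The helper name bytes_until_pipe_is_full is made up for this example.
It puts the write end in non-blocking mode so that the point where a normal
write would hang shows up as EAGAIN instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of GROMACS.
 * Creates a pipe that nobody reads from and counts how many bytes can be
 * written before it is full. Returns that count, or -1 on an unexpected
 * error. */
long bytes_until_pipe_is_full(void)
{
    int fds[2];
    char chunk[256];
    long total = 0;
    int saved_errno = 0;

    if (pipe(fds) != 0) {
        return -1;
    }

    /* Non-blocking write end: a full pipe now yields EAGAIN.
     * With the default blocking mode, write() would simply hang here,
     * which is what frame #0 of the head-node backtrace looks like. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(chunk, '.', sizeof chunk);
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            assert((size_t)n <= sizeof chunk); /* partial writes are fine */
            total += n;
        } else if (n == -1 && errno == EINTR) {
            continue; /* interrupted by a signal, try again */
        } else {
            saved_errno = (n == -1) ? errno : 0;
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);

    /* Anything other than "would block" is treated as an error. */
    if (saved_errno != EAGAIN && saved_errno != EWOULDBLOCK) {
        return -1;
    }
    return total;
}
```

Calling bytes_until_pipe_is_full() from a small main and printing the result
shows how little output it takes to fill an undrained pipe; the exact number
is system dependent. Sending stderr to a regular file, or to /dev/null, avoids
this particular failure mode, which makes it a cheap thing to test.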

Roland

On Sun, Dec 6, 2009 at 10:03 AM, Michael Shirts
<michael.shirts at virginia.edu> wrote:

> Hi, all-
>
> I'm getting a weird MPI stall with the git master repository version.
> I compiled with debugging on and double precision, and am running on
> an 8-processor MacPro.
> After running for 10 min or so parallelized 8 ways, it appears to
> stall.  Attaching a debugger to the threads to see where it's stuck,
> the backtrace on the head node was (removing arguments for clarity)
>
> #0  0x907fb29a in write$NOCANCEL$UNIX2003 ()
> #1  0x907fb1f2 in _swrite ()
> #2  0x907fb11f in __sflush ()
> #3  0x907ffcfc in __swbuf ()
> #4  0x90838e92 in fputc ()
> #5  0x000c2dfd in print_time (out=0xa00c7690, runtime=0xbfffd5e0,
> step=44600, ir=0x1017e00, cr=0x9004e0) at sim_util.c:164
> #6  0x00019215 in do_md  at md.c:2316
> #7  0x00013138 in mdrunner  at md.c:216
> #9  0x0001b9cc in main (argc=14, argv=0xbffff3a0) at mdrun.c:518
>
> And for the other nodes:
>
> #0  0x907c536a in swtch_pri ()
> #1  0x90832e65 in sched_yield ()
> #2  0x00a05515 in mca_pml_ob1_send ()
> #3  0x00710445 in MPI_Sendrecv ()
> #4  0x00048fe4 in dd_sendrecv_rvec (dd=0x91dc00, ddimind=0,
> direction=1, buf_s=0x1034c00, n_s=333, buf_r=0xd22f38, n_r=360) at
> domdec_network.c:115
> #5  0x00029c32 in dd_move_x (dd=0x91dc00, box=0x9260fc, x=0xd21000) at
> domdec.c:657
> #6  0x000c3f77 in do_force  at sim_util.c:521
> #7  0x00017478 in do_md  at md.c:1794
> #8  0x00013138 in mdrunner at md.c:687
> #9  0x00011cbb in mdrunner_threads  at md.c:216
> #10 0x0001b9cc in main (argc=14, argv=0x9184e0) at mdrun.c:518
>
> Any other observations of this?  Has this been seen on other MacPros?
> With debugging on?
>
> Best,
> ~~~~~~~~~~~~
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> (434)-243-1821

-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309