Subject: Re: Re: [gmx-users] Gromacs 4 bug?

Berk Hess gmx3 at hotmail.com
Wed Jan 14 12:27:48 CET 2009


Hi,

We have for now concluded that this is probably an issue related to LAM 7.1.4.

There were a few other users with mdrun crashes/hangs.
What is the status of your problems?

Berk


> Date: Tue, 13 Jan 2009 13:02:47 +0100
> From: patrick.fuchs at univ-paris-diderot.fr
> To: gmx-users at gromacs.org
> Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
> 
> Hi Berk,
> it hangs after approximately 45000 steps (the system is a simple DLPC
> bilayer). A cpt file was generated, but at 09:48, before the run
> started to hang at 09:58:
> ---------
> [fuchs at cumin 2]$ ls -ltrh
> [snip]
> -rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
> -rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
> -rw-r--r-- 1 fuchs dsimb  66K janv. 13 09:57 md.log
> -rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
> -rw-r--r-- 1 fuchs dsimb  92K janv. 13 09:58 ener.edr
> [fuchs at cumin 2]$ date
> Tue Jan 13 10:16:22 CET 2009
> ---------
> The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.
> Shall I send you the tpr and cpt files off-list?
> Ciao,
> 
> Patrick
> 
> Berk Hess wrote:
> > Hi,
> > 
> > This is strange.
> > You run on 4 nodes and all processes hang at the same MPI call.
> > I see no reason why they should hang if they are all at the correct call.
> > 
> > After how many steps does this happen?
> > If it is not many, I can try to see if it also hangs on our system.
> > Otherwise, could you try to generate a checkpoint file with
> > which it hangs quickly?
> > 
> > What version of MPI are you using?
> > 
> > Berk
> > 
> > 
> >  > Date: Tue, 13 Jan 2009 10:53:25 +0100
> >  > From: patrick.fuchs at univ-paris-diderot.fr
> >  > To: gmx-users at gromacs.org
> >  > Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
> >  >
> >  > Hi Berk,
> >  > I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
> >  > lam-7.1.4), using a slightly newer version of gcc than in my
> >  > previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same
> >  > hardware, but it still hangs (so both FC9 and FC10 show the same problem,
> >  > while FC8 does not). I was finally able to run mdrun_mpi in the debugger.
> >  > You were right: mdrun seems to hang at an MPI call. Here is the output
> >  > of each xterm:
> >  >
> >  > XTERM1
> >  > ===================================================================
> >  > GNU gdb Fedora (6.8-29.fc10)
> >  > Copyright (C) 2008 Free Software Foundation, Inc.
> >  > License GPLv3+: GNU GPL version 3 or later
> >  > <http://gnu.org/licenses/gpl.html>
> >  > This is free software: you are free to change and redistribute it.
> >  > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >  > and "show warranty" for details.
> >  > This GDB was configured as "x86_64-redhat-linux-gnu"...
> >  > (gdb) run
> >  > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
> >  > [Thread debugging using libthread_db enabled]
> >  > [New Thread 0x12df30 (LWP 8285)]
> >  > NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
> >  > NODEID=0 argc=1
> >  > :-) G R O M A C S (-:
> >  >
> >  > Giant Rising Ordinary Mutants for A Clerical Setup
> >  >
> >  > :-) VERSION 4.0.2 (-:
> >  >
> >  > [snip]
> >  >
> >  > starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
> >  > 5000000 steps, 10000.0 ps.
> >  > ^C
> >  > Program received signal SIGINT, Interrupt.
> >  > 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
> >  > Missing separate debuginfos, use: debuginfo-install
> >  > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
> >  > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
> >  > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
> >  > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
> >  > (gdb) where
> >  > #0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
> >  > #1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > #2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
> >  > #3 0x000000000074a1e0 in _mpi_req_advance ()
> >  > #4 0x000000000073ced0 in lam_send ()
> >  > #5 0x000000000075328e in MPI_Send ()
> >  > #6 0x000000000074d7ec in MPI_Sendrecv ()
> >  > #7 0x00000000004aebfd in gmx_sum_qgrid_dd ()
> >  > #8 0x00000000004b40bb in gmx_pme_do ()
> >  > #9 0x0000000000479a58 in do_force_lowlevel ()
> >  > #10 0x00000000004d1d32 in do_force ()
> >  > #11 0x00000000004214d2 in do_md ()
> >  > #12 0x000000000041bea0 in mdrunner ()
> >  > #13 0x0000000000422b94 in main ()
> >  > (gdb)
> >  > ===================================================================
> >  >
> >  >
> >  > XTERM2
> >  > ===================================================================
> >  > GNU gdb Fedora (6.8-29.fc10)
> >  > Copyright (C) 2008 Free Software Foundation, Inc.
> >  > License GPLv3+: GNU GPL version 3 or later
> >  > <http://gnu.org/licenses/gpl.html>
> >  > This is free software: you are free to change and redistribute it.
> >  > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >  > and "show warranty" for details.
> >  > This GDB was configured as "x86_64-redhat-linux-gnu"...
> >  > (gdb) run
> >  > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
> >  > [Thread debugging using libthread_db enabled]
> >  > [New Thread 0x12df30 (LWP 8294)]
> >  > NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
> >  > NODEID=1 argc=1
> >  > ^C
> >  > Program received signal SIGINT, Interrupt.
> >  > 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
> >  > Missing separate debuginfos, use: debuginfo-install
> >  > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
> >  > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
> >  > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
> >  > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
> >  > (gdb) where
> >  > #0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
> >  > #1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > #2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
> >  > #3 0x000000000074a1e0 in _mpi_req_advance ()
> >  > #4 0x000000000073ea90 in MPI_Wait ()
> >  > #5 0x000000000074d800 in MPI_Sendrecv ()
> >  > #6 0x00000000004aed44 in gmx_sum_qgrid_dd ()
> >  > #7 0x00000000004b40bb in gmx_pme_do ()
> >  > #8 0x0000000000479a58 in do_force_lowlevel ()
> >  > #9 0x00000000004d1d32 in do_force ()
> >  > #10 0x00000000004214d2 in do_md ()
> >  > #11 0x000000000041bea0 in mdrunner ()
> >  > #12 0x0000000000422b94 in main ()
> >  > (gdb)
> >  > ===================================================================
> >  >
> >  >
> >  > XTERM3
> >  > ===================================================================
> >  > GNU gdb Fedora (6.8-29.fc10)
> >  > Copyright (C) 2008 Free Software Foundation, Inc.
> >  > License GPLv3+: GNU GPL version 3 or later
> >  > <http://gnu.org/licenses/gpl.html>
> >  > This is free software: you are free to change and redistribute it.
> >  > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >  > and "show warranty" for details.
> >  > This GDB was configured as "x86_64-redhat-linux-gnu"...
> >  > (gdb) run
> >  > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
> >  > [Thread debugging using libthread_db enabled]
> >  > [New Thread 0x12df30 (LWP 8276)]
> >  > NNODES=4, MYRANK=2, HOSTNAME=cumin.dsimb.inserm.fr
> >  > NODEID=2 argc=1
> >  > ^C
> >  > Program received signal SIGINT, Interrupt.
> >  > 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > Missing separate debuginfos, use: debuginfo-install
> >  > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
> >  > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
> >  > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
> >  > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
> >  > (gdb) where
> >  > #0 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > #1 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
> >  > #2 0x000000000074a1e0 in _mpi_req_advance ()
> >  > #3 0x000000000073ced0 in lam_send ()
> >  > #4 0x000000000075328e in MPI_Send ()
> >  > #5 0x000000000074d7ec in MPI_Sendrecv ()
> >  > #6 0x00000000004aed44 in gmx_sum_qgrid_dd ()
> >  > #7 0x00000000004b40bb in gmx_pme_do ()
> >  > #8 0x0000000000479a58 in do_force_lowlevel ()
> >  > #9 0x00000000004d1d32 in do_force ()
> >  > #10 0x00000000004214d2 in do_md ()
> >  > #11 0x000000000041bea0 in mdrunner ()
> >  > #12 0x0000000000422b94 in main ()
> >  > (gdb)
> >  > ===================================================================
> >  >
> >  >
> >  > XTERM4
> >  > ===================================================================
> >  > GNU gdb Fedora (6.8-29.fc10)
> >  > Copyright (C) 2008 Free Software Foundation, Inc.
> >  > License GPLv3+: GNU GPL version 3 or later
> >  > <http://gnu.org/licenses/gpl.html>
> >  > This is free software: you are free to change and redistribute it.
> >  > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> >  > and "show warranty" for details.
> >  > This GDB was configured as "x86_64-redhat-linux-gnu"...
> >  > (gdb) run
> >  > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
> >  > [Thread debugging using libthread_db enabled]
> >  > [New Thread 0x12df30 (LWP 8267)]
> >  > NNODES=4, MYRANK=3, HOSTNAME=cumin.dsimb.inserm.fr
> >  > NODEID=3 argc=1
> >  > ^C
> >  > Program received signal SIGINT, Interrupt.
> >  > 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > Missing separate debuginfos, use: debuginfo-install
> >  > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
> >  > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
> >  > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
> >  > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
> >  > (gdb) where
> >  > #0 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
> >  > #1 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
> >  > #2 0x000000000074a1e0 in _mpi_req_advance ()
> >  > #3 0x000000000073ea90 in MPI_Wait ()
> >  > #4 0x000000000074d800 in MPI_Sendrecv ()
> >  > #5 0x00000000004aebfd in gmx_sum_qgrid_dd ()
> >  > #6 0x00000000004b40bb in gmx_pme_do ()
> >  > #7 0x0000000000479a58 in do_force_lowlevel ()
> >  > #8 0x00000000004d1d32 in do_force ()
> >  > #9 0x00000000004214d2 in do_md ()
> >  > #10 0x000000000041bea0 in mdrunner ()
> >  > #11 0x0000000000422b94 in main ()
> >  > (gdb)
> >  > ===================================================================
> >  >
> >  >
> >  > Cheers,
> >  >
> >  > Patrick
> >  >
> > 
> > 
> > 
> > _______________________________________________
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://www.gromacs.org/mailman/listinfo/gmx-users
> > Please search the archive at http://www.gromacs.org/search before posting!
> > Please don't post (un)subscribe requests to the list. Use the 
> > www interface or send it to gmx-users-request at gromacs.org.
> > Can't post? Read http://www.gromacs.org/mailing_lists/users.php
> 
> -- 
> _________________________________________________________________
> !!!! new E-mail address: patrick.fuchs at univ-paris-diderot.fr !!!!
> !!!! new postal address !!!
> Patrick FUCHS
> Equipe de Bioinformatique Genomique et Moleculaire
> INTS, INSERM UMR-S726, Université Paris Diderot,
> 6 rue Alexandre Cabanel, 75015 Paris
> Tel : +33 (0)1-44-49-30-57 - Fax : +33 (0)1-47-34-74-31
> Web Site: http://www.dsimb.inserm.fr/~fuchs
