[gmx-users] Re: Gromacs 4 bug?
Bernhard Knapp
bernhard.knapp at meduniwien.ac.at
Fri Jan 30 15:58:36 CET 2009
In case anyone is interested: our nodes are finally working fine with
Fedora Core 8, fftw-3.1.3, openmpi-1.3 and gromacs-4.0.3. Thank you very
much for your time and effort!
cheers
Bernhard
>Message: 2
>Date: Thu, 15 Jan 2009 08:37:16 +0100
>From: patrick fuchs <patrick.fuchs at univ-paris-diderot.fr>
>Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>Message-ID: <496EE7AC.6040008 at univ-paris-diderot.fr>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Hi all,
>Berk and I have finally established that there is a problem with
>lam-7.1.4 under Fedora 9/Fedora 10. Initially I thought it affected only
>gromacs-4, but a PhD student in my lab reported identical problems
>(hangs) with gromacs-3.3, while under FC8 I had no problem at all
>on the same hardware. So if you want to run gromacs-4 (or any version)
>under FC9/FC10, the fix I tested, and which works, is to use openmpi as an
>alternative to lam-7.1.4 (I only tested the latest version, openmpi-1.2.8).
>I didn't test other versions of lam (7.0.?), but it seems that the
>developers advise switching to openmpi. So, to the two other users
>(Bernhard and Antoine) who reported identical problems to the mailing
>list (see
>http://www.gromacs.org/pipermail/gmx-users/2008-December/038594.html and
>http://www.gromacs.org/pipermail/gmx-users/2008-December/038623.html):
>could you please check that it works on your hardware using openmpi?
>Hope it helps,
>
>Patrick
>
>Berk Hess wrote:
>
>
>>Hi,
>>
>>We have for now concluded that this is probably an issue related to
>>lam7.1.4.
>>
>>There were a few other users with mdrun crashes/hangs.
>>What is the status of your problems?
>>
>>Berk
>>
>>
>> > Date: Tue, 13 Jan 2009 13:02:47 +0100
>> > From: patrick.fuchs at univ-paris-diderot.fr
>> > To: gmx-users at gromacs.org
>> > Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>> >
>> > Hi Berk,
>> > it hangs after approximately 45000 steps (the system is a simple DLPC
>> > bilayer), and a cpt file was generated, but at 09:48, before the
>> > hang started at 09:58:
>> > ---------
>> > [fuchs at cumin 2]$ ls -ltrh
>> > [snip]
>> > -rw-r--r-- 1 fuchs dsimb 384K janv. 13 09:33 traj.trr
>> > -rw-r--r-- 1 fuchs dsimb 385K janv. 13 09:48 state.cpt
>> > -rw-r--r-- 1 fuchs dsimb 66K janv. 13 09:57 md.log
>> > -rw-r--r-- 1 fuchs dsimb 5,4M janv. 13 09:58 traj.xtc
>> > -rw-r--r-- 1 fuchs dsimb 92K janv. 13 09:58 ener.edr
>> > [fuchs at cumin 2]$ date
>> > Tue Jan 13 10:16:22 CET 2009
>> > ---------
>> > The version of MPI is: LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University.
>> > Shall I send you the tpr and cpt files off-list?
>> > Ciao,
>> >
>> > Patrick
>> >
>> > Berk Hess wrote:
>> > > Hi,
>> > >
>> > > This is strange.
>> > > You run on 4 nodes and all processes hang at the same MPI call.
>> > > I see no reason why they should hang if they are all at the correct
>> > > call.
>> > >
>> > > After how many steps does this happen?
>> > > If it is not much I can try to see if it also hangs on our system.
>> > > Otherwise, could you try to generate a checkpoint file with
>> > > which it hangs quickly?
>> > >
>> > > What version of MPI are you using?
>> > >
>> > > Berk
>> > >
>> > >
>> > > > Date: Tue, 13 Jan 2009 10:53:25 +0100
>> > > > From: patrick.fuchs at univ-paris-diderot.fr
>> > > > To: gmx-users at gromacs.org
>> > > > Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>> > > >
>> > > > Hi Berk,
>> > > > I did a test on gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
>> > > > lam-7.1.4), using a slightly upgraded version of gcc compared to my
>> > > > previous post (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the
>> > > > same hardware, but it still hangs (so both FC9 and FC10 give the
>> > > > same problem, while FC8 does not). I was finally able to run
>> > > > mdrun_mpi in the debugger, and here are the results of my tests.
>> > > > You were right: it seems that mdrun hangs at an MPI call. Here is
>> > > > the output of each xterm:
>> > > >
>> > > > XTERM1
>> > > > ===================================================================
>> > > > GNU gdb Fedora (6.8-29.fc10)
>> > > > Copyright (C) 2008 Free Software Foundation, Inc.
>> > > > License GPLv3+: GNU GPL version 3 or later
>> > > > <http://gnu.org/licenses/gpl.html>
>> > > > This is free software: you are free to change and redistribute it.
>> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > > and "show warranty" for details.
>> > > > This GDB was configured as "x86_64-redhat-linux-gnu"...
>> > > > (gdb) run
>> > > > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
>> > > > [Thread debugging using libthread_db enabled]
>> > > > [New Thread 0x12df30 (LWP 8285)]
>> > > > NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
>> > > > NODEID=0 argc=1
>> > > > :-) G R O M A C S (-:
>> > > >
>> > > > Giant Rising Ordinary Mutants for A Clerical Setup
>> > > >
>> > > > :-) VERSION 4.0.2 (-:
>> > > >
>> > > > [snip]
>> > > >
>> > > > starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
>> > > > 5000000 steps, 10000.0 ps.
>> > > > ^C
>> > > > Program received signal SIGINT, Interrupt.
>> > > > 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
>> > > > Missing separate debuginfos, use: debuginfo-install
>> > > > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
>> > > > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
>> > > > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
>> > > > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
>> > > > (gdb) where
>> > > > #0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
>> > > > #1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > #2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
>> > > > #3 0x000000000074a1e0 in _mpi_req_advance ()
>> > > > #4 0x000000000073ced0 in lam_send ()
>> > > > #5 0x000000000075328e in MPI_Send ()
>> > > > #6 0x000000000074d7ec in MPI_Sendrecv ()
>> > > > #7 0x00000000004aebfd in gmx_sum_qgrid_dd ()
>> > > > #8 0x00000000004b40bb in gmx_pme_do ()
>> > > > #9 0x0000000000479a58 in do_force_lowlevel ()
>> > > > #10 0x00000000004d1d32 in do_force ()
>> > > > #11 0x00000000004214d2 in do_md ()
>> > > > #12 0x000000000041bea0 in mdrunner ()
>> > > > #13 0x0000000000422b94 in main ()
>> > > > (gdb)
>> > > > ===================================================================
>> > > >
>> > > >
>> > > > XTERM2
>> > > > ===================================================================
>> > > > GNU gdb Fedora (6.8-29.fc10)
>> > > > Copyright (C) 2008 Free Software Foundation, Inc.
>> > > > License GPLv3+: GNU GPL version 3 or later
>> > > > <http://gnu.org/licenses/gpl.html>
>> > > > This is free software: you are free to change and redistribute it.
>> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > > and "show warranty" for details.
>> > > > This GDB was configured as "x86_64-redhat-linux-gnu"...
>> > > > (gdb) run
>> > > > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
>> > > > [Thread debugging using libthread_db enabled]
>> > > > [New Thread 0x12df30 (LWP 8294)]
>> > > > NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
>> > > > NODEID=1 argc=1
>> > > > ^C
>> > > > Program received signal SIGINT, Interrupt.
>> > > > 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
>> > > > Missing separate debuginfos, use: debuginfo-install
>> > > > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
>> > > > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
>> > > > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
>> > > > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
>> > > > (gdb) where
>> > > > #0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
>> > > > #1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > #2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
>> > > > #3 0x000000000074a1e0 in _mpi_req_advance ()
>> > > > #4 0x000000000073ea90 in MPI_Wait ()
>> > > > #5 0x000000000074d800 in MPI_Sendrecv ()
>> > > > #6 0x00000000004aed44 in gmx_sum_qgrid_dd ()
>> > > > #7 0x00000000004b40bb in gmx_pme_do ()
>> > > > #8 0x0000000000479a58 in do_force_lowlevel ()
>> > > > #9 0x00000000004d1d32 in do_force ()
>> > > > #10 0x00000000004214d2 in do_md ()
>> > > > #11 0x000000000041bea0 in mdrunner ()
>> > > > #12 0x0000000000422b94 in main ()
>> > > > (gdb)
>> > > > ===================================================================
>> > > >
>> > > >
>> > > > XTERM3
>> > > > ===================================================================
>> > > > GNU gdb Fedora (6.8-29.fc10)
>> > > > Copyright (C) 2008 Free Software Foundation, Inc.
>> > > > License GPLv3+: GNU GPL version 3 or later
>> > > > <http://gnu.org/licenses/gpl.html>
>> > > > This is free software: you are free to change and redistribute it.
>> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > > and "show warranty" for details.
>> > > > This GDB was configured as "x86_64-redhat-linux-gnu"...
>> > > > (gdb) run
>> > > > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
>> > > > [Thread debugging using libthread_db enabled]
>> > > > [New Thread 0x12df30 (LWP 8276)]
>> > > > NNODES=4, MYRANK=2, HOSTNAME=cumin.dsimb.inserm.fr
>> > > > NODEID=2 argc=1
>> > > > ^C
>> > > > Program received signal SIGINT, Interrupt.
>> > > > 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > Missing separate debuginfos, use: debuginfo-install
>> > > > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
>> > > > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
>> > > > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
>> > > > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
>> > > > (gdb) where
>> > > > #0 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > #1 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
>> > > > #2 0x000000000074a1e0 in _mpi_req_advance ()
>> > > > #3 0x000000000073ced0 in lam_send ()
>> > > > #4 0x000000000075328e in MPI_Send ()
>> > > > #5 0x000000000074d7ec in MPI_Sendrecv ()
>> > > > #6 0x00000000004aed44 in gmx_sum_qgrid_dd ()
>> > > > #7 0x00000000004b40bb in gmx_pme_do ()
>> > > > #8 0x0000000000479a58 in do_force_lowlevel ()
>> > > > #9 0x00000000004d1d32 in do_force ()
>> > > > #10 0x00000000004214d2 in do_md ()
>> > > > #11 0x000000000041bea0 in mdrunner ()
>> > > > #12 0x0000000000422b94 in main ()
>> > > > (gdb)
>> > > > ===================================================================
>> > > >
>> > > >
>> > > > XTERM4
>> > > > ===================================================================
>> > > > GNU gdb Fedora (6.8-29.fc10)
>> > > > Copyright (C) 2008 Free Software Foundation, Inc.
>> > > > License GPLv3+: GNU GPL version 3 or later
>> > > > <http://gnu.org/licenses/gpl.html>
>> > > > This is free software: you are free to change and redistribute it.
>> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > > and "show warranty" for details.
>> > > > This GDB was configured as "x86_64-redhat-linux-gnu"...
>> > > > (gdb) run
>> > > > Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
>> > > > [Thread debugging using libthread_db enabled]
>> > > > [New Thread 0x12df30 (LWP 8267)]
>> > > > NNODES=4, MYRANK=3, HOSTNAME=cumin.dsimb.inserm.fr
>> > > > NODEID=3 argc=1
>> > > > ^C
>> > > > Program received signal SIGINT, Interrupt.
>> > > > 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > Missing separate debuginfos, use: debuginfo-install
>> > > > e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64
>> > > > libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64
>> > > > libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64
>> > > > libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
>> > > > (gdb) where
>> > > > #0 0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
>> > > > #1 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
>> > > > #2 0x000000000074a1e0 in _mpi_req_advance ()
>> > > > #3 0x000000000073ea90 in MPI_Wait ()
>> > > > #4 0x000000000074d800 in MPI_Sendrecv ()
>> > > > #5 0x00000000004aebfd in gmx_sum_qgrid_dd ()
>> > > > #6 0x00000000004b40bb in gmx_pme_do ()
>> > > > #7 0x0000000000479a58 in do_force_lowlevel ()
>> > > > #8 0x00000000004d1d32 in do_force ()
>> > > > #9 0x00000000004214d2 in do_md ()
>> > > > #10 0x000000000041bea0 in mdrunner ()
>> > > > #11 0x0000000000422b94 in main ()
>> > > > (gdb)
>> > > > ===================================================================
>> > > >
>> > > >
>> > > > Cheers,
>> > > >
>> > > > Patrick
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > gmx-users mailing list gmx-users at gromacs.org
>> > > http://www.gromacs.org/mailman/listinfo/gmx-users
>> > > Please search the archive at http://www.gromacs.org/search before
>> > > posting!
>> > > Please don't post (un)subscribe requests to the list. Use the
>> > > www interface or send it to gmx-users-request at gromacs.org.
>> > > Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>> >
>> > --
>> > _________________________________________________________________
>> > !!!! new E-mail address: patrick.fuchs at univ-paris-diderot.fr !!!!
>> > !!!! new postal address !!!
>> > Patrick FUCHS
>> > Equipe de Bioinformatique Genomique et Moleculaire
>> > INTS, INSERM UMR-S726, Université Paris Diderot,
>> > 6 rue Alexandre Cabanel, 75015 Paris
>> > Tel : +33 (0)1-44-49-30-57 - Fax : +33 (0)1-47-34-74-31
>> > Web Site: http://www.dsimb.inserm.fr/~fuchs
>>
>>
>>
>
>
>
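For anyone who wants to compare per-rank backtraces like the ones quoted above without reading them by eye, here is a minimal sketch in Python. The helper names (frames, summarize) and the constants RANK0 and RANK1 are made up for illustration; the frame lines are copied from the XTERM1 and XTERM2 output above, and the regular expression assumes the gdb "where" format shown there.

```python
import re

# Matches gdb "where" frames such as:
#   #5 0x000000000075328e in MPI_Send ()
# Any mail quote prefix in front of the "#" is simply skipped by the search.
FRAME = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\w+)\s*\(")


def frames(backtrace):
    """Return the function names in a gdb backtrace, innermost frame first."""
    return [m.group(2) for m in FRAME.finditer(backtrace)]


def summarize(backtrace):
    """Return (outermost MPI_* call, function that called it), or None.

    The outermost MPI_* frame is the call made by the application itself;
    the next frame out is the application function that issued it.
    """
    names = frames(backtrace)
    mpi = [i for i, name in enumerate(names) if name.startswith("MPI_")]
    if not mpi:
        return None
    last = mpi[-1]
    caller = names[last + 1] if last + 1 < len(names) else None
    return names[last], caller


# Frames copied from the XTERM1 (rank 0) trace in this thread.
RANK0 = """
#0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#3 0x000000000074a1e0 in _mpi_req_advance ()
#4 0x000000000073ced0 in lam_send ()
#5 0x000000000075328e in MPI_Send ()
#6 0x000000000074d7ec in MPI_Sendrecv ()
#7 0x00000000004aebfd in gmx_sum_qgrid_dd ()
#8 0x00000000004b40bb in gmx_pme_do ()
"""

# Frames copied from the XTERM2 (rank 1) trace in this thread.
RANK1 = """
#0 0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2 0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#3 0x000000000074a1e0 in _mpi_req_advance ()
#4 0x000000000073ea90 in MPI_Wait ()
#5 0x000000000074d800 in MPI_Sendrecv ()
#6 0x00000000004aed44 in gmx_sum_qgrid_dd ()
#7 0x00000000004b40bb in gmx_pme_do ()
"""

if __name__ == "__main__":
    for rank, trace in enumerate((RANK0, RANK1)):
        print(rank, summarize(trace))
```

For these two traces both ranks report MPI_Sendrecv called from gmx_sum_qgrid_dd, which is consistent with Berk's remark that all processes hang at the same MPI call. The script only summarizes where each rank is stuck; it says nothing about why.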