Subject: Re: Re: [gmx-users] Gromacs 4 bug?

patrick fuchs patrick.fuchs at univ-paris-diderot.fr
Tue Jan 13 10:53:25 CET 2009


Hi Berk,
I ran a test with gromacs-4.0.2 under Fedora 10 (with fftw-3.0.1 and
lam-7.1.4), using a slightly newer gcc than in my previous post (gcc
version 4.3.2 20081105 (Red Hat 4.3.2-7)) on the same hardware, but it
still hangs. So both FC9 and FC10 show the problem, while FC8 does not.
I was finally able to run mdrun_mpi in the debugger. You were right: mdrun
seems to hang in an MPI call. Here is the output of each xterm:

XTERM1
===================================================================
GNU gdb Fedora (6.8-29.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8285)]
NNODES=4, MYRANK=0, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=0 argc=1
                          :-)  G  R  O  M  A  C  S  (-:

                Giant Rising Ordinary Mutants for A Clerical Setup

                             :-)  VERSION 4.0.2  (-:

[snip]

starting mdrun 'Pure DLPC bilayer with 128 lipids and 3655 SPC water'
5000000 steps,  10000.0 ps.
^C
Program received signal SIGINT, Interrupt.
0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
e2fsprogs-libs-1.41.3-2.fc10.x86_64 glibc-2.9-3.x86_64 
libICE-1.0.4-4.fc10.x86_64 libSM-1.1.0-2.fc10.x86_64 
libX11-1.1.4-6.fc10.x86_64 libXau-1.0.4-1.fc10.x86_64 
libXdmcp-1.0.2-6.fc10.x86_64 libxcb-1.1.91-5.fc10.x86_64
(gdb) where
#0  0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
#1  0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2  0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#3  0x000000000074a1e0 in _mpi_req_advance ()
#4  0x000000000073ced0 in lam_send ()
#5  0x000000000075328e in MPI_Send ()
#6  0x000000000074d7ec in MPI_Sendrecv ()
#7  0x00000000004aebfd in gmx_sum_qgrid_dd ()
#8  0x00000000004b40bb in gmx_pme_do ()
#9  0x0000000000479a58 in do_force_lowlevel ()
#10 0x00000000004d1d32 in do_force ()
#11 0x00000000004214d2 in do_md ()
#12 0x000000000041bea0 in mdrunner ()
#13 0x0000000000422b94 in main ()
(gdb)
===================================================================


XTERM2
===================================================================
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8294)]
NNODES=4, MYRANK=1, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=1 argc=1
^C
Program received signal SIGINT, Interrupt.
0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
(gdb) where
#0  0x0000003b978cc087 in sched_yield () from /lib64/libc.so.6
#1  0x0000000000770c83 in lam_ssi_rpi_usysv_proc_read_env ()
#2  0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#3  0x000000000074a1e0 in _mpi_req_advance ()
#4  0x000000000073ea90 in MPI_Wait ()
#5  0x000000000074d800 in MPI_Sendrecv ()
#6  0x00000000004aed44 in gmx_sum_qgrid_dd ()
#7  0x00000000004b40bb in gmx_pme_do ()
#8  0x0000000000479a58 in do_force_lowlevel ()
#9  0x00000000004d1d32 in do_force ()
#10 0x00000000004214d2 in do_md ()
#11 0x000000000041bea0 in mdrunner ()
#12 0x0000000000422b94 in main ()
(gdb)
===================================================================


XTERM3
===================================================================
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8276)]
NNODES=4, MYRANK=2, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=2 argc=1
^C
Program received signal SIGINT, Interrupt.
0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
(gdb) where
#0  0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x000000000074a1e0 in _mpi_req_advance ()
#3  0x000000000073ced0 in lam_send ()
#4  0x000000000075328e in MPI_Send ()
#5  0x000000000074d7ec in MPI_Sendrecv ()
#6  0x00000000004aed44 in gmx_sum_qgrid_dd ()
#7  0x00000000004b40bb in gmx_pme_do ()
#8  0x0000000000479a58 in do_force_lowlevel ()
#9  0x00000000004d1d32 in do_force ()
#10 0x00000000004214d2 in do_md ()
#11 0x000000000041bea0 in mdrunner ()
#12 0x0000000000422b94 in main ()
(gdb)
===================================================================


XTERM4
===================================================================
(gdb) run
Starting program: /usr/local/gromacs-4.0.2/bin/mdrun_mpi
[Thread debugging using libthread_db enabled]
[New Thread 0x12df30 (LWP 8267)]
NNODES=4, MYRANK=3, HOSTNAME=cumin.dsimb.inserm.fr
NODEID=3 argc=1
^C
Program received signal SIGINT, Interrupt.
0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
(gdb) where
#0  0x0000000000770c70 in lam_ssi_rpi_usysv_proc_read_env ()
#1  0x0000000000784a39 in lam_ssi_rpi_usysv_advance_common ()
#2  0x000000000074a1e0 in _mpi_req_advance ()
#3  0x000000000073ea90 in MPI_Wait ()
#4  0x000000000074d800 in MPI_Sendrecv ()
#5  0x00000000004aebfd in gmx_sum_qgrid_dd ()
#6  0x00000000004b40bb in gmx_pme_do ()
#7  0x0000000000479a58 in do_force_lowlevel ()
#8  0x00000000004d1d32 in do_force ()
#9  0x00000000004214d2 in do_md ()
#10 0x000000000041bea0 in mdrunner ()
#11 0x0000000000422b94 in main ()
(gdb)
===================================================================
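
To compare the four traces at a glance, the relevant frames can be
tabulated with a short script. This is only a minimal sketch in Python: it
assumes the `where` output of each xterm has been saved as text, and the
name summarize_backtrace and the abridged trace strings are my own
illustration, not part of gdb, LAM or GROMACS.

```python
import re

# Matches gdb frames such as "#5  0x000000000075328e in MPI_Send ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+) in (\w+) \(\)", re.MULTILINE)


def summarize_backtrace(text):
    """Return (innermost MPI_* function, address inside gmx_sum_qgrid_dd).

    Either element is None if no matching frame is found.
    """
    frames = [(addr, func) for _num, addr, func in FRAME_RE.findall(text)]
    # gdb lists the innermost frame first, so the first MPI_* hit is the
    # call the rank is actually blocked in.
    mpi_call = next((f for _a, f in frames if f.startswith("MPI_")), None)
    site = next((a for a, f in frames if f == "gmx_sum_qgrid_dd"), None)
    return mpi_call, site


# Abridged copies of the traces above (only the frames of interest).
TRACES = {
    0: """#5  0x000000000075328e in MPI_Send ()
#6  0x000000000074d7ec in MPI_Sendrecv ()
#7  0x00000000004aebfd in gmx_sum_qgrid_dd ()""",
    1: """#4  0x000000000073ea90 in MPI_Wait ()
#5  0x000000000074d800 in MPI_Sendrecv ()
#6  0x00000000004aed44 in gmx_sum_qgrid_dd ()""",
    2: """#4  0x000000000075328e in MPI_Send ()
#5  0x000000000074d7ec in MPI_Sendrecv ()
#6  0x00000000004aed44 in gmx_sum_qgrid_dd ()""",
    3: """#3  0x000000000073ea90 in MPI_Wait ()
#4  0x000000000074d800 in MPI_Sendrecv ()
#5  0x00000000004aebfd in gmx_sum_qgrid_dd ()""",
}

for rank, trace in sorted(TRACES.items()):
    mpi_call, site = summarize_backtrace(trace)
    print(f"rank {rank}: blocked in {mpi_call}, "
          f"called from gmx_sum_qgrid_dd at {site}")
```

Read this way, all four ranks are inside MPI_Sendrecv called from
gmx_sum_qgrid_dd: ranks 0 and 2 are in MPI_Send, ranks 1 and 3 in MPI_Wait.
Ranks 0 and 3 share one address in gmx_sum_qgrid_dd (0x4aebfd) and ranks 1
and 2 the other (0x4aed44).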


Cheers,

Patrick

Berk Hess wrote:
> Hi,
> 
> You can do something like:
>  mpirun -np 4 xterm -e gdb ~/check_gmx/obj/g_x86_64/src/kernel/mdrun
> 
> with the appropriate settings for your system.
> 
> You will have to type run in every xterm to make mdrun run.
> Or you can make some scripts
> (gdb -x gdb_cmds will read the gdb commands from the file gdb_cmds).
> 
> When you think it hangs, type ctrl-c in an xterm
> and type where to see where it hangs.
> I would guess this would be in an MPI call.
> 
> Berk
> 
> 
>  > Date: Mon, 15 Dec 2008 23:53:45 +0100
>  > From: patrick.fuchs at univ-paris-diderot.fr
>  > To: gmx-users at gromacs.org
>  > Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>  >
>  > Hi Berk,
>  > I used gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC).
>  > I recompiled it with CFLAGS=-g and it still hangs...
>  > Now, how can we run it in the debugger?
>  > Thanks,
>  >
>  > Patrick
>  >
>  > Berk Hess wrote:
>  > > Hi,
>  > >
>  > > What compiler (and compiler version) are you using?
>  > >
>  > > Could you configure with CFLAGS=-g
>  > > and see if it still hangs?
>  > > If it also hangs in that case, we can run it in the debugger
>  > > and find out where it hangs.
>  > >
>  > > Berk
>  > >
>  > > > Date: Mon, 15 Dec 2008 16:32:31 +0100
>  > > > From: patrick.fuchs at univ-paris-diderot.fr
>  > > > To: gmx-users at gromacs.org
>  > > > Subject: Re: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>  > > >
>  > > > Hi,
>  > > > I have exactly the same problem under Fedora 9 on a dual quad-core
>  > > > (Intel Xeon E5430, 2.66 GHz) computer. Gromacs-4.0.2 hangs (as does
>  > > > gromacs-4.0.0) after a couple of minutes of simulation. Sometimes
>  > > > it even hangs before the simulation reaches the writing of the first
>  > > > checkpoint file (the time before the hang is erratic: sometimes a
>  > > > couple of minutes, sometimes a few seconds). The CPUs are still
>  > > > loaded but nothing is written to any output file (log, xtc, trr,
>  > > > edr...). All gromacs binaries were compiled in the standard way with
>  > > > --enable-mpi and the latest lam-7.1.4. Like Bernhard and Antoine, I
>  > > > don't see anything strange in the log file.
>  > > > I have another computer, a single quad-core (Intel Xeon E5430, 2.66
>  > > > GHz) under Fedora 8, and the same system (same mdp, topology, etc.)
>  > > > runs fine with gromacs-4.0.2 (compiled with lam-7.1.4 as well). So
>  > > > could something be going wrong between FC9 and lam-7.1.4?
>  > > > Cheers,
>  > > >
>  > > > Patrick
>  > > >
>  > > > Berk Hess wrote:
>  > > > > Hi,
>  > > > >
>  > > > > If your simulations no longer produce output, but still run
>  > > > > and there is no error or warning message,
>  > > > > my guess would be that they are waiting for MPI communication.
>  > > > > But the developers and many users are using 4.0 and I have
>  > > > > not heard of problems like this, so I wonder if the problem
>  > > > > could be somewhere else.
>  > > > >
>  > > > > Could you (or have you tried to) continue your simulation
>  > > > > from the last checkpoint (mdrun option -cpi) before the hang,
>  > > > > to see if it crashes quickly then?
>  > > > >
>  > > > > Berk
>  > > > >
>  > > > > > Date: Fri, 12 Dec 2008 13:42:43 +0100
>  > > > > > From: bernhard.knapp at meduniwien.ac.at
>  > > > > > To: gmx-users at gromacs.org
>  > > > > > Subject: Subject: Re: Re: [gmx-users] Gromacs 4 bug?
>  > > > > >
>  > > > > > Mark wrote:
>  > > > > >
>  > > > > > > What's happening in the log files? What's the latest information in
>  > > > > > > the checkpoint files? Could there be some issue with file system
>  > > > > > > availability?
>  > > > > >
>  > > > > > Hi Mark
>  > > > > >
>  > > > > > Unfortunately I already deleted the files of the simulation that got
>  > > > > > stuck after 847ps. But here is the output of another simulation run on
>  > > > > > the same system but with another pdb file. This one gets stuck after
>  > > > > > 179ps with the following output:
>  > > > > >
>  > > > > > The latest thing the checkpoint file says is:
>  > > > > >
>  > > > > > "imb F 3% step 89700, will finish Wed Jul 1 09:11:00 2009
>  > > > > > imb F 3% step 89800, will finish Wed Jul 1 09:02:51 2009"
>  > > > > >
>  > > > > > The prediction for the 1st of July is not surprising, since I always
>  > > > > > set the simulation length to 200ns to avoid having to restart it if
>  > > > > > something interesting happens in the last frames.
>  > > > > >
>  > > > > > for the .log file it is:
>  > > > > >
>  > > > > > "Writing checkpoint, step 88000 at Thu Dec 11 16:34:31 2008
>  > > > > >
>  > > > > >    Energies (kJ/mol)
>  > > > > >        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>  > > > > >     7.83753e+03    3.64068e+03    2.45951e+03    1.29167e+03    5.13688e+04
>  > > > > >         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>  > > > > >     3.82346e+05   -2.48883e+06   -3.51313e+05   -2.39119e+06    4.57648e+05
>  > > > > >    Total Energy    Temperature Pressure (bar)  Cons. rmsd ()
>  > > > > >    -1.93355e+06    3.10014e+02    1.09267e-01    2.14030e-05
>  > > > > >
>  > > > > > DD step 88999 load imb.: force 3.1%
>  > > > > >
>  > > > > >            Step           Time         Lambda
>  > > > > >           89000      178.00002        0.00000
>  > > > > >
>  > > > > >    Energies (kJ/mol)
>  > > > > >        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>  > > > > >     8.03089e+03    3.59681e+03    2.42628e+03    1.20942e+03    5.12341e+04
>  > > > > >         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>  > > > > >     3.81539e+05   -2.48602e+06   -3.51307e+05   -2.38929e+06    4.56901e+05
>  > > > > >    Total Energy    Temperature Pressure (bar)  Cons. rmsd ()
>  > > > > >    -1.93239e+06    3.09508e+02    1.64627e+01    2.08518e-05"
>  > > > > >
>  > > > > >
>  > > > > > The disk is not full either: df -h says 2.3G out of 666G used.
>  > > > > >
>  > > > > > The only difference between the system with gromacs 3.3 and gromacs 4
>  > > > > > is that gromacs 4 is running under suse 11 while gromacs 3.3 is running
>  > > > > > on a node with suse 10. But I don't think this can be the problem?
>  > > > > >
>  > > > > > cheers
>  > > > > > Bernhard
>  > > > > >
>  > > > > >
>  > > > > > _______________________________________________
>  > > > > > gmx-users mailing list gmx-users at gromacs.org
>  > > > > > http://www.gromacs.org/mailman/listinfo/gmx-users
>  > > > > > Please search the archive at http://www.gromacs.org/search before
>  > > > > posting!
>  > > > > > Please don't post (un)subscribe requests to the list. Use the
>  > > > > > www interface or send it to gmx-users-request at gromacs.org.
>  > > > > > Can't post? Read http://www.gromacs.org/mailing_lists/users.php

-- 
_________________________________________________________________
!!!! new E-mail address: patrick.fuchs at univ-paris-diderot.fr !!!!
!!!! new postal address !!!
Patrick FUCHS
Equipe de Bioinformatique Genomique et Moleculaire
INTS, INSERM UMR-S726, Université Paris Diderot,
6 rue Alexandre Cabanel, 75015 Paris
Tel : +33 (0)1-44-49-30-57 - Fax : +33 (0)1-47-34-74-31
Web Site: http://www.dsimb.inserm.fr/~fuchs


