[gmx-developers] Re: [gmx-users] unexpected stop of simulation

Roland Schulz roland at utk.edu
Thu Nov 4 01:08:42 CET 2010


Hi,

the reason turned out to be that the lock daemon (lockd) on the NFS server
was hanging. The error showed up in the dmesg output.
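
The fatal error quoted below says "Already running simulation?", although the real cause was an I/O error from the hung lock daemon. Below is a minimal sketch of how a lock failure could be reported so that the two cases are told apart. It assumes the log file is locked with a POSIX fcntl() record lock, which I have not checked against checkpoint.c, and the helper name try_lock_file() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to take a non-blocking exclusive lock on the
 * whole of an already opened file (e.g. md.log). fd must be open for
 * writing. Returns 0 on success; on failure writes a diagnostic into
 * msg and returns -1. */
int try_lock_file(int fd, const char *name, char *msg, size_t msglen)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = up to end of file, i.e. the whole file */

    if (fcntl(fd, F_SETLK, &fl) == 0)   /* F_SETLK: fail instead of waiting */
        return 0;

    int err = errno;
    if (err == EACCES || err == EAGAIN) {
        /* POSIX: a conflicting lock is held by another process */
        snprintf(msg, msglen, "Failed to lock: %s. Already running simulation?", name);
    } else {
        /* Anything else (EIO, ENOLCK, EBADF, ...) is not a second mdrun */
        snprintf(msg, msglen, "Failed to lock: %s: %s", name, strerror(err));
    }
    return -1;
}
```

With a hanging lockd, the else branch would print the strerror() text for the actual errno instead of suggesting that a second simulation is running.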

BTW: Is it possible to show the user the kernel error messages printed by
dmesg from within GROMACS? That would let the user see the cause of the
error directly. So I'm looking for a function similar to strerror that
returns the kernel message, not just the message for the error code (which
in this case was just "Input/Output error").
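
Linux does not seem to offer a strerror()-like call that maps an errno to the kernel message behind it. The closest thing is to read the kernel ring buffer, which is what dmesg does, through klogctl(), the glibc wrapper for syslog(2). Below is a minimal sketch with a hypothetical helper, kernel_log_tail(). It copies the end of that buffer so the text could be appended to a fatal error message.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/klog.h>

/* Action codes from syslog(2); glibc does not export symbolic names. */
#define KLOG_READ_ALL    3   /* SYSLOG_ACTION_READ_ALL */
#define KLOG_SIZE_BUFFER 10  /* SYSLOG_ACTION_SIZE_BUFFER */

/* Hypothetical helper: copy the last (outlen - 1) bytes of the kernel
 * ring buffer into out, NUL-terminated. Returns the number of bytes
 * copied, or -1 with errno set (e.g. EPERM without CAP_SYSLOG). */
int kernel_log_tail(char *out, size_t outlen)
{
    if (out == NULL || outlen == 0) {
        errno = EINVAL;
        return -1;
    }
    out[0] = '\0';

    int size = klogctl(KLOG_SIZE_BUFFER, NULL, 0);  /* ring buffer size */
    if (size < 0)
        return -1;
    if (size == 0)
        return 0;

    char *buf = malloc((size_t)size);
    if (buf == NULL)
        return -1;

    int n = klogctl(KLOG_READ_ALL, buf, size);  /* non-destructive read */
    if (n < 0) {
        int saved = errno;   /* keep klogctl's errno across free() */
        free(buf);
        errno = saved;
        return -1;
    }

    /* Keep only the most recent messages that fit into out. */
    size_t len = (size_t)n;
    size_t keep = len < outlen - 1 ? len : outlen - 1;
    memcpy(out, buf + (len - keep), keep);
    out[keep] = '\0';
    free(buf);
    return (int)keep;
}
```

The ring buffer is system-wide and is not tied to the failing call, so at best it gives the user a hint. On systems with kernel.dmesg_restrict=1, an unprivileged mdrun gets EPERM and would have to fall back to strerror() alone.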

Roland



On Wed, Nov 3, 2010 at 12:05 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:

> Hi,
>
> there was also an issue with the locking of the md.log output file,
> which was fixed in 4.5.2. An update might help.
>
> Carsten
>
>
> On Nov 3, 2010, at 3:50 PM, Florian Dommert wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > On 11/03/2010 03:38 PM, Hong, Liang wrote:
> >> Dear all,
> >> I'm performing a three-day simulation. It runs well for the first day,
> >> but stops during the second. The error message is below. Does anyone
> >> know what the problem might be? Thanks
> >> Liang
> >>
> >> Program mdrun, VERSION 4.5.1-dev-20101008-e2cbc-dirty
> >> Source code file: /home/z8g/download/gromacs.head/src/gmxlib/checkpoint.c, line: 1748
> >>
> >> Fatal error:
> >> Failed to lock: md100ns.log. Already running simulation?
> >> For more information and tips for troubleshooting, please check the
> >> GROMACS website at http://www.gromacs.org/Documentation/Errors
> >> -------------------------------------------------------
> >>
> >> "Sitting on a rooftop watching molecules collide" (A Camp)
> >>
> >> Error on node 0, will try to stop all the nodes
> >> Halting parallel program mdrun on CPU 0 out of 32
> >>
> >> gcq#348: "Sitting on a rooftop watching molecules collide" (A Camp)
> >>
> >>
> >> --------------------------------------------------------------------------
> >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> >> with errorcode -1.
> >>
> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >> You may or may not see output from other processes, depending on
> >> exactly when Open MPI kills them.
> >>
> >> --------------------------------------------------------------------------
> >> [node139:04470] [[37327,0],0]-[[37327,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
> >>
> >> --------------------------------------------------------------------------
> >> mpiexec has exited due to process rank 0 with PID 4471 on
> >> node node139 exiting without calling "finalize". This may
> >> have caused other processes in the application to be
> >> terminated by signals sent by mpiexec (as reported here).
> >
> > Perhaps the queueing system of your cluster does not allow jobs to run
> > longer than 24h. Or the default limit is 24h and you have to request a
> > longer run time in the submission script.
> >
> > /Flo
> >
> > - --
> > Florian Dommert
> > Dipl.-Phys.
> >
> > Institute for Computational Physics
> >
> > University Stuttgart
> >
> > Pfaffenwaldring 27
> > 70569 Stuttgart
> >
> > Phone: +49(0)711/685-6-3613
> > Fax:   +49-(0)711/685-6-3658
> >
> > EMail: dommert at icp.uni-stuttgart.de
> > Home: http://www.icp.uni-stuttgart.de/~icp/Florian_Dommert
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.10 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> >
> > iEYEARECAAYFAkzRdrEACgkQLpNNBb9GiPm1sgCg3LkRUWgiZvOOH/GIjp5ifbZI
> > bJcAn1aamCMWlWTokD1+eDCLG1WhT/rd
> > =4Vs3
> > -----END PGP SIGNATURE-----
> > --
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>


-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


More information about the gromacs.org_gmx-developers mailing list