[gmx-users] unexpected stop of simulation

Carsten Kutzner ckutzne at gwdg.de
Wed Nov 3 17:05:36 CET 2010


Hi,

there was also an issue with the locking of the md.log output file,
which was fixed in 4.5.2. Updating might help.

Carsten


On Nov 3, 2010, at 3:50 PM, Florian Dommert wrote:

> On 11/03/2010 03:38 PM, Hong, Liang wrote:
>> Dear all,
>> I'm running a three-day simulation. It runs well for the first day but stops during the second. The error message is below. Does anyone know what the problem might be? Thanks
>> Liang
>> 
>> Program mdrun, VERSION 4.5.1-dev-20101008-e2cbc-dirty
>> Source code file: /home/z8g/download/gromacs.head/src/gmxlib/checkpoint.c, line: 1748
>> 
>> Fatal error:
>> Failed to lock: md100ns.log. Already running simulation?
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>> 
>> "Sitting on a rooftop watching molecules collide" (A Camp)
>> 
>> Error on node 0, will try to stop all the nodes
>> Halting parallel program mdrun on CPU 0 out of 32
>> 
>> gcq#348: "Sitting on a rooftop watching molecules collide" (A Camp)
>> 
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode -1.
>> 
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> [node139:04470] [[37327,0],0]-[[37327,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
>> --------------------------------------------------------------------------
>> mpiexec has exited due to process rank 0 with PID 4471 on
>> node node139 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
> 
> Perhaps the queueing system on your cluster does not allow jobs to run
> longer than 24 h. Or the default walltime limit is 24 h, and you have to
> request a longer limit in your submission script.
> 
> /Flo
> 
> -- 
> Florian Dommert
> Dipl.-Phys.
> 
> Institute for Computational Physics
> 
> University Stuttgart
> 
> Pfaffenwaldring 27
> 70569 Stuttgart
> 
> Phone: +49(0)711/685-6-3613
> Fax:   +49-(0)711/685-6-3658
> 
> EMail: dommert at icp.uni-stuttgart.de
> Home: http://www.icp.uni-stuttgart.de/~icp/Florian_Dommert
