[gmx-users] Job crash: checkpoint file
cshi at ou.edu
Mon Oct 25 17:58:15 CEST 2010
I have been having exactly the same problem recently.
The replica exchange job stops at around 11000 steps.
When I switch to another cluster, the job runs fine.
How often have you seen this type of crash, and are there any known solutions?
Department of Chemistry & Biochemistry
University of Oklahoma
Email: cshi at ou.edu
From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of Justin A. Lemkul [jalemkul at vt.edu]
Sent: Friday, October 08, 2010 1:19 PM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Job crash: checkpoint file
Jianhui Tian wrote:
> Dear GMX users,
> I am running a replica simulation and the job crashed with the following
> message:
> File input/output error:
> Cannot rename checkpoint file; maybe you are out of quota?
> From the mailing list, I see this might be a permission problem.
> However, I checked the file permissions and noticed nothing wrong.
> If I rerun the crashed simulation, it goes through the second time. This
> seems strange. Any suggestions are welcome.
I've seen this happen when our filesystem blips (has a brief outage). Since you're
able to rerun your job successfully, I don't think there's anything to do about it,
except perhaps to ask your sysadmins about the stability of the filesystem and
whether you can expect this to happen frequently.
Justin A. Lemkul
ICTAS Doctoral Scholar
Department of Biochemistry
jalemkul[at]vt.edu | (540) 231-9080
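GROMACS's own checkpoint code is not shown in this thread. As a generic illustration of why a transient filesystem error can break a step that then succeeds on a rerun, here is a minimal Python sketch of the common "write to a temporary file, then rename it over the target" pattern, with a retry on the rename. This is not how mdrun implements checkpointing; the function name `save_checkpoint`, its `retries`/`delay` parameters, and the file name used in the usage note are hypothetical.

```python
import os
import tempfile
import time


def save_checkpoint(path, data, retries=3, delay=1.0):
    """Hypothetical helper: write bytes to path via a temp file and a rename.

    Returns the attempt number on which the rename succeeded. If every
    rename attempt raises OSError (e.g. the filesystem keeps failing),
    the temporary file is removed and the last error is re-raised.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data reached the disk
        for attempt in range(1, retries + 1):
            try:
                # Replaces the old file in one step (atomic on POSIX).
                os.replace(tmp_path, path)
                return attempt
            except OSError:
                if attempt == retries:
                    raise
                time.sleep(delay)  # wait out a possible transient glitch
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, `save_checkpoint(os.path.join(workdir, "state.cpt"), b"...")` leaves either the old file or the complete new one in place, never a half-written file, which is why simply retrying after a short filesystem outage is usually safe.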
gmx-users mailing list gmx-users at gromacs.org