[gmx-users] Job crash: checkpoint file
cshi at ou.edu
Mon Oct 25 18:37:40 CEST 2010
I wish the error did come from the quota.
But the disk quota is fine; we still have 2.0 TB available on the scratch space, which is shared by everyone.
I also have other jobs running that have no problem writing output.
I've tried smaller REMD jobs on the cluster using only 1 node (8 CPUs), and there seems to be no problem.
But when using 7 nodes, one (or two) of the nodes reports this error.
And several weird files are generated: mdrun_mpi.80s-12939,v002.local.btr, mdrun_mpi.80s-12940,v002.local.btr, ....
v002 is the name of the node.
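One way to narrow down which nodes misbehave is to run a small write-then-rename probe against the shared scratch directory from each node. The following is a minimal sketch in Python, not GROMACS code: `probe_write_and_rename` is a hypothetical helper name, and it only mimics the general write-then-rename pattern that the "Cannot rename checkpoint file" message suggests.

```python
import os
import socket
import tempfile


def probe_write_and_rename(directory):
    """Write a small file in `directory`, then rename it.

    Returns (ok, message). This imitates a write-then-rename update;
    it is an illustration only and does not call GROMACS.
    """
    host = socket.gethostname()
    tmp_path = None
    new_path = None
    try:
        # Create the temporary file inside the target directory so that
        # the rename stays on a single filesystem.
        fd, tmp_path = tempfile.mkstemp(prefix="probe_", suffix=".tmp", dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"x" * 4096)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out to the filesystem
        new_path = tmp_path[:-4] + ".done"
        os.replace(tmp_path, new_path)  # the rename step under test
        return True, f"{host}: write and rename OK in {directory}"
    except OSError as err:
        return False, f"{host}: failed in {directory}: {err}"
    finally:
        # Clean up whichever of the two files still exists.
        for path in (tmp_path, new_path):
            if path and os.path.exists(path):
                os.remove(path)
```

Calling `probe_write_and_rename` with the scratch directory on every node of the allocation would show whether only some nodes, such as v002, fail while the others succeed.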
Department of Chemistry & Biochemistry
University of Oklahoma
Email: cshi at ou.edu
From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of David van der Spoel [spoel at xray.bmc.uu.se]
Sent: Monday, October 25, 2010 11:21 AM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Job crash: checkpoint file
On 2010-10-25 17.58, Shi, Chuanyin wrote:
> I have been having exactly the same problem recently.
> The replica exchange job stops at around 11000 steps.
> When I switch to another cluster, the job runs fine.
> How often have you seen this type of crash, and are there any solutions for it?
Have you checked your quota?
I had the same problem recently, and I was indeed out of quota.
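A quick free-space check can be sketched in Python as follows. This is an illustration under stated assumptions: `free_space_report` is a hypothetical helper, the 1 GiB threshold is arbitrary, and `shutil.disk_usage` reports free space for the whole filesystem, not a per-user quota, so the site's own quota tool is still needed to confirm the quota itself.

```python
import shutil


def free_space_report(path, min_free_bytes=1 << 30):
    """Return (enough, free_bytes) for the filesystem that holds `path`.

    `min_free_bytes` defaults to 1 GiB, an arbitrary threshold chosen
    for illustration only.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes, usage.free
```

If this reports plenty of free space while the rename still fails, the per-user quota or the stability of the filesystem is the more likely suspect.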
> Chuanyin Shi
> Department of Chemistry & Biochemistry
> University of Oklahoma
> Email: cshi at ou.edu
> From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of Justin A. Lemkul [jalemkul at vt.edu]
> Sent: Friday, October 08, 2010 1:19 PM
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] Job crash: checkpoint file
> Jianhui Tian wrote:
>> Dear GMX users,
>> I am running a replica simulation and the job crashed with the following
>> File input/output error:
>> Cannot rename checkpoint file; maybe you are out of quota?
>> From the mailing list, I see this might be a permission problem.
>> However, I checked the file permissions and found nothing wrong.
>> If I rerun the crashed simulation, it goes through the second time. This
>> seems strange. Any suggestions are welcome.
> I've seen this happen when our filesystem blips. It seems like you're able to
> run your job, so I don't think there's anything to do about it, except perhaps
> inquire with your sysadmins about the stability of the filesystem, and whether
> or not you can expect to have this happen frequently.
> Justin A. Lemkul
> Ph.D. Candidate
> ICTAS Doctoral Scholar
> MILES-IGERT Trainee
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> gmx-users mailing list gmx-users at gromacs.org
> Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se