[gmx-users] Update: (Problem Solved) -- Job crash: checkpoint file

Shi, Chuanyin cshi at ou.edu
Tue Nov 2 16:41:50 CET 2010


Thanks to our HPC support team, who solved my problem.
They changed something on my account, and now I am able to run large replica exchange jobs.
I don't know whether the cause was really a disk quota or a special permission.
But for anyone who encounters this type of problem:
try running the job from your home directory.
If that works, contact your administrator to have the limitation on scratch space removed.
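
One way to compare the two locations before committing a large job is to imitate the write-then-rename step that the error message refers to. The following is a minimal sketch, not GROMACS code: the function name check_write_and_rename, the test file prefix, and the 1 MiB test size are all hypothetical choices for illustration. Note that shutil.disk_usage reports free space on the whole filesystem, not a per-user quota; a quota or permission problem would instead show up as an OSError during the write or rename.

```python
import os
import shutil
import sys
import tempfile


def check_write_and_rename(directory, size_bytes=1024 * 1024):
    """Write a dummy file in `directory`, rename it, then clean up.

    Returns (ok, message). This only imitates a write-then-rename
    pattern; it does not call or reproduce GROMACS itself.
    """
    tmp_path = None
    final_path = None
    try:
        # Free space on the filesystem (NOT the per-user quota).
        free_bytes = shutil.disk_usage(directory).free
        fd, tmp_path = tempfile.mkstemp(prefix="cpt_test_", dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"\0" * size_bytes)
            handle.flush()
            os.fsync(handle.fileno())  # force the data out to the filesystem
        final_path = tmp_path + ".renamed"
        os.replace(tmp_path, final_path)  # the rename step under test
    except OSError as err:
        return False, f"{directory}: FAILED ({err})"
    finally:
        # Remove whatever test files are left, ignoring cleanup errors.
        for path in (tmp_path, final_path):
            if path and os.path.exists(path):
                try:
                    os.remove(path)
                except OSError:
                    pass
    return True, f"{directory}: OK, {free_bytes / 1e9:.1f} GB free on filesystem"


if __name__ == "__main__":
    # Directories to test are given on the command line;
    # with no arguments, only the home directory is tested.
    for target in sys.argv[1:] or [os.path.expanduser("~")]:
        print(check_write_and_rename(target)[1])
```

Run it once with the home directory and once with the scratch directory as arguments, ideally on the compute node that reported the error. If the scratch directory fails while the home directory succeeds, that is useful evidence to pass on to the administrators.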


Chuanyin Shi
Department of Chemistry & Biochemistry
University of Oklahoma
Email: cshi at ou.edu
________________________________________
From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of Shi, Chuanyin [cshi at ou.edu]
Sent: Monday, October 25, 2010 11:37 AM
To: Discussion list for GROMACS users
Subject: RE: [gmx-users] Job crash: checkpoint file

I wish the error did come from the quota.
But the disk quota is fine; we still have 2.0 TB on the scratch space, which is shared by everyone.
I also have other jobs running, and they have no problem writing output.
I've tried smaller REMD jobs on the cluster using only 1 node (8 CPUs), and there seems to be no problem.
But when I use 7 nodes, one (or two) of the nodes reports this error.
Several odd files are also generated: mdrun_mpi.80s-12939,v002.local.btr, mdrun_mpi.80s-12940,v002.local.btr, ....
v002 is the name of the node.


Chuanyin Shi
Department of Chemistry & Biochemistry
University of Oklahoma
Email: cshi at ou.edu
________________________________________
From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of David van der Spoel [spoel at xray.bmc.uu.se]
Sent: Monday, October 25, 2010 11:21 AM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Job crash: checkpoint file

On 2010-10-25 17.58, Shi, Chuanyin wrote:
> I am having exactly the same problem recently.
> The replica exchange job stops around 11000 steps.
> When I switch to another cluster, the job runs fine.
> How often have you seen this type of crash, and are there any solutions for it?
> Thanks.

Have you checked your quota?
I had the same problem recently, and I was indeed out of quota.
>
>
>
> Chuanyin Shi
> Department of Chemistry & Biochemistry
> University of Oklahoma
> Email: cshi at ou.edu
> ________________________________________
> From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of Justin A. Lemkul [jalemkul at vt.edu]
> Sent: Friday, October 08, 2010 1:19 PM
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] Job crash: checkpoint file
>
> Jianhui Tian wrote:
>> Dear GMX users,
>>
>> I am running a replica simulation and the job crashed with the following
>> message:
>>
>> File input/output error:
>> Cannot rename checkpoint file; maybe you are out of quota?
>>
>> From the mailing list, I see this might be a permission problem.
>> However, I checked the file permissions and found nothing wrong.
>> If I rerun the crashed simulation, it goes through the second time. This
>> seems strange. Any suggestions are welcome.
>>
>
> I've seen this happen when our filesystem blips.  It seems like you're able to
> run your job, so I don't think there's anything to do about it, except perhaps
> inquire with your sysadmins about the stability of the filesystem, and whether
> or not you can expect to have this happen frequently.
>
> -Justin
>
>> JH
>>
>
> --
> ========================================
>
> Justin A. Lemkul
> Ph.D. Candidate
> ICTAS Doctoral Scholar
> MILES-IGERT Trainee
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>
> ========================================
> --
> gmx-users mailing list    gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/Support/Mailing_Lists


--
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se


