[gmx-users] restart error
Mark Abraham
mark.j.abraham at gmail.com
Thu Jun 23 09:39:37 CEST 2016
Hi,
There are two possibilities here.
1) GROMACS has a bug in multi-simulation checkpointing. Several people
are reporting problems, and it's probably getting an overhaul for the 2016
release because it was far from clear that the old version always worked.
2) Your (parallel) file system isn't working well: output files that it
reported to the old GROMACS run as flushed to disk were not actually
flushed, so when that run read the output files back "from disk" to
compute the checksum, it got lied to again. The resulting checksum
information gets written into the checkpoint file. That's OK if the
output file really gets written to disk later on, but sometimes this
doesn't happen, particularly upon some kind of failure such as loss of
power. You can diagnose this by looking at the modification times of e.g.
your .log files. Those of the first two replicas have probably been
modified 15 minutes before all the other ones, i.e. at the previous
checkpointing stage. If so, complain to your system admins.
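
The modification-time check described above can be scripted. The following is only a minimal sketch: the function name find_stale_logs and the 60-second tolerance are my own assumptions, and the directory and file names in the usage part are taken from the command line quoted below.

```python
from pathlib import Path


def find_stale_logs(replica_dirs, log_name, tolerance_s=60.0):
    """Return (stale, mtimes).

    mtimes maps each replica directory to the modification time of its log;
    stale lists the directories whose log is more than tolerance_s seconds
    older than the most recently modified log. Missing logs are skipped.
    """
    mtimes = {}
    for d in replica_dirs:
        log = Path(d) / log_name
        if log.exists():
            mtimes[str(d)] = log.stat().st_mtime
    if not mtimes:
        return [], mtimes
    newest = max(mtimes.values())
    stale = [d for d, t in mtimes.items() if newest - t > tolerance_s]
    return stale, mtimes


if __name__ == "__main__":
    # Directory and log names as in the quoted mpiexec command.
    dirs = [f"simann{i}" for i in range(59, 75)]
    stale, mtimes = find_stale_logs(dirs, "md_golp_vacuo.log")
    newest = max(mtimes.values(), default=0.0)
    for d in dirs:
        if d in mtimes:
            flag = "STALE" if d in stale else "ok"
            print(f"{d}: {newest - mtimes[d]:8.0f} s behind newest  {flag}")
```

If the logs of the lagging replicas turn out to be roughly one checkpoint interval older than the rest, that is consistent with the file-system explanation, though it does not prove it.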
You've truncated the error message there, but note that GROMACS is
merely refusing to append to the old files. You can make a backup of
your files and restart with mdrun -noappend, but whatever information
didn't get written won't be available to you for subsequent analysis.
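
As a rough sketch of the backup step, assuming a ".bak" suffix and helper names (backup_replicas, noappend_args) that are purely illustrative, something like this copies each replica directory before the restart and builds the argument list with -noappend added:

```python
import shutil
from pathlib import Path


def backup_replicas(replica_dirs, suffix=".bak"):
    """Copy each replica directory to <dir><suffix> and return the new paths.

    Refuses to overwrite an existing backup, so an earlier copy is never lost.
    """
    backups = []
    for d in replica_dirs:
        src = Path(d)
        dst = src.with_name(src.name + suffix)
        if dst.exists():
            raise FileExistsError(f"backup already exists: {dst}")
        shutil.copytree(src, dst)
        backups.append(dst)
    return backups


def noappend_args(original_args):
    """Return a copy of the mdrun argument list with -noappend added once."""
    args = list(original_args)
    if "-noappend" not in args:
        args.append("-noappend")
    return args
```

The script only prepares the copies and the argument list; it does not launch mdrun. Run the backup first, check that the copies are complete, and only then restart with the extended command line.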
Mark
On Thu, Jun 23, 2016 at 2:34 AM ingram <ingram at fhi-berlin.mpg.de> wrote:
> Dear Grommunity,
>
> When I try and restart with the command "mpiexec -np 192 mdrun_mpi -v
> -deffnm md_golp_vacuo -s topol.tpr -cpi md_golp_vacuo.cpt -multidir
> simann59 simann60 simann61 simann62 simann63 simann64 simann65 simann66
> simann67 simann68 simann69 simann70 simann71 simann72 simann73 simann74"
> I get the error " Fatal error: Can't read 187477 bytes of
> 'md_golp_vacuo.log' to compute checksum". I then see that the
> simulations where this occurs are much behind the others, for example:
>
> Step Time Lambda
> 31000000 31000.00000 0.00000
> Step Time Lambda
> 32000000 32000.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
> Step Time Lambda
> 57500000 57500.00000 0.00000
>
> I have already posted about this issue, and I thought I had made the
> mistake. However, I now believe this to be a bug in GROMACS; please tell
> me if this still looks like a user error rather than a GROMACS problem. I
> am using GROMACS 5.1.2.
>
> Best
>
> Teresa