[gmx-users] restart error
ingram
ingram at fhi-berlin.mpg.de
Thu Jun 23 10:34:52 CEST 2016
Great, thank you!
On 2016-06-23 09:39, Mark Abraham wrote:
> Hi,
>
> There are two possibilities here.
>
> 1) GROMACS has a bug with multi-simulation checkpointing. Several
> people are reporting problems, and it's probably getting an overhaul
> for the 2016 release because it was far from clear that the old
> version was always working.
>
> 2) Your (parallel) file system isn't working well, so output files that
> the old GROMACS run was told had been flushed to disk were not actually
> flushed. When the old run then read those output files "from disk" to
> compute the checksum, it was lied to again, and that checksum was
> written into the checkpoint file. That's OK if the output file really
> gets written to disk later on, but sometimes this doesn't happen,
> particularly upon some kind of failure such as loss of power. You can
> diagnose this by looking at the modification times of e.g. your .log
> files. Those of the first two replicas have probably been modified 15
> minutes before all the other ones, i.e. at the previous checkpointing
> stage. If so, complain to your system admins.
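A minimal sketch of that modification-time check, assuming a Linux node with GNU coreutils `stat` and the replica directories and `-deffnm md_golp_vacuo` used in the restart command quoted further down. The function name `list_log_mtimes` is hypothetical. It prints one line per replica, oldest first, so replicas whose log stopped being updated at an earlier checkpoint should sort to the top.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the mtime of one log file per replica, oldest first.
# Output lines look like "<epoch seconds> <human-readable mtime> <dir>".
# Assumes GNU coreutils stat (the -c format option), i.e. a typical Linux system.
list_log_mtimes() {
    local logname="$1"; shift
    local dir
    for dir in "$@"; do
        if [ -f "$dir/$logname" ]; then
            # %Y = modification time in seconds since the epoch, %y = same, readable
            printf '%s %s\n' "$(stat -c '%Y %y' "$dir/$logname")" "$dir"
        else
            # Report missing logs on stderr so they do not disturb the sorted list
            printf 'missing: %s\n' "$dir/$logname" >&2
        fi
    done | sort -n
}
```

Usage: `list_log_mtimes md_golp_vacuo.log simann{59..74}`. A gap of roughly one checkpoint interval between the first lines and the rest would be consistent with the file-system explanation above; it does not by itself rule out the checkpointing bug.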
>
> You've truncated the error message there, but note that GROMACS is
> merely refusing to append to the old files. You can make a backup of
> your files and restart with mdrun -noappend, but whatever information
> didn't get written won't be available to you for subsequent analysis.
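The backup-then-restart route can be sketched as two small shell functions. This is an illustration under stated assumptions, not a prescribed procedure: the function names and the timestamped `backup_*` directory are hypothetical choices, and the restart simply reuses the original command with `-noappend` added. With `-noappend`, mdrun adds the simulation part number to the output file names instead of appending to the existing files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy each replica directory into a fresh timestamped
# backup directory and print that directory's name.
backup_replicas() {
    local dest
    dest="backup_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$dest" || return 1
    local dir
    for dir in "$@"; do
        # -a preserves timestamps, so the backed-up logs can still be compared by mtime
        cp -a "$dir" "$dest/" || return 1
    done
    printf '%s\n' "$dest"
}

# Hypothetical wrapper: the original restart command plus -noappend.
# The replica directories are passed as arguments to -multidir.
restart_noappend() {
    mpiexec -np 192 mdrun_mpi -v -deffnm md_golp_vacuo -s topol.tpr \
        -cpi md_golp_vacuo.cpt -noappend -multidir "$@"
}
```

Usage: run `backup_replicas simann{59..74}` first, and only then `restart_noappend simann{59..74}`, so the original files are safe if anything else goes wrong.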
>
> Mark
>
> On Thu, Jun 23, 2016 at 2:34 AM ingram <ingram at fhi-berlin.mpg.de> wrote:
>
>> Dear Grommunity,
>>
>> When I try to restart with the command
>>
>>   mpiexec -np 192 mdrun_mpi -v -deffnm md_golp_vacuo -s topol.tpr \
>>     -cpi md_golp_vacuo.cpt -multidir simann59 simann60 simann61 \
>>     simann62 simann63 simann64 simann65 simann66 simann67 simann68 \
>>     simann69 simann70 simann71 simann72 simann73 simann74
>>
>> I get the error "Fatal error: Can't read 187477 bytes of
>> 'md_golp_vacuo.log' to compute checksum". I then see that the
>> simulations where this occurs are far behind the others, for example:
>>
>>            Step           Time         Lambda
>>        31000000    31000.00000        0.00000
>>        32000000    32000.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
>>        57500000    57500.00000        0.00000
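To see how far each replica got without opening every log by hand, something like the following sketch prints the step from the last energy header in each log. It assumes the log layout shown in the excerpt (a "Step Time Lambda" header line followed by a line whose first field is the step number); the function name `last_logged_step` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<dir> <last step>" for each replica's log.
# Assumption: a line matching "Step ... Time" is a header and the next
# line's first field is the step number.
last_logged_step() {
    local logname="$1"; shift
    local dir step
    for dir in "$@"; do
        if [ -f "$dir/$logname" ]; then
            # Record the first field of the line following each header; print the last one seen
            step=$(awk 'grab { last = $1; grab = 0 }
                        /^[[:space:]]*Step[[:space:]]+Time/ { grab = 1 }
                        END { print last }' "$dir/$logname")
            printf '%s %s\n' "$dir" "${step:-none}"
        else
            printf '%s missing\n' "$dir"
        fi
    done
}
```

Usage: `last_logged_step md_golp_vacuo.log simann{59..74}`. Replicas that report a much smaller step than the rest are the ones whose output probably never reached the disk.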
>>
>> I have already posted about this issue, and I thought it was my
>> mistake. But I now believe this to be a bug in GROMACS; please tell me
>> if this still looks like a user error rather than a GROMACS problem. I
>> am using GROMACS 5.1.2.
>>
>> Best
>>
>> Teresa