[gmx-users] problem to restart REMD

Mark Abraham mark.j.abraham at gmail.com
Fri Jun 26 19:11:10 CEST 2015


Hi,

I can't tell what you've done such that md0.log no longer matches, but that's
why I suggested you make a backup. You also don't have to use appending;
that's just a convenience. The note about the node-count mismatch doesn't
matter here... Use your judgement!
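
For example, a minimal sketch of a restart without appending (the replica
count and file names here are placeholders for whatever your setup actually
uses):

  mdrun_mpi -s topol_.tpr -multi 64 -replex 250 -cpi state.cpt -noappend

With -noappend, mdrun opens a fresh numbered set of output files (e.g.
md0.part0002.log) instead of verifying checksums and appending to the old
ones, so the "Checksum wrong for 'md0.log'" error does not arise.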

Mark

On Thu, 25 Jun 2015 16:23 leila salimi <leilasalimi at gmail.com> wrote:

> Thanks very much. OK, I will check again; it seems that they are at the
> same step! The only thing that comes to mind is that I used a different
> number of CPUs when I tried to advance a few steps for some replicas, and
> then went back to the original number of CPUs.
>
> Also, I got this error when I updated some of the state.cpt files:
> Fatal error:
> Checksum wrong for 'md0.log'. The file has been replaced or its contents
> have been modified. Cannot do appending because of this condition.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
>
> and also this note:
>
>  #nodes mismatch,
>     current program: 2
>     checkpoint file: 128
>
>   #PME-nodes mismatch,
>     current program: -1
>     checkpoint file: 32
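>
> (To see what a checkpoint actually records, including the step, time, and
> the node counts in that note, something like
>
>   gmxdump -cp state6.cpt
>
> prints the checkpoint contents; gmxdump is the 4.x tool name, and in
> GROMACS 5.x the same thing is gmx dump -cp.)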
>
> I hope to figure out this problem; otherwise I will have to run it from
> the beginning!
> Thanks!
>
> Leila
>
>
>
> On Thu, Jun 25, 2015 at 4:15 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> > Hi,
> >
> > I can't tell either. Please run gmxcheck on all your input files to
> > check that the simulation part, time, and step number are all what you
> > think they are (and that they match across the simulations), and try
> > again.
> >
> > Mark
> >
> > On Thu, Jun 25, 2015 at 4:12 PM leila salimi <leilasalimi at gmail.com>
> > wrote:
> >
> > > Dear Mark,
> > >
> > > When I tried again with the newly updated state.cpt files, I got this
> > > error:
> > >
> > > Abort(1) on node 896 (rank 896 in comm 1140850688): Fatal error in
> > > MPI_Allreduce: Message truncated, error stack:
> > > MPI_Allreduce(912).......: MPI_Allreduce(sbuf=MPI_IN_PLACE,
> > > rbuf=0x7ffc783af760, count=4, MPI_DOUBLE, MPI_SUM, comm=0x84000002) failed
> > > MPIR_Allreduce_impl(769).:
> > > MPIR_Allreduce_intra(419):
> > > MPIC_Sendrecv(467).......:
> > > MPIDI_Buffer_copy(73)....: Message truncated; 64 bytes received but
> > > buffer size is 32
> > > Abort(1) on node 768 (rank 768 in comm 1140850688): Fatal error in
> > > MPI_Allreduce: Message truncated, error stack:
> > > MPI_Allreduce(912).......: MPI_Allreduce(sbuf=MPI_IN_PLACE,
> > > rbuf=0x7ffdba5176a0, count=4, MPI_DOUBLE, MPI_SUM, comm=0x84000002) failed
> > > MPIR_Allreduce_impl(769).:
> > > MPIR_Allreduce_intra(419):
> > > MPIC_Sendrecv(467).......:
> > > MPIDI_Buffer_copy(73)....: Message truncated; 64 bytes received but
> > > buffer size is 32
> > > ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in
> > > task 896
> > > "job.err.1011016.out" 399L, 17608C
> > >
> > > Actually, I don't know what the problem is!
> > >
> > > Regards,
> > > Leila
> > >
> > >
> > > On Thu, Jun 18, 2015 at 12:00 AM, leila salimi <leilasalimi at gmail.com>
> > > wrote:
> > >
> > > > I understand what you meant; I will run only a few steps for the
> > > > other replicas and then continue with the whole set of replicas.
> > > > I hope everything goes well.
> > > >
> > > > Thanks very much.
> > > >
> > > > On Wed, Jun 17, 2015 at 11:43 PM, leila salimi
> > > > <leilasalimi at gmail.com> wrote:
> > > >
> > > >> Thanks, Mark, for your suggestion.
> > > >> Actually, I don't understand the two new state6.cpt and state7.cpt
> > > >> files, because the time they show is 127670.062!
> > > >> That is strange, because my time step is 2 fs and I saved the output
> > > >> every 250 steps, i.e. every 500 fs. I would expect the time to be
> > > >> something like 127670.000 or 127670.500.
> > > >>
> > > >> By the way, do you mean that with mdrun_mpi ... -nsteps ... I can
> > > >> get the steps that I need for the old state.cpt files?
> > > >>
> > > >> Regards,
> > > >> Leila
> > > >>
> > > >> On Wed, Jun 17, 2015 at 11:22 PM, Mark Abraham
> > > >> <mark.j.abraham at gmail.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> That's all extremely strange. Given that you aren't going to
> > > >>> exchange in that short period of time, you can probably do some
> > > >>> arithmetic and work out how many steps you'd need to advance
> > > >>> whichever set of files is behind the other. Then
> > > >>> mdrun_mpi ... -nsteps y can write a set of checkpoint files that
> > > >>> will all be at the same time!
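> > > >>>
> > > >>> For example (illustrative numbers, using your 2 fs time step): if
> > > >>> the lagging checkpoints are at 127670.000 ps and the rest are at
> > > >>> 127670.500 ps, the gap is 0.500 ps / 0.002 ps = 250 steps, so
> > > >>> something like
> > > >>>
> > > >>>   mdrun_mpi -s topol_.tpr -cpi state.cpt -nsteps 250
> > > >>>
> > > >>> (file names are placeholders) could advance the lagging replicas.
> > > >>> Do this on backup copies, and check with mdrun -h whether -nsteps
> > > >>> is counted from the checkpointed step or from step zero in your
> > > >>> version before relying on it.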
> > > >>>
> > > >>> Mark
> > > >>>
> > > >>> On Wed, Jun 17, 2015 at 10:18 PM leila salimi
> > > >>> <leilasalimi at gmail.com> wrote:
> > > >>>
> > > >>> > Hi Mark,
> > > >>> >
> > > >>> > Thanks very much. Unfortunately both state6.cpt/state6_prev.cpt
> > > >>> > and state7.cpt/state7_prev.cpt were updated, and their times are
> > > >>> > different from the other replicas' files (also from the
> > > >>> > *_prev.cpt ones)!
> > > >>> >
> > > >>> > I am thinking maybe I can use init-step in the .mdp file and
> > > >>> > start from the time that I have, because all the .trr files have
> > > >>> > the same time; I checked with gmxcheck. But I am not sure that I
> > > >>> > will get correct results!
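> > > >>> > (A minimal sketch of that idea, assuming the 2 fs time step and
> > > >>> > a common restart time of 127670.0 ps, i.e. 127670.0 / 0.002 =
> > > >>> > 63835000 steps:
> > > >>> >
> > > >>> >   tinit     = 127670.0   ; restart time in ps
> > > >>> >   init-step = 63835000   ; restart step = tinit / dt
> > > >>> >
> > > >>> > in the .mdp, then a new .tpr from grompp. Whether this gives a
> > > >>> > correct continuation is exactly the doubt here.)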
> > > >>> > Actually, what confuses me is that, despite the Note I
> > > >>> > mentioned, only two replicas were running, and their state files
> > > >>> > changed while the others' did not!
> > > >>> >
> > > >>> > Regards,
> > > >>> > Leila