[gmx-users] Restarting a REMD simulation (error)

Mark Abraham mark.j.abraham at gmail.com
Mon Apr 8 23:20:34 CEST 2013


It helped that I *really* knew one must differ ;-)

Mark
On Apr 8, 2013 2:24 PM, "João Henriques" <joao.henriques.32353 at gmail.com>
wrote:

> Thank you very much. I didn't notice it until now considering all those
> numbers look so similar. Great eye for detail!
>
> João
>
>
> On Mon, Apr 8, 2013 at 3:17 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
> > On Apr 8, 2013 8:53 AM, "João Henriques" <joao.henriques.32353 at gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > Due to cluster wall-time limitations, I was forced to restart two REMD
> > > simulations. Both ran absolutely fine until hitting the wall-time limit. To
> > > restart, I used the following command:
> > >
> > > mpirun -np 64 -output-filename MPIoutput $GromDir/mdrun_mpi -s H5_.tpr
> > > -multi 64 -replex 1000 -deffnm H5_ -cpi -noappend
> > >
> > > (I'm using GMX-4.0.7 and yes, I know it's old, but I have my own reasons for
> > > using it.)
> > >
> > > Here is a random replica (#1) MPI output:
> > >
> > > ######START#######
> > > NNODES=64, MYRANK=1, HOSTNAME=an091
> > > NODEID=1 argc=11
> > > Checkpoint file is from part 1, new output files will be suffixed part0002.
> > > Reading file H5_1.tpr, VERSION 4.0.7 (single precision)
> > >
> > > Reading checkpoint file H5_1.cpt generated: Wed Apr  3 17:13:14 2013
> > >
> > > -------------------------------------------------------
> > > Program mdrun_mpi, VERSION 4.0.7
> > > Source code file: main.c, line: 116
> > >
> > > Fatal error:
> > > The 64 subsystems are not compatible
> > >
> > > -------------------------------------------------------
> > >
> > > Error on node 1, will try to stop all the nodes
> > > Halting parallel program mdrun_mpi on CPU 1 out of 64
> > > ######END#######
> > >
> > > It's reading from the correct cpt and tpr files, so it must be something
> > > else.
> > >
> > > Here is a tail of the respective log file:
> > >
> > > ######START#######
> > > Initializing Replica Exchange
> > > Repl  There are 64 replicas:
> > > Multi-checking the number of atoms ... OK
> > > Multi-checking the integrator ... OK
> > > Multi-checking init_step+nsteps ... OK
> > > Multi-checking first exchange step: init_step/-replex ...
> > > first exchange step: init_step/-replex is not equal for all subsystems
> > >   subsystem 0: 3062
> > >   subsystem 1: 3062
> > >   subsystem 2: 3062
> > >   subsystem 3: 3062
> > >   subsystem 4: 3062
> > >   subsystem 5: 3062
> > >   subsystem 6: 3062
> > >   subsystem 7: 3062
> > >   subsystem 8: 3062
> > >   subsystem 9: 3062
> > >   subsystem 10: 3062
> > >   subsystem 11: 3062
> > >   subsystem 12: 3062
> > >   subsystem 13: 3062
> > >   subsystem 14: 3062
> > >   subsystem 15: 3062
> > >   subsystem 16: 3062
> > >   subsystem 17: 3062
> > >   subsystem 18: 3062
> > >   subsystem 19: 3062
> > >   subsystem 20: 3062
> > >   subsystem 21: 3062
> > >   subsystem 22: 3062
> > >   subsystem 23: 3062
> > >   subsystem 24: 3062
> > >   subsystem 25: 3062
> > >   subsystem 26: 3062
> > >   subsystem 27: 3062
> > >   subsystem 28: 3062
> > >   subsystem 29: 3062
> > >   subsystem 30: 3062
> > >   subsystem 31: 3062
> > >   subsystem 32: 3062
> > >   subsystem 33: 3062
> > >   subsystem 34: 3062
> > >   subsystem 35: 3062
> > >   subsystem 36: 3062
> > >   subsystem 37: 3062
> > >   subsystem 38: 3062
> > >   subsystem 39: 3066
> >
> > Seems system 39 got its I/O done faster, so its latest checkpoint is from a
> > later step than the others. Its state_prev.cpt will be at 3062. Back up your
> > files, use gmxcheck to see what's in each file, and rename as suitable so
> > your set of files is consistent.
> >
> > Mark
> >
> > >   subsystem 40: 3062
> > >   subsystem 41: 3062
> > >   subsystem 42: 3062
> > >   subsystem 43: 3062
> > >   subsystem 44: 3062
> > >   subsystem 45: 3062
> > >   subsystem 46: 3062
> > >   subsystem 47: 3062
> > >   subsystem 48: 3062
> > >   subsystem 49: 3062
> > >   subsystem 50: 3062
> > >   subsystem 51: 3062
> > >   subsystem 52: 3062
> > >   subsystem 53: 3062
> > >   subsystem 54: 3062
> > >   subsystem 55: 3062
> > >   subsystem 56: 3062
> > >   subsystem 57: 3062
> > >   subsystem 58: 3062
> > >   subsystem 59: 3062
> > >   subsystem 60: 3062
> > >   subsystem 61: 3062
> > >   subsystem 62: 3062
> > >   subsystem 63: 3062
> > >
> > > -------------------------------------------------------
> > > Program mdrun_mpi, VERSION 4.0.7
> > > Source code file: main.c, line: 116
> > >
> > > Fatal error:
> > > The 64 subsystems are not compatible
> > >
> > > -------------------------------------------------------
> > > ######END#######
> > >
> > > It's clear that "init_step/-replex is not equal for all subsystems" is the
> > > problem, but does anyone know why this is happening and how to solve it?
> > >
> > > Thank you for your patience,
> > > Best regards,
> > >
> > > João Henriques
> > > --
> > > gmx-users mailing list    gmx-users at gromacs.org
> > > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > > * Please search the archive at
> > > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > > * Please don't post (un)subscribe requests to the list. Use the
> > > www interface or send it to gmx-users-request at gromacs.org.
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
>
>
>
> --
> João Henriques
>
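
For anyone hitting the same error with many replicas: the odd one out in the
"subsystem N: value" listing is easy to miss by eye, as noted above. Below is a
minimal Python sketch, not part of GROMACS, that scans the log text for those
lines and reports which replicas differ from the majority. The function name
find_inconsistent_replicas is hypothetical, and the step offset it prints
assumes the reported value is init_step divided by the -replex interval (1000
in the command quoted in this thread), so it is only approximate. It identifies
the replica; making the checkpoint set consistent still means backing up,
inspecting with gmxcheck, and renaming files as described in the reply.

```python
import re
from collections import Counter

# Matches lines such as "  subsystem 39: 3066" (quote prefixes are ignored).
# It does not match "subsystems", so the error header lines are skipped.
SUBSYSTEM_LINE = re.compile(r"subsystem\s+(\d+):\s+(-?\d+)")


def find_inconsistent_replicas(log_text):
    """Return (majority_value, {replica: value}) for replicas that differ."""
    values = {
        int(m.group(1)): int(m.group(2))
        for m in SUBSYSTEM_LINE.finditer(log_text)
    }
    if not values:
        raise ValueError("no 'subsystem N: value' lines found")
    # The value shared by most replicas is taken as the consistent one.
    majority, _ = Counter(values.values()).most_common(1)[0]
    outliers = {rep: val for rep, val in values.items() if val != majority}
    return majority, outliers


if __name__ == "__main__":
    # Rebuild the listing from this thread: all 3062 except replica 39.
    sample = "\n".join(
        "  subsystem %d: %d" % (i, 3066 if i == 39 else 3062)
        for i in range(64)
    )
    majority, outliers = find_inconsistent_replicas(sample)
    replex = 1000  # assumption: the -replex value from the quoted command
    for rep, val in sorted(outliers.items()):
        print("replica %d: %d vs %d (offset of roughly %d steps)"
              % (rep, val, majority, (val - majority) * replex))
```

In practice one would pass the contents of a replica's .log file instead of
the rebuilt sample.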


