[gmx-developers] Code question about checkpointing

hess at sbc.su.se hess at sbc.su.se
Wed Jul 21 08:44:20 CEST 2010


Hi,

Sander reordered the initialization code with my help,
but something went wrong. It indeed seems like the state
could be set incorrectly on the non-master nodes.

I have committed a change in the order of the calls.
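
As a rough illustration of why the order matters, here is a small
self-contained sketch. It is not the committed change: the sketch_* names
and the reduced structs are hypothetical stand-ins for t_inputrec, t_state,
init_parallel and set_state_entries, and the broadcast is modelled as a
plain struct copy rather than real communication. It only assumes what the
quoted code below shows, namely that set_state_entries reads the integrator
from the inputrec, which is blank on a non-master node until the broadcast.

```c
#include <assert.h>

/* Hypothetical, reduced stand-ins for the real GROMACS types. */
enum { SKETCH_EI_MD = 0, SKETCH_EI_SD = 1 };   /* integrator choices */
enum { SKETCH_STATE_SD = 1 << 0 };             /* state flag needed by SD */

typedef struct { int eI; } sketch_inputrec;
typedef struct { int flags; } sketch_state;

/* Stand-in for the broadcast in init_parallel: the non-master node
 * receives a copy of the master's inputrec. */
static void sketch_init_parallel(const sketch_inputrec *master_ir,
                                 sketch_inputrec *ir)
{
    assert(master_ir != 0 && ir != 0);
    *ir = *master_ir;
}

/* Stand-in for set_state_entries: decides the state layout from the
 * inputrec, so it silently does the wrong thing if ir is still blank. */
static void sketch_set_state_entries(sketch_state *state,
                                     const sketch_inputrec *ir)
{
    assert(state != 0 && ir != 0);
    state->flags = 0;
    if (ir->eI == SKETCH_EI_SD)
    {
        state->flags |= SKETCH_STATE_SD;
    }
}

/* Runs the startup sequence as seen by a non-master node and returns the
 * resulting state flags. bcast_first selects the call order. */
static int sketch_startup_on_non_master(int bcast_first)
{
    sketch_inputrec master_ir = { SKETCH_EI_SD }; /* read from the tpr on master */
    sketch_inputrec ir        = { SKETCH_EI_MD }; /* blank on non-master */
    sketch_state    state     = { 0 };

    if (bcast_first)
    {
        sketch_init_parallel(&master_ir, &ir);
        sketch_set_state_entries(&state, &ir);
    }
    else
    {
        sketch_set_state_entries(&state, &ir);
        sketch_init_parallel(&master_ir, &ir);
    }
    return state.flags;
}
```

With the broadcast first, the non-master node ends up with the SD flag set
in its state; with the original order the flag is missing, because the
state was laid out from an inputrec that had not been filled in yet.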

Berk

> I'm working on some of the free energy methods, and some of them are
> history-dependent, thus requiring some checkpointing.  I've got it working
> just fine for single threads by creating a new structure in the state file,
> parallel to energyhistory, but for multiple threads I've run into some
> issues in starting up from the checkpoint.
>
> I think the fundamental issue is that I don't understand exactly how
> information is supposed to flow during startup.  Right now, in mdrunner,
> there's the following code.
>
>     /* now make sure the state is initialized and propagated */
>     set_state_entries(state,inputrec,cr->nnodes);
>     if (PAR(cr))
>     {
>         /* now broadcast everything to the non-master nodes/threads: */
>         init_parallel(fplog, cr, inputrec, mtop, state);
>     }
>
> The problem is, the inputrec in the set_state_entries call is blank for
> the non-master nodes until init_parallel is called on the next line -- so
> I can't initialize the state (including allocating space, etc.) until
> after init_parallel is called and I have access to that information.
> This is before the checkpoint is read, of course, so it can't come from
> the header there.
>
> This also seems to be prone to bugs, since I noticed that
> set_state_entries has lines like:
>   if (EI_SD(ir->eI) || ir->eI == eiBD || ir->etc == etcVRESCALE) {
>
> which assume that ir has been set, whereas on the non-master nodes it
> has not.
>
> Also, init_parallel right now just consists of:
>
> *******************
> void init_parallel(FILE *log, t_commrec *cr, t_inputrec *inputrec,
>                    gmx_mtop_t *mtop, t_state *state)
> {
>     bcast_ir_mtop(cr,inputrec,mtop);
>
>     if (inputrec->eI == eiBD || EI_SD(inputrec->eI)) {
>         /* Make sure the random seeds are different on each node */
>         inputrec->ld_seed += cr->nodeid;
>     }
> }
> **********************
>
> So it doesn't actually handle the seed or the top, so it seems like
> set_state_entries could just as well be called afterwards, which would
> eliminate the problem.
>
> Any thoughts on the right way to determine what information is available
> at which point?
>
> ~~~~~~~~~~~~
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> (434)-243-1821
>