[gmx-developers] Code question about checkpointing

Shirts, Michael (mrs5pt) mrs5pt at eservices.virginia.edu
Wed Jul 21 00:15:37 CEST 2010


I'm working on some of the free energy methods, and some of them are
history-dependent, thus requiring some checkpointing.  I've got it working
just fine for single threads by creating a new structure in the state file
parallel to energyhistory, but for multiple threads I've run into some
issues in starting up from the checkpoint.

I think the fundamental issue has to do with not understanding exactly how
information is supposed to flow in startup.  Right now, in mdrunner, there's
the following code.

    /* now make sure the state is initialized and propagated */
    set_state_entries(state,inputrec,cr->nnodes);
    if (PAR(cr))
    {
        /* now broadcast everything to the non-master nodes/threads: */
        init_parallel(fplog, cr, inputrec, mtop, state);
    }

The problem is that the inputrec passed to set_state_entries is blank on the
non-master nodes until init_parallel has been called, so I can't
initialize the state (including allocating space, etc.) until after
init_parallel is called and I have access to that information.  This is
before the checkpoint is read, of course, so it can't come from the header
there.

This also seems to be prone to bugs, since I noticed that set_state_entries
has lines like:
  if (EI_SD(ir->eI) || ir->eI == eiBD || ir->etc == etcVRESCALE) {

which assume that ir has been set, whereas on the non-master nodes it has not.

Also, init_parallel right now just consists of:

*******************
void init_parallel(FILE *log, t_commrec *cr, t_inputrec *inputrec,
                   gmx_mtop_t *mtop, t_state *state)
{
    bcast_ir_mtop(cr,inputrec,mtop);

    if (inputrec->eI == eiBD || EI_SD(inputrec->eI)) {
        /* Make sure the random seeds are different on each node */
        inputrec->ld_seed += cr->nodeid;
    }
}
**********************

So it doesn't actually handle the seed or the top, so it seems like
set_state_entries could be called after init_parallel instead, which would
eliminate the problem.
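
To make the ordering question concrete, here is a minimal sketch of what a
non-master node sees under each call order.  Everything in it is a made-up
stand-in (the sketch_* types and functions, and the nhistory field are
assumptions for illustration, not the real t_inputrec, t_state,
bcast_ir_mtop or set_state_entries), so it only illustrates the data flow
and is not a patch against mdrunner.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; NOT the GROMACS types. */
typedef struct {
    int nhistory;        /* assumed: number of history entries the method needs */
} sketch_inputrec;

typedef struct {
    int     nalloc;      /* number of entries allocated in history */
    double *history;     /* history-dependent data to be checkpointed */
} sketch_state;

/* Stand-in for the broadcast: copy the master's inputrec to this node. */
static void sketch_bcast_ir(const sketch_inputrec *master,
                            sketch_inputrec *local)
{
    *local = *master;
}

/* Stand-in for state setup: size the state from whatever ir holds now. */
static void sketch_set_state_entries(sketch_state *state,
                                     const sketch_inputrec *ir)
{
    assert(state->history == NULL);   /* state must not be set up twice */
    state->nalloc  = ir->nhistory;
    state->history = (ir->nhistory > 0)
        ? calloc((size_t)ir->nhistory, sizeof(double))
        : NULL;
}

/* Simulate startup on a non-master node, whose inputrec starts out blank.
 * Returns the number of history entries that ended up allocated. */
static int sketch_startup(const sketch_inputrec *master, int bcast_first)
{
    sketch_inputrec local = { 0 };        /* blank until broadcast */
    sketch_state    state = { 0, NULL };
    int             n;

    if (bcast_first)
    {
        /* proposed order: inputrec is valid before the state is sized */
        sketch_bcast_ir(master, &local);
        sketch_set_state_entries(&state, &local);
    }
    else
    {
        /* current order: the state is sized from a blank inputrec */
        sketch_set_state_entries(&state, &local);
        sketch_bcast_ir(master, &local);
    }

    n = state.nalloc;
    free(state.history);
    return n;
}
```

With a master inputrec whose nhistory is 8, sketch_startup(&master, 0)
returns 0 (nothing allocated on the non-master node), while
sketch_startup(&master, 1) returns 8.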

Any thoughts on the right way to determine what information is available at
which point?

~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821



