[gmx-developers] Code question about checkpointing
hess at sbc.su.se
Wed Jul 21 08:44:20 CEST 2010
Hi,
Sander reordered the initialization code with my help,
but something went wrong. It indeed seems like the state
could be set incorrectly on the non-master nodes.
I have committed a change in the order of the calls.
Berk
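
A minimal sketch of the ordering problem discussed in this thread, written as plain C with no MPI. It is not the committed change. The types inputrec_t and state_t, the nlambda field, the STATE_HAS_SD_X flag, and all *_sketch functions are hypothetical stand-ins for the real GROMACS structures and calls; the broadcast is simulated by copying the master's struct. It only illustrates why deriving the state layout before the inputrec is broadcast gives wrong results on non-master nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the GROMACS types. */
enum { eiMD, eiSD, eiBD };

typedef struct {
    int eI;       /* integrator */
    int ld_seed;  /* random seed */
    int nlambda;  /* hypothetical: size of some history-dependent data */
} inputrec_t;

typedef struct {
    int flags;    /* which state entries are in use */
    int nhistory; /* hypothetical: length of the history array to allocate */
} state_t;

#define STATE_HAS_SD_X (1 << 0) /* hypothetical flag */

/* Stand-in for init_parallel: "broadcast" the master inputrec by copying it,
 * then make the random seeds differ per node. */
static void init_parallel_sketch(const inputrec_t *master_ir,
                                 inputrec_t *ir, int nodeid)
{
    assert(master_ir != NULL && ir != NULL);
    *ir = *master_ir;
    if (ir->eI == eiBD || ir->eI == eiSD) {
        ir->ld_seed += nodeid;
    }
}

/* Stand-in for set_state_entries: the state layout depends on the inputrec. */
static void set_state_entries_sketch(state_t *state, const inputrec_t *ir)
{
    assert(state != NULL && ir != NULL);
    state->flags = 0;
    if (ir->eI == eiSD || ir->eI == eiBD) {
        state->flags |= STATE_HAS_SD_X;
    }
    state->nhistory = ir->nlambda;
}

/* Local inputrec as it looks at startup: filled in on the master (nodeid 0),
 * blank (all zeros) everywhere else. */
static inputrec_t local_inputrec(const inputrec_t *master_ir, int nodeid)
{
    inputrec_t ir;
    memset(&ir, 0, sizeof(ir));
    if (nodeid == 0) {
        ir = *master_ir;
    }
    return ir;
}

/* Original order: the state is set up from a still-blank inputrec
 * on every non-master node. */
static state_t startup_original_order(const inputrec_t *master_ir, int nodeid)
{
    inputrec_t ir = local_inputrec(master_ir, nodeid);
    state_t state = { 0, 0 };
    set_state_entries_sketch(&state, &ir);
    init_parallel_sketch(master_ir, &ir, nodeid);
    return state;
}

/* Reordered: broadcast first, then derive the state layout. */
static state_t startup_reordered(const inputrec_t *master_ir, int nodeid)
{
    inputrec_t ir = local_inputrec(master_ir, nodeid);
    state_t state = { 0, 0 };
    init_parallel_sketch(master_ir, &ir, nodeid);
    set_state_entries_sketch(&state, &ir);
    return state;
}
```

With a master inputrec of { eiSD, 1993, 5 } and nodeid 1, startup_original_order leaves the state with no flags set and nhistory 0, while startup_reordered sets STATE_HAS_SD_X and nhistory 5. On the master (nodeid 0) both orders give the same result, which is why the problem only shows up in parallel runs.
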
> I'm working on some of the free energy methods, and some of them are
> history-dependent, thus requiring some checkpointing. I've got it working
> just fine for single threads, creating a new structure in the state file
> parallel to energyhistory but for multiple threads, I've run into some
> issues in starting up from the checkpoint.
>
> I think the fundamental issue is that I don't understand exactly how
> information is supposed to flow during startup. Right now, mdrunner
> contains the following code.
>
> /* now make sure the state is initialized and propagated */
> set_state_entries(state,inputrec,cr->nnodes);
> if (PAR(cr))
> {
>     /* now broadcast everything to the non-master nodes/threads: */
>     init_parallel(fplog, cr, inputrec, mtop, state);
> }
>
> The problem is that the inputrec passed to set_state_entries is blank on
> the non-master nodes until init_parallel has been called, so I can't
> initialize the state (including allocating space, etc.) until after
> init_parallel has run and I have access to that information. This is
> before the checkpoint is read, of course, so it can't come from the header
> there.
>
> This also seems prone to bugs, since I noticed that set_state_entries
> has lines like:
> if (EI_SD(ir->eI) || ir->eI == eiBD || ir->etc == etcVRESCALE) {
>
> which assume that ir has been set, whereas on the non-master nodes it has not.
>
> Also, init_parallel right now just consists of:
>
> *******************
> void init_parallel(FILE *log, t_commrec *cr, t_inputrec *inputrec,
>                    gmx_mtop_t *mtop, t_state *state)
> {
>     bcast_ir_mtop(cr,inputrec,mtop);
>
>     if (inputrec->eI == eiBD || EI_SD(inputrec->eI)) {
>         /* Make sure the random seeds are different on each node */
>         inputrec->ld_seed += cr->nodeid;
>     }
> }
> **********************
>
> So it doesn't actually handle the seed or the top, so it seems that
> set_state_entries could be called after init_parallel instead, which
> would eliminate the problem.
>
> Any thoughts for the right way to assume what information is available at
> which point?
>
> ~~~~~~~~~~~~
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> (434)-243-1821
>