[gmx-developers] Code question about checkpointing
Shirts, Michael (mrs5pt)
mrs5pt at eservices.virginia.edu
Wed Jul 21 19:01:47 CEST 2010
Thanks! Good to know that I'm not just confused.
Best,
~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821
> From: "hess at sbc.su.se" <hess at sbc.su.se>
> Date: Wed, 21 Jul 2010 02:44:20 -0400
> To: "michael.shirts at virginia.edu" <michael.shirts at virginia.edu>, Discussion
> list for GROMACS development <gmx-developers at gromacs.org>
> Subject: Re: [gmx-developers] Code question about checkpointing
>
> Hi,
>
> Sander reordered the initialization code with my help,
> but something went wrong. It indeed seems like the state
> could be set incorrectly on the non-master nodes.
>
> I have committed a change in the order of the calls.
>
> Berk
>
>> I'm working on some of the free energy methods, and some of them are
>> history-dependent, thus requiring some checkpointing. I've got it working
>> just fine for single threads by creating a new structure in the state file
>> parallel to energyhistory, but for multiple threads I've run into some
>> issues in starting up from the checkpoint.
>>
>> I think the fundamental issue has to do with not understanding exactly how
>> information is supposed to flow in startup. Right now, in mdrunner, there's
>> the following code.
>>
>>     /* now make sure the state is initialized and propagated */
>>     set_state_entries(state,inputrec,cr->nnodes);
>>     if (PAR(cr))
>>     {
>>         /* now broadcast everything to the non-master nodes/threads: */
>>         init_parallel(fplog, cr, inputrec, mtop, state);
>>     }
>>
>> The problem is, the inputrec in the set_state_entries call is blank for the
>> non-master nodes until after the next line is called, so I can't
>> initialize the state (including allocating space, etc.) until after
>> init_parallel is called and I have access to that information. This is
>> before the checkpoint is read, of course, so it can't come from the header
>> there.
>>
>> This also seems to be prone to bugs, since I noticed in set_state_entries,
>> it has lines like:
>> if (EI_SD(ir->eI) || ir->eI == eiBD || ir->etc == etcVRESCALE) {
>>
>> Which assume that ir has been set, whereas it has not.
>>
>> Also, init_parallel right now just consists of:
>>
>> *******************
>> void init_parallel(FILE *log, t_commrec *cr, t_inputrec *inputrec,
>>                    gmx_mtop_t *mtop, t_state *state)
>> {
>>     bcast_ir_mtop(cr,inputrec,mtop);
>>
>>     if (inputrec->eI == eiBD || EI_SD(inputrec->eI)) {
>>         /* Make sure the random seeds are different on each node */
>>         inputrec->ld_seed += cr->nodeid;
>>     }
>> }
>> **********************
>>
>> So it doesn't actually handle the seed or the top, so it seems like the
>> set_state_entries code could be called after it as well, which would
>> eliminate the problem.
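
The ordering issue described above can be illustrated with a minimal sketch.
It is not GROMACS code: toy_inputrec, toy_state, toy_set_state_entries,
toy_bcast_ir and toy_startup_nonmaster are hypothetical stand-ins for
t_inputrec, t_state, set_state_entries and the broadcast inside
init_parallel, and the broadcast is modelled as a plain memory copy rather
than an MPI call. It only shows why sizing the state before the broadcast
gives a wrong result on a non-master node.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for t_inputrec and t_state; not the GROMACS types. */
typedef struct
{
    int use_history; /* nonzero if the method is history-dependent */
    int nhistory;    /* number of history entries the state must hold */
} toy_inputrec;

typedef struct
{
    int     nhistory; /* number of entries actually allocated */
    double *history;  /* NULL when nothing was allocated */
} toy_state;

/* Stand-in for set_state_entries(): sizes the state from the inputrec. */
static void toy_set_state_entries(toy_state *state, const toy_inputrec *ir)
{
    assert(state != NULL && ir != NULL);
    state->nhistory = ir->use_history ? ir->nhistory : 0;
    state->history  = NULL;
    if (state->nhistory > 0)
    {
        state->history = calloc((size_t)state->nhistory, sizeof(double));
        if (state->history == NULL)
        {
            state->nhistory = 0; /* allocation failed: report nothing allocated */
        }
    }
}

/* Stand-in for the broadcast in init_parallel(): copy the master's inputrec. */
static void toy_bcast_ir(toy_inputrec *local_ir, const toy_inputrec *master_ir)
{
    memcpy(local_ir, master_ir, sizeof(*local_ir));
}

/* Start up one non-master node, whose inputrec is blank until the broadcast.
 * Returns the number of history entries allocated on that node. */
static int toy_startup_nonmaster(const toy_inputrec *master_ir, int bcast_first)
{
    toy_inputrec ir = { 0, 0 }; /* blank, as on a non-master node */
    toy_state    state;
    int          n;

    if (bcast_first)
    {
        /* Proposed order: broadcast, then size the state. */
        toy_bcast_ir(&ir, master_ir);
        toy_set_state_entries(&state, &ir);
    }
    else
    {
        /* Order from the quoted mdrunner code: state is sized from a blank ir. */
        toy_set_state_entries(&state, &ir);
        toy_bcast_ir(&ir, master_ir);
    }
    n = state.nhistory;
    free(state.history);
    return n;
}
```

With a master inputrec of { 1, 8 }, toy_startup_nonmaster(&master, 0)
allocates no history entries on the non-master node, while
toy_startup_nonmaster(&master, 1) allocates all eight.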
>>
>> Any thoughts for the right way to assume what information is available at
>> which point?
>>
>> ~~~~~~~~~~~~
>> Michael Shirts
>> Assistant Professor
>> Department of Chemical Engineering
>> University of Virginia
>> michael.shirts at virginia.edu
>> (434)-243-1821