[gmx-developers] Code question about checkpointing

Shirts, Michael (mrs5pt) mrs5pt at eservices.virginia.edu
Wed Jul 21 19:01:47 CEST 2010


Thanks! Good to know that I'm not just confused.

Best,
~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821


> From: "hess at sbc.su.se" <hess at sbc.su.se>
> Date: Wed, 21 Jul 2010 02:44:20 -0400
> To: "michael.shirts at virginia.edu" <michael.shirts at virginia.edu>, Discussion
> list for GROMACS development <gmx-developers at gromacs.org>
> Subject: Re: [gmx-developers] Code question about checkpointing
> 
> Hi,
> 
> Sander reordered the initialization code with my help,
> but something went wrong. It indeed seems like the state
> could be set incorrectly on the non-master nodes.
> 
> I have committed a change in the order of the calls.
> 
> Berk
> 
>> I'm working on some of the free energy methods, and some of them are
>> history-dependent, thus requiring some checkpointing.  I've got it working
>> just fine for single threads by creating a new structure in the state file
>> parallel to energyhistory, but with multiple threads I've run into some
>> issues in starting up from the checkpoint.
>> 
>> I think the fundamental issue is that I don't understand exactly how
>> information is supposed to flow during startup.  Right now, in mdrunner,
>> there's the following code.
>> 
>>     /* now make sure the state is initialized and propagated */
>>     set_state_entries(state,inputrec,cr->nnodes);
>>     if (PAR(cr))
>>     {
>>         /* now broadcast everything to the non-master nodes/threads: */
>>         init_parallel(fplog, cr, inputrec, mtop, state);
>>     }
>> 
>> The problem is that the inputrec in the set_state_entries call is blank on
>> the non-master nodes until init_parallel has been called, so I can't
>> initialize the state (including allocating space, etc.) until after
>> init_parallel is called and I have access to that information.  This is
>> before the checkpoint is read, of course, so it can't come from the header
>> there.
>> 
>> This also seems prone to bugs, since set_state_entries has lines like:
>>   if (EI_SD(ir->eI) || ir->eI == eiBD || ir->etc == etcVRESCALE) {
>> 
>> which assume that ir has been set, whereas on the non-master nodes it has
>> not.
>> 
>> Also, init_parallel right now just consists of:
>> 
>> *******************
>> void init_parallel(FILE *log, t_commrec *cr, t_inputrec *inputrec,
>>                    gmx_mtop_t *mtop, t_state *state)
>> {
>>     bcast_ir_mtop(cr,inputrec,mtop);
>> 
>>     if (inputrec->eI == eiBD || EI_SD(inputrec->eI)) {
>>         /* Make sure the random seeds are different on each node */
>>         inputrec->ld_seed += cr->nodeid;
>>     }
>> }
>> **********************
>> 
>> So it doesn't actually handle the seed or the top, so it seems like the
>> set_state_entries call could be made after it as well, which would
>> eliminate the problem.
>> 
>> Any thoughts for the right way to assume what information is available at
>> which point?
>> 
>> ~~~~~~~~~~~~
>> Michael Shirts
>> Assistant Professor
>> Department of Chemical Engineering
>> University of Virginia
>> michael.shirts at virginia.edu
>> (434)-243-1821
>> 
> 