[gmx-developers] Checkpointing

Erik Lindahl lindahl at stanford.edu
Thu Jun 6 21:40:49 CEST 2002


Hi,


> 
> Actually we should just implement writing out a restart file every n
> steps, which can be the same file (or alternate between two filenames).
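The "every n steps, alternating between two filenames" idea could look roughly like the following. This is a minimal sketch only. The function checkpoint_name(), the interval parameter nstcheck and the file names state_a.cpt/state_b.cpt are hypothetical and are not existing GROMACS code.

```c
#include <stddef.h>

/* Hypothetical helper: return the checkpoint file to write at `step`, or NULL
 * if no checkpoint is due. A checkpoint is written every nstcheck steps, and
 * successive checkpoints alternate between two files, so the previous one is
 * still intact while the next one is being written. */
static const char *checkpoint_name(long step, long nstcheck)
{
    /* Read-only table, so it does not conflict with removing mutable statics. */
    static const char *const names[2] = { "state_a.cpt", "state_b.cpt" };

    if (nstcheck <= 0 || step < 0 || step % nstcheck != 0)
        return NULL;                         /* no checkpoint at this step */
    return names[(step / nstcheck) % 2];     /* k-th checkpoint -> file k % 2 */
}
```

In the MD loop one would call it once per step and write the state whenever it returns non-NULL. If the job dies in the middle of a write, the other file still holds the previous complete checkpoint.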

Since we should get rid of all static data anyway to support threads, it 
would probably be a good idea to create one or a few structures that 
contain all the data and the current state of the system.
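A rough sketch of what such a state structure could look like, assuming double-precision coordinates. The names sim_state, state_create(), state_free() and advance() are hypothetical. advance() is a toy force-free update that is only there to show that every quantity is reached through the state pointer instead of through file-scope static variables, so each thread can own its own state.

```c
#include <stdlib.h>

typedef double rvec[3];

/* Hypothetical container holding everything needed to continue a run.
 * A checkpoint then only has to serialize this struct. */
typedef struct {
    long    step;     /* current MD step */
    double  time;     /* simulation time (ps) */
    double  dt;       /* time step (ps) */
    int     natoms;   /* number of atoms, assumed > 0 */
    rvec   *x;        /* positions */
    rvec   *v;        /* velocities */
} sim_state;

static sim_state *state_create(int natoms, double dt)
{
    sim_state *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->natoms = natoms;
    s->dt     = dt;
    s->x = calloc((size_t)natoms, sizeof *s->x);   /* zero-initialized */
    s->v = calloc((size_t)natoms, sizeof *s->v);
    if (s->x == NULL || s->v == NULL) {
        free(s->x);
        free(s->v);
        free(s);
        return NULL;
    }
    return s;
}

static void state_free(sim_state *s)
{
    if (s == NULL)
        return;
    free(s->x);
    free(s->v);
    free(s);
}

/* Toy free-flight update: reads and writes nothing but *s. */
static void advance(sim_state *s)
{
    for (int i = 0; i < s->natoms; i++)
        for (int d = 0; d < 3; d++)
            s->x[i][d] += s->dt * s->v[i][d];
    s->step++;
    s->time += s->dt;
}
```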

If we then write these structures to the file in full precision, we 
should be able to get a fully transparent restart feature.
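For the "full precision" part, the simplest approach is probably to dump the raw binary values rather than formatted text. This is a minimal sketch assuming the checkpoint is read back on a machine with the same endianness and double format. write_doubles() and read_doubles() are hypothetical helpers, not existing GROMACS functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n doubles in raw binary form, preceded by the count. Every bit of
 * each value is kept, unlike a text format such as "%8.3f", which rounds.
 * Raw fwrite output is only portable between machines with the same
 * endianness and double representation. */
static int write_doubles(const char *fn, const double *a, size_t n)
{
    FILE *fp = fopen(fn, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&n, sizeof n, 1, fp) == 1 &&
             fwrite(a, sizeof *a, n, fp) == n;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read back what write_doubles() wrote; returns a malloc'ed array
 * (caller frees) and stores the count in *n, or returns NULL on error. */
static double *read_doubles(const char *fn, size_t *n)
{
    FILE *fp = fopen(fn, "rb");
    if (fp == NULL)
        return NULL;
    double *a = NULL;
    if (fread(n, sizeof *n, 1, fp) == 1) {
        a = malloc(*n * sizeof *a);
        if (a != NULL && fread(a, sizeof *a, *n, fp) != *n) {
            free(a);
            a = NULL;
        }
    }
    fclose(fp);
    return a;
}
```

A portable binary encoding such as XDR would remove the endianness restriction while still keeping full precision.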

When writing files, I'd suggest we use a temporary file and, once the 
write has finished, just move it to the 'real' checkpoint file. That way 
a crash during the write never leaves us with a truncated checkpoint.
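The temporary-file-then-move scheme could be sketched as follows, assuming POSIX semantics: rename() atomically replaces an existing target on the same file system, and fsync() pushes the data to disk before the rename. The function write_checkpoint_atomic() and the ".tmp" suffix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: afterwards the file `fn` holds either the old or the
 * new complete checkpoint, never a half-written one. */
static int write_checkpoint_atomic(const char *fn, const void *buf, size_t len)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", fn);  /* same dir -> same FS */
    if (r < 0 || r >= (int)sizeof tmp)
        return -1;                                    /* name too long */

    FILE *fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(buf, 1, len, fp) == len
          && fflush(fp) == 0            /* stdio buffer -> kernel */
          && fsync(fileno(fp)) == 0;    /* kernel -> disk */
    if (fclose(fp) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp);                    /* old checkpoint stays untouched */
        return -1;
    }
    return rename(tmp, fn) == 0 ? 0 : -1;   /* the atomic "move" step */
}
```

Writing the temporary file in the same directory as the final one matters, since rename() cannot move a file across file systems. ISO C leaves renaming onto an existing file implementation-defined (on Windows it fails), so the atomic-replace guarantee is a POSIX one.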

>>
>>1. Executing qdel kills mpirun, but leaves lamd and mpirun process happily
>>running. I guess I will have to find a good script to deal with this.
> 

Another alternative might be to use a script like 'pbslam' (search the 
net) and launch it with 'exec' so that it replaces the shell process. 
I'm not sure it will help, but it could be worth a try.

Cheers,

Erik

More information about the gromacs.org_gmx-developers mailing list