[gmx-developers] Checkpointing
Erik Lindahl
lindahl at stanford.edu
Thu Jun 6 21:40:49 CEST 2002
Hi,
>
> Actually we should just implement writing out a restart file every n
> steps, which can be the same file (or alternate between two filenames).
Since we should get rid of all static data anyway to support threads, it
would probably be a good idea to create one or a few structures that
contain all the data and the current state of the system.
If we then write these structures to the file in full precision, we
should get a really transparent restart feature.
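
A rough sketch of what such a state structure and its full-precision dump could look like in C. The `sim_state` type, its fields, and the `state_write`/`state_read` helpers are hypothetical names made up for illustration, not existing GROMACS code. It assumes raw binary output, which keeps every bit of each double but is not portable between machines with different byte order or type sizes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the full simulation state (illustrative only). */
typedef struct {
    long    step;    /* current MD step */
    double  time;    /* simulation time */
    int     natoms;  /* number of atoms */
    double *x;       /* positions,  3*natoms values */
    double *v;       /* velocities, 3*natoms values */
} sim_state;

/* Write the state as raw binary so no digits are lost. Returns 0 on success. */
int state_write(FILE *fp, const sim_state *s)
{
    assert(fp != NULL && s != NULL && s->natoms >= 0);
    size_t n = 3 * (size_t)s->natoms;
    if (fwrite(&s->step,   sizeof s->step,   1, fp) != 1) return -1;
    if (fwrite(&s->time,   sizeof s->time,   1, fp) != 1) return -1;
    if (fwrite(&s->natoms, sizeof s->natoms, 1, fp) != 1) return -1;
    if (fwrite(s->x, sizeof *s->x, n, fp) != n) return -1;
    if (fwrite(s->v, sizeof *s->v, n, fp) != n) return -1;
    return 0;
}

/* Read a state written by state_write; allocates x and v. Returns 0 on success. */
int state_read(FILE *fp, sim_state *s)
{
    assert(fp != NULL && s != NULL);
    s->x = s->v = NULL;
    if (fread(&s->step,   sizeof s->step,   1, fp) != 1) return -1;
    if (fread(&s->time,   sizeof s->time,   1, fp) != 1) return -1;
    if (fread(&s->natoms, sizeof s->natoms, 1, fp) != 1) return -1;
    if (s->natoms < 0) return -1;
    size_t n = 3 * (size_t)s->natoms;
    s->x = malloc(n * sizeof *s->x);
    s->v = malloc(n * sizeof *s->v);
    if ((n > 0 && (!s->x || !s->v)) ||
        fread(s->x, sizeof *s->x, n, fp) != n ||
        fread(s->v, sizeof *s->v, n, fp) != n) {
        free(s->x);
        free(s->v);
        s->x = s->v = NULL;
        return -1;
    }
    return 0;
}
```

Because the values are never converted to text, a state read back this way compares bit for bit with the one that was written, which is what makes the restart transparent.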
When writing files I'd suggest we write to a temporary file, and once the
write is finished we just move it to the 'real' checkpoint file. That way
a crash in the middle of a write never damages the last good checkpoint.
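
Something along these lines, as a minimal sketch. The function name `checkpoint_save` and the ".tmp" suffix are invented for the example. It assumes a POSIX system, where rename() replaces an existing target atomically; on other platforms rename() may fail if the target already exists. It also does not call fsync(), so it protects against a killed job but not necessarily against a power failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes to "<path>.tmp", then rename it
 * over path. Returns 0 on success, -1 on failure (path is left untouched). */
int checkpoint_save(const char *path, const void *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;  /* path too long */

    FILE *fp = fopen(tmp, "wb");
    if (fp == NULL) return -1;

    /* Collect every error; the temporary file must be complete and
     * flushed before it is allowed to replace the real checkpoint. */
    int ok = (fwrite(data, 1, len, fp) == len);
    ok = (fflush(fp) == 0) && ok;
    ok = (fclose(fp) == 0) && ok;

    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);  /* drop the partial file, keep the old checkpoint */
        return -1;
    }
    return 0;
}
```

Calling this every n steps with the same path always leaves either the previous checkpoint or the new one on disk, never a half-written file.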
>>
>>1. Executing qdel kills mpirun, but leaves lamd and mpirun process happily
>>running. I guess I will have to find a good script to deal with this.
>
Another alternative might be to use a script like 'pbslam' (search the
net) and do an 'exec' command when you start it (to replace the shell
process). Not sure if it will help, but could be worth a try.
Cheers,
Erik