[gmx-developers] Checkpointing

Erik Lindahl lindahl at stanford.edu
Thu Jun 6 20:05:09 CEST 2002


On Wed, 2002-06-05 at 16:38, Justin MacCallum wrote:
> 
> Hi,
> 
> what is the current status of checkpointing in Gromacs? I know that
> sending USR1 to a job will cause it to checkpoint itself (sort of). Does
> this work in parallel? Does anyone know how to make mpirun from lam-mpi
> less stupid? I.e., can you get it to pass signals on to the processes it
> spawns? I'm trying to get checkpointing to work in a PBS environment on
> Linux.
> 

This would indeed be nice to have, but we will probably have to
implement it 'manually' on Linux - there is no OS support for
checkpointing there, unlike on SGI or Cray systems.

One problem is that I don't think the MPI standard includes any way of
sending signals to other nodes - we will have to intercept the signal on
the node where we get it and do the communication ourselves.
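
As a rough illustration of the "intercept it ourselves" part, here is a
minimal sketch in C. It is not the actual Gromacs code; the names
install_checkpoint_handler and poll_checkpoint_request are made up for
this example. The handler only sets a flag, and each process polls that
flag once per step. In a parallel run the polled value would then have to
be combined across nodes, for instance with MPI_Allreduce using MPI_MAX,
which is indicated only as a comment so the sketch compiles without MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler. sig_atomic_t is the only object type
 * that is guaranteed safe to write from a handler. */
static volatile sig_atomic_t checkpoint_requested = 0;

/* Handler: do as little as possible, just record the request. */
static void on_usr1(int signo)
{
    (void)signo;
    checkpoint_requested = 1;
}

/* Hypothetical helper: install the SIGUSR1 handler.
 * Returns 0 on success, -1 on failure (same as sigaction). */
int install_checkpoint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: call once per step on every process.
 * Returns 1 if a checkpoint was requested since the last call. */
int poll_checkpoint_request(void)
{
    int local = checkpoint_requested;

    /* A signal arriving between the read above and this reset is
     * dropped, which is harmless if a checkpoint is about to be
     * written anyway. */
    checkpoint_requested = 0;

    /* In a parallel run (assumption: MPI is initialised), every
     * process would agree on the result here, e.g.
     *   MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, comm);
     * and return 'global' instead of 'local'. */
    return local;
}
```

With this layout only the node that actually receives the signal needs a
working handler; the others learn about the request through the
collective call at the next step.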

The second problem is that some MPI implementations (notably IBM's)
don't pass all signals on to the actual program... I'm not sure about
LAM.

Cheers,

Erik






