[gmx-users] MPI_BCAST : Message truncated [when restarting from checkpoint]

Mark Abraham Mark.Abraham at anu.edu.au
Thu Nov 5 08:37:07 CET 2009


Vasilii Artyukhov wrote:
> Dear colleagues,
> 
> I encountered the following error messages when trying out the 
> checkpoint continuation functionality with MPI:
> 
>     Reading file topol.tpr, VERSION 4.0.5 (single precision)
> 
>     Reading checkpoint file state.cpt generated: Tue Oct 27 22:37:23 2009
> 
>     8 - MPI_BCAST : Message truncated
>     [8] [] Aborting Program!
>     Abort signaled by rank 8:  Aborting program !
>     Exit code -3 signaled from [node address]
>     Killing remote processes...
> 
> 
> From a quick Google search, I got the impression that this has to do 
> with messages sent between the nodes being larger than the receive 
> buffer (note my total lack of practical experience with MPI 
> programming). However, if this is true, I have no idea where it might 
> be happening.
> 
> My system has 44091 atoms, and the .cpt file size is 1.1M, which doesn't 
> seem too large. Restarting seems to work fine with serial mdrun.

Hmm, that's an issue I've not seen before. Perhaps your MPI library is 
configured with quite a small buffer for such messages. Which MPI 
implementation and version is it? Checking this with your system admins 
is a good idea.
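
For readers unfamiliar with the error text: "Message truncated" generally means that a rank received more data than the buffer it offered could hold. The sketch below is only a toy model of that check in plain C. It does not call MPI, and `toy_receive` and the `TOY_*` codes are hypothetical names invented for illustration, not part of any MPI library or of GROMACS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this toy model (not MPI constants). */
enum toy_status { TOY_SUCCESS = 0, TOY_ERR_TRUNCATE = 1 };

/*
 * Toy stand-in for the receiving side of a broadcast: copy an incoming
 * message of msg_len bytes into a buffer with room for buf_len bytes.
 * If the message does not fit, nothing is copied and a truncation
 * error is returned, which mirrors the condition an MPI library
 * reports as "Message truncated".
 */
enum toy_status toy_receive(void *buf, size_t buf_len,
                            const void *msg, size_t msg_len)
{
    assert(buf != NULL && msg != NULL);

    if (msg_len > buf_len) {
        return TOY_ERR_TRUNCATE;   /* receiver expected less data */
    }
    memcpy(buf, msg, msg_len);
    return TOY_SUCCESS;
}
```

In real MPI code, one common way to reach this state is for ranks to pass different element counts to the same MPI_Bcast call, for example when they disagree about the size of something read from a file. Whether that is what happens here is not established in this thread.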

Mark


