[gmx-users] MPI_BCAST : Message truncated [when restarting from checkpoint]
Mark.Abraham at anu.edu.au
Thu Nov 5 08:37:07 CET 2009
Vasilii Artyukhov wrote:
> Dear colleagues,
> I encountered the following error messages when trying out the
> checkpoint continuation functionality with MPI:
> Reading file topol.tpr, VERSION 4.0.5 (single precision)
> Reading checkpoint file state.cpt generated: Tue Oct 27 22:37:23 2009
> 8 - MPI_BCAST : Message truncated
>   Aborting Program!
> Abort signaled by rank 8: Aborting program !
> Exit code -3 signaled from [node address]
> Killing remote processes...
> From a short Google search I got the impression that this has to do
> with sending messages between nodes that are larger than the receive
> buffer (note my total lack of practical experience with MPI
> programming). However, if this is true, I have no idea where it might
> be happening.
> My system has 44091 atoms, and the .cpt file size is 1.1M, which doesn't
> seem too large. Restarting seems to work fine with serial mdrun.
Hmm, that's an issue I've not seen before. Perhaps your MPI is
configured with quite a small buffer for such messages. Which MPI
implementation and version is it? Checking this with your system admins
is a good idea.
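
For readers unfamiliar with the error, here is a toy Python model of what
"Message truncated" means in a broadcast: a rank reports it when the root
sends more elements than that rank's receive count allows. This is only a
sketch of the general MPI rule, not GROMACS or MPI library code. The names
bcast, recv_counts and MessageTruncated are hypothetical, made up for the
illustration, and the sketch does not claim to identify the cause of the
failure reported above.

```python
class MessageTruncated(Exception):
    """Stands in for MPI's truncation error class (MPI_ERR_TRUNCATE)."""


def bcast(root_payload, recv_counts):
    """Toy model of a broadcast from the root to every rank.

    recv_counts[r] is the number of elements rank r is prepared to accept.
    Real MPI expects all ranks to pass matching counts; this model only
    checks the mismatch that leads to a truncation error.
    """
    delivered = {}
    for rank, count in enumerate(recv_counts):
        if len(root_payload) > count:
            # The incoming message does not fit into this rank's buffer.
            raise MessageTruncated(
                f"{rank} - MPI_BCAST : Message truncated "
                f"(sent {len(root_payload)}, buffer holds {count})"
            )
        delivered[rank] = list(root_payload)
    return delivered


if __name__ == "__main__":
    payload = list(range(100))

    # All nine ranks expect 100 elements: the broadcast succeeds.
    bcast(payload, [100] * 9)

    # Rank 8 expects only 50 elements: the broadcast fails on that rank.
    try:
        bcast(payload, [100] * 8 + [50])
    except MessageTruncated as err:
        print(err)
```

In a real run the same kind of mismatch can arise whenever the ranks
disagree about how much data is about to be broadcast, so the size of the
.cpt file alone does not tell you whether a given message will fit.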