[gmx-developers] trajectory files corrupted

Berk Hess hessb at mpip-mainz.mpg.de
Mon Jun 25 10:09:11 CEST 2007


Erik Lindahl wrote:

> Hi,
>
> The "usual" trajectory problem is due to the OS buffers, so when a  
> run crashes the last couple of kB haven't been written. This has been  
> addressed in the head branch of CVS; trajectory and energy file frames  
> are now explicitly flushed after each frame to make sure the OS  
> buffers are written to disk, so frames are always complete.
>
> However, the description below sounds like a different issue. I don't  
> think we've ever had any issues with Gromacs silently corrupting  
> output, so I think this is due to the operating system.

Well..., I think we have.
With MPICH and mdrun -multi the independent runs are not synchronized at
the end, which (although I am no longer sure) could mean that the queuing
system thinks the job is finished after some, but not all, of the replicas
have finished, and terminates the ones that are still running.

If you are using MPICH, try replacing the last routine in 
src/gmxlib/network.c with:

void gmx_finalize(const t_commrec *cr)
{
  int ret;
#ifndef GMX_MPI
  gmx_call("gmx_finalize");
#else
  /* We sync the processes here to try to avoid problems
   * with buggy MPI implementations that could cause
   * unfinished processes to terminate.
   */
  MPI_Barrier(MPI_COMM_WORLD);
  /* Apparently certain mpich implementations cause problems
   * with MPI_Finalize. In that case comment out MPI_Finalize.
   */
  if (debug)
    fprintf(debug,"Will call MPI_Finalize now\n");
  ret = MPI_Finalize();
  if (debug)
    fprintf(debug,"Return code from MPI_Finalize = %d\n",ret);
#endif
}
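
To show the same idea outside of Gromacs, here is a minimal standalone
sketch. It is not Gromacs code: the function name sync_and_finalize and
the build flag USE_MPI (which plays the role of GMX_MPI) are made up for
illustration. It only assumes the standard MPI calls MPI_Barrier and
MPI_Finalize.

```c
#include <stdio.h>
#ifdef USE_MPI            /* hypothetical build flag, stands in for GMX_MPI */
#include <mpi.h>
#endif

/* Hypothetical helper, not part of Gromacs.
 * Waits until every rank has arrived, then shuts MPI down.
 * Returns 0 on success and 1 on failure. In a serial build there is
 * nothing to synchronize, so it just returns 0.
 * 'log' may be NULL to suppress the diagnostic output.
 */
int sync_and_finalize(FILE *log)
{
#ifdef USE_MPI
    int ret;

    /* No rank gets past this call before all ranks have reached it,
     * so a replica that finishes early cannot end the job on its own.
     */
    ret = MPI_Barrier(MPI_COMM_WORLD);
    if (ret != MPI_SUCCESS)
        return 1;

    ret = MPI_Finalize();
    if (log)
        fprintf(log, "Return code from MPI_Finalize = %d\n", ret);
    return (ret == MPI_SUCCESS) ? 0 : 1;
#else
    if (log)
        fprintf(log, "Serial build, nothing to synchronize\n");
    return 0;
#endif
}
```

In an MPI build (compiled with -DUSE_MPI) this would be the last call in
main(), after MPI_Init and after each replica has closed its output files.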

Berk.



