[gmx-users] Continue run in Gromacs-4 with check point file

xuji xuji at home.ipe.ac.cn
Fri Mar 20 03:26:09 CET 2009


    Hi all: 


    I wrote to the list some days ago about continuing a run in Gromacs-4.0 from a checkpoint file, but I still haven't been able to solve the problem.
    I ran a simulation with
      mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi dppc_md_prev.cpt -cpo dppc_md.cpt -s dppc_md.tpr -o dppc_md.trr -c dppc_md.gro -g dppc_md.log -e dppc_md.edr 
    on 4 nodes. But when I try to continue the simulation with
      mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi dppc_md.cpt -cpo dppc_md_2.cpt -s dppc_md.tpr -o dppc_md.trr -c dppc_md.gro -g dppc_md.log -e dppc_md.edr 
    or with
      mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi dppc_md_prev.cpt -cpo dppc_md_2.cpt -s dppc_md.tpr -o dppc_md.trr -c dppc_md.gro -g dppc_md.log -e dppc_md.edr 
    (there are 2 checkpoint files in the simulation directory, so I tried both of them),
    I always get the following errors:

    Reading checkpoint file dppc_md_prev.cpt generated: Fri Mar 20 08:53:47 2009
    or
    Reading checkpoint file dppc_md.cpt generated: Fri Mar 20 08:58:08 2009 

    Loaded with Money
    Fatal error in MPI_Bcast:
    Message truncated, error stack:
    MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fffc33242dc, count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
    MPIR_Bcast(229)...................: 
    MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 truncated; 12 bytes received but buffer size is 4
    Fatal error in MPI_Bcast:
    Message truncated, error stack:
    MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fff6c0da09c, count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
    MPIR_Bcast(229)...................: 
    MPIDI_CH3U_Receive_data_found(254): Message from rank 4 and tag 2 truncated; 12 bytes received but buffer size is 4
    Fatal error in MPI_Bcast:
    Message truncated, error stack:
    MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fff9ac2ebec, count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
    MPIR_Bcast(229)...................: 
    MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 truncated; 12 bytes received but buffer size is 4
    rank 16 in job 5  Node115_33001   caused collective abort of all ranks
      exit status of rank 16: killed by signal 9 
    rank 8 in job 5  Node115_33001   caused collective abort of all ranks
      exit status of rank 8: killed by signal 9 
    rank 6 in job 5  Node115_33001   caused collective abort of all ranks
      exit status of rank 6: killed by signal 9 
    
    Can someone help me with this problem? I would appreciate any help. Thanks in advance!
    
    Best wishes!
 

2009-03-20 



xuji 