[gmx-users] question about Gromacs4.0.7: parallel run

Justin A. Lemkul jalemkul at vt.edu
Tue May 25 01:59:17 CEST 2010



Yi Peng wrote:
> Hi, everyone,
> 
> Recently our school upgraded the clusters for us, and they installed 
> Gromacs-4.0.7.  Before, I always used Gromacs-4.0.3, and the script I 
> used for parallel runs worked well.
> 
> My script is as follows:
> 
> #PBS -l nodes=4:ppn=2
> #PBS -N pr-impd1-wt
> #PBS -j oe
> module load gromacs
> module load openmpi-intel
> cd $PBS_O_WORKDIR
> NPROCS=`wc -l < $PBS_NODEFILE`
> /usr/local/bin/pbsdcp -s pr.tpr $TMPDIR
> cd $TMPDIR
> mpiexec mdrun -multi $NPROCS -maxh 100 -s pr.tpr -e pr.edr -o pr.trr -g 
> pr.log -c pr.gro
> /usr/local/bin/pbsdcp -g '*' $PBS_O_WORKDIR
> cd $PBS_O_WORKDIR
> 
> But today I tried to use Gromacs-4.0.7 for this, and it always fails 
> with the following error message:
> -------------------------------------------------------
> Program mdrun, VERSION 4.0.7
> Source code file: gmxfio.c, line: 737
> 
> Can not open file:
> pr7.tpr
> -------------------------------------------------------
> 
> Error on node 7, will try to stop all the nodes
> Halting parallel program mdrun on CPU 7 out of 8
> 
> gcq#212: "Your Bones Got a Little Machine" (Pixies)
> 
> 
> -------------------------------------------------------
> Program mdrun, VERSION 4.0.7
> Source code file: gmxfio.c, line: 737
> 
> Can not open file:
> pr5.tpr
> -------------------------------------------------------
> 
> gcq#212: "Your Bones Got a Little Machine" (Pixies)
> 
> Error on node 5, will try to stop all the nodes
> Halting parallel program mdrun on CPU 5 out of 8
> 
> -------------------------------------------------------
> Program mdrun, VERSION 4.0.7
> Source code file: gmxfio.c, line: 737
> 
> Can not open file:
> pr4.tpr
> -------------------------------------------------------
> 
> gcq#212: "Your Bones Got a Little Machine" (Pixies)
> 
> Error on node 4, will try to stop all the nodes
> Halting parallel program mdrun on CPU 4 out of 8
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
> with errorcode -1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 
> How can I solve this problem?  How should I add a number to each input 
> file for pr.tpr?
> 

The use of -multi implies that you have a series of .tpr files numbered from 
zero, i.e. pr0.tpr, pr1.tpr, etc., so the input files have to be named as mdrun 
expects them to be.  The name given to the -s flag is treated as a prefix, not a 
literal file name.  See, for instance:

http://www.gromacs.org/Documentation/How-tos/REMD#Execution_Steps
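As a sketch of the naming convention (assuming your job really should be 8 
independent simulations; the loop just prints the names mdrun would look for, 
and in a real job each numbered .tpr would come from its own grompp run):

```shell
# With "mdrun -multi 8 -s pr", mdrun treats "pr" as a prefix and opens
# pr0.tpr ... pr7.tpr (indices start at 0, one per simulation).
prefix=pr
nsim=8
for i in $(seq 0 $((nsim - 1))); do
    echo "${prefix}${i}.tpr"    # mdrun -multi expects exactly these names
done
```

If instead you want a single simulation run in parallel over all the 
processors, drop -multi entirely and just let mpiexec start mdrun on pr.tpr.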

-Justin

> Thanks!
> 
> Yi
> 

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================
