[gmx-users] multi problem

Mark Abraham Mark.Abraham at anu.edu.au
Wed Jun 13 09:42:44 CEST 2007


Andrei Neamtu wrote:
> Dear gmx users,
> 
> I have problems running a simulation on several nodes using the
> -multi option:
> 
> I make the .tpr files for different temperatures (I want to use the REMD 
> code)
> 
> 
> grompp -f param0.mdp -po param0.out.mdp -c conf.gro -p topol.top -o sim0.tpr
> grompp -f param1.mdp -po param1.out.mdp -c conf.gro -p topol.top -o sim1.tpr
> grompp -f param2.mdp -po param2.out.mdp -c conf.gro -p topol.top -o sim2.tpr
> .
> .
> .
> grompp -f paramN.mdp -po paramN.out.mdp -c conf.gro -p topol.top -o simN.tpr
> 
> where N is the number of nodes in my cluster (P4 with Gigabit Ethernet).
> 
> After the input .tpr files are generated, I start the simulations
> with:
> 
> mpirun -np N mdrun_mpi -np N -multi -replex 2000 -s sim.tpr -o sim.trr .......
> 
> but the program stops, saying that it cannot find sim1.tpr, sim2.tpr,
> ....
> It *does* find sim0.tpr on the node where I start the simulation, but
> not on the rest of the nodes.

Have you made sure the other nodes are using the same working directory
and/or that its contents are being propagated to them properly? Each MPI
process tries to load a different .tpr file, and (at least) the one
belonging to that process needs to be accessible from the node it runs on.
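One way to check this is to look for the files from each node directly. A
minimal sketch, assuming a POSIX sh on every node: the hypothetical helper
check_multi_inputs below lists the per-replica files that, judging from the
error messages in this thread, mdrun -multi derives from -s sim.tpr
(sim0.tpr, sim1.tpr, ...), and reports which of them are not readable from
the current working directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): check_multi_inputs BASE EXT N
# Checks for BASE0.EXT ... BASE(N-1).EXT in the current working directory,
# i.e. the per-replica names that "-s BASE.EXT -multi" appears to expect.
# Prints one line per file and returns non-zero if any file is missing.
check_multi_inputs() {
    base=$1
    ext=$2
    n=$3
    status=0
    i=0
    while [ "$i" -lt "$n" ]; do
        f="${base}${i}.${ext}"
        if [ -r "$f" ]; then
            echo "found:   $PWD/$f"
        else
            echo "MISSING: $PWD/$f"
            status=1
        fi
        i=$((i + 1))
    done
    return "$status"
}
```

Save it as, say, check_multi_inputs.sh in the run directory and run it on
every node, for example with
ssh node2 'cd /path/to/run && . ./check_multi_inputs.sh && check_multi_inputs sim tpr 4'
(node2 and /path/to/run are placeholders for your own host and directory).
If it reports MISSING on any node, that node is not seeing the same working
directory contents.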

> I tried running the simulation on a multi-core machine (2 cores) and it
> works fine for N=2. But when I link two multi-core machines, the program
> stops with the same message, except that the first .tpr file not found is
> the one corresponding to the second machine.
> 
> I saw on the list that there were similar problems in the past, but I
> didn't find any solution to them. I tried soft-linking the mdrun_mpi
> executable into the working directory, but the problem persists.

That isn't the problem; finding the .tpr is.

Mark
