[gmx-users] multi problem
Mark Abraham
Mark.Abraham at anu.edu.au
Wed Jun 13 09:42:44 CEST 2007
Andrei Neamtu wrote:
> Dear gmx users,
>
> I have problems running a simulation on several nodes using the
> -multi option:
>
> I generate a .tpr file for each temperature (I want to use the REMD
> code):
>
>
> grompp -f param0.mdp -po param0.out.mdp -c conf.gro -p topol.top -o sim0.tpr
> grompp -f param1.mdp -po param1.out.mdp -c conf.gro -p topol.top -o sim1.tpr
> grompp -f param2.mdp -po param2.out.mdp -c conf.gro -p topol.top -o sim2.tpr
> ...
> grompp -f paramN.mdp -po paramN.out.mdp -c conf.gro -p topol.top -o simN.tpr
>
> where N is the number of nodes in my cluster (P4 with Gigabit Ethernet).
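The per-temperature grompp calls above can be generated with a loop. This is a minimal POSIX shell sketch, not part of the original thread: make_tprs is a hypothetical helper name, and it reuses the same file names (paramI.mdp, simI.tpr) and grompp options shown above. It assumes the GROMACS 3.3-era -multi behaviour, where each of the N MPI processes reads one input file with its process index appended, so `-np N` needs sim0.tpr through sim(N-1).tpr. Under that assumption, numbering the files 0..N as in the list above produces one more .tpr than mdrun will use.

```shell
# Hypothetical helper: build one .tpr per replica, numbered 0..N-1,
# using the same grompp options as in the commands quoted above.
make_tprs() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    grompp -f "param$i.mdp" -po "param$i.out.mdp" -c conf.gro -p topol.top \
           -o "sim$i.tpr" || return 1   # stop at the first grompp failure
    i=$((i + 1))
  done
}
```

For example, `make_tprs 4` would produce sim0.tpr to sim3.tpr for a 4-process run.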
>
> After the input .tpr files are generated, I start the
> simulations with:
>
> mpirun -np N mdrun_mpi -np N -multi -replex 2000 -s sim.tpr -o sim.trr
> .......
>
> but the program stops, saying that it cannot find sim1.tpr, sim2.tpr,
> and so on. It *does* find sim0.tpr on the node where I start the
> simulation, but not on the other nodes.
Have you made sure the other nodes are using the same working directory
and/or that its contents are being propagated to them properly? Each MPI
process tries to load a different .tpr file, and (at least) the correct
one must be accessible from the node that process runs on.
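One way to test this is to check, on each node, that the expected files are readable from the directory the run is started in. The sketch below is a hypothetical helper (check_tprs is not a GROMACS tool); it assumes the simI.tpr naming used above, with I from 0 to N-1, and reports each file together with the host name so that a missing file on a particular node stands out.

```shell
# Hypothetical helper: report which of sim0.tpr .. sim(N-1).tpr are
# readable in the current directory on this host; non-zero exit if any are missing.
check_tprs() {
  n=$1
  missing=0
  i=0
  while [ "$i" -lt "$n" ]; do
    f="sim$i.tpr"
    if [ -r "$f" ]; then
      echo "$(uname -n): $PWD/$f found"
    else
      echo "$(uname -n): $PWD/$f MISSING" >&2
      missing=1
    fi
    i=$((i + 1))
  done
  return "$missing"
}
```

Running it by hand on each node, from the directory mdrun_mpi will be started in, should show whether the other machines see the same files as the first one.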
> I tried running the simulation on a multi-core machine (2 cores), and it
> works fine for N=2. But when I connect two multi-core machines, the
> program stops with the same message, except that the first .tpr file not
> found is the one corresponding to the second machine.
>
> I have seen similar problems reported on the list in the past, but I
> didn't find any solution. I tried soft-linking the mdrun_mpi executable
> into the working directory, but the problem persists.
That isn't the problem - finding the .tpr is.
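If the nodes do not share a file system (for example via NFS mounted at the same path), the .tpr files have to be copied to each node before the run. The sketch below is a hypothetical helper, not from the thread: it assumes passwordless ssh/scp between nodes and that the same absolute working directory already exists on every host; the host names passed to it are placeholders.

```shell
# Hypothetical helper: copy all sim*.tpr files from the current directory
# to the same directory path on each listed host (assumes scp access and
# that $PWD exists on every host).
push_tprs() {
  for host in "$@"; do
    scp sim*.tpr "$host:$PWD/" || return 1
  done
}
```

For example, `push_tprs node1 node2` would copy the files to two other machines; a shared file system avoids this step and also keeps the output files in one place.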
Mark