[gmx-users] Can not open file: topol.tpr

Guillem Plasencia guillem_pg at hotmail.com
Thu Apr 27 12:32:51 CEST 2006


Hello listers,

This is my first try with Gromacs (3.3.1). I've installed LAM-MPI (7.1.2) 
and FFTW3 on my two dual Intel P4 CPU machines running Fedora Core 4 
(kernel 2.6). That gives 4 physical CPUs, or 8 with hyperthreading on; I've 
already read in the mailing list archive that I should turn hyperthreading 
off until the Gromacs 4 release to improve performance.

Just to test the parallel processing, I downloaded and tried to run one of 
the benchmark tests (d.lzm).

I prepared it with:

grompp -f cutoff.mdp -c conf.gro -p topol.top -np 2

(Here I had to read the archives to resist the temptation to include -nt 2, 
which gave me an error even with --enable-threads among the configure 
options.)


But when I tried to run it on my two nodes as a parallel task with:

mpirun n0,1 mdrun -s topol.tpr -np 2

I got the following output from mdrun:

NNODES=2, MYRANK=1, HOSTNAME=lead8
NNODES=2, MYRANK=0, HOSTNAME=lead7
NODEID=1 argc=5
NODEID=0 argc=5
>>>CUT SOME MDRUN HELP INFO >>>
-------------------------------------------------------
Program mdrun, VERSION 3.3.1
Source code file: gmxfio.c, line: 706

Can not open file:
topol.tpr
-------------------------------------------------------

"I'm a Jerk" (F. Black)

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 2

gcq#171: "I'm a Jerk" (F. Black)

-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 27859 failed on node n1 (192.168.1.9) with exit status 1.
-----------------------------------------------------------------------------

You can see from lamnodes that node n1 is the originating node:

>lamnodes
n0      lead7:2:
n1      192.168.1.9:2:origin,this_node

and from ps -leaf | grep mdrun I can see that both processes have been 
started, but neither uses any CPU at all. So far, my guess is that if the 
originating node (n1) can't read the topol.tpr file, it can't distribute 
tasks among the nodes (which would be causing the unknown error on node 0, 
the other node).
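
One way to test this guess would be to confirm that topol.tpr is readable 
from the run directory on every node. Below is a minimal sketch, not 
something I have run on these machines: check_tpr is a hypothetical helper 
name, and the remote variant in the comments assumes passwordless ssh 
between the nodes and the same directory path on both, neither of which is 
established above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gromacs or LAM): report whether a run
# input file is readable on the machine where this function is executed.
check_tpr() {
    file="${1:-topol.tpr}"      # default to the file name mdrun complained about
    if [ -r "$file" ]; then
        echo "$(uname -n): $file is readable"
    else
        echo "$(uname -n): cannot read $file" >&2
        return 1
    fi
}

# Local check in the current directory:
#   check_tpr topol.tpr
#
# Assumed remote check (needs passwordless ssh and identical paths on
# both hosts; lead7 and lead8 are the host names from the mdrun output):
#   for host in lead7 lead8; do
#       ssh "$host" "test -r '$PWD/topol.tpr'" || echo "$host: cannot read topol.tpr"
#   done
```

If the file turns out to be readable on only one host, that would point to 
the run directory not being shared between the nodes rather than to a 
problem in mdrun itself.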

Any ideas on what's happening? How do I solve it?

Thank you very much !

Guillem Plasencia
Spain.

P.S. I've read in the archives that there was some interest in knowing 
whether hyperthreading still causes poor load balancing in Linux kernel 
2.6, which happens to be the kernel I'm running. I'd be pleased to test with 
HT both on and off on my nodes, of course as soon as I solve this problem 
with the topol.tpr file.




