[gmx-users] Can not open file: topol.tpr
Guillem Plasencia
guillem_pg at hotmail.com
Thu Apr 27 12:32:51 CEST 2006
Hello listers,
this is my first try with Gromacs (3.3.1). I've installed LAM-MPI (7.1.2)
and FFTW3 on my two dual Intel P4 CPU machines (4 physical CPUs, 8 with
hyperthreading on; I've already read in the mailing list archive that I
should turn off hyperthreading until the Gromacs 4 release to improve
performance), both running Fedora Core 4 (kernel 2.6).
Just to test the parallel processing, I downloaded and tried to run one of
the benchmark tests (d.lzm).
I prepared it with:
grompp -f cutoff.mdp -c conf.gro -p topol.top -np 2
(Here I had to read the archives to resist the temptation to include -nt 2,
which gave me an error even with --enable-threads among the configure options.)
But when I tried to run it on my two nodes as a parallel task with:
mpirun n0,1 mdrun -s topol.tpr -np 2
I got the following output from mdrun:
NNODES=2, MYRANK=1, HOSTNAME=lead8
NNODES=2, MYRANK=0, HOSTNAME=lead7
NODEID=1 argc=5
NODEID=0 argc=5
>>>CUT SOME MDRUN HELP INFO >>>
-------------------------------------------------------
Program mdrun, VERSION 3.3.1
Source code file: gmxfio.c, line: 706
Can not open file:
topol.tpr
-------------------------------------------------------
"I'm a Jerk" (F. Black)
Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 2
gcq#171: "I'm a Jerk" (F. Black)
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 27859 failed on node n1 (192.168.1.9) with exit status 1.
-----------------------------------------------------------------------------
You can see from lamnodes that node n1 is the originating node:
>lamnodes
n0 lead7:2:
n1 192.168.1.9:2:origin,this_node
and from ps -leaf | grep mdrun I can see that both processes have been
started, but neither uses any CPU at all. My guess so far is that if the
originating node (n1) can't read the topol.tpr file, it can't distribute tasks
among the nodes (which would be causing the unknown error on node 0, the other
node).
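To test that guess, here is a minimal sketch of a check I could run by hand in the working directory on each machine (lead7 and lead8). The name check_tpr is made up for illustration; it is not a Gromacs or LAM tool. The sketch assumes the problem is that topol.tpr is not visible from the directory where mdrun starts on one of the nodes, which I have not confirmed:

```shell
#!/bin/sh
# Hypothetical helper: check_tpr is an illustrative name, not part of
# Gromacs or LAM-MPI. It reports whether a run input file is readable
# from the current directory on whichever machine executes it.
check_tpr() {
    file="${1:-topol.tpr}"   # default to the file name given to mdrun -s
    if [ -r "$file" ]; then
        echo "$(uname -n): $file is readable in $(pwd)"
        return 0
    else
        echo "$(uname -n): cannot read $file in $(pwd)"
        return 1
    fi
}
```

Sourcing this and calling check_tpr topol.tpr on each node, from the directory where mdrun is started, would show which machine, if any, does not see the file.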
Any ideas on what's happening? How do I solve it?
Thank you very much !
Guillem Plasencia
Spain.
P.S. I've read in the archives that there was some interest in knowing whether
hyperthreading still causes poor load balancing under Linux kernel 2.6, which
happens to be the kernel I'm running. I'd be pleased to test with HT both on
and off on my nodes, as soon as I solve this problem with the topol.tpr
file.