[gmx-users] problem with parallelization on a dual quad-core machine

Hongxian He hongxian.he at gmail.com
Thu Aug 14 23:18:57 CEST 2008


Hi All:

I know there have been a lot of messages posted on this subject in the past,
and I've tried all the solutions that have been suggested - but none has
worked for me so far...

The machine I am running on is a dual quad-core IBM Xeon running Red Hat
Enterprise Linux 4. I am hoping to take advantage of all 8 CPUs by running
the MPI version of Gromacs. The LAM/MPI installed is LAM 7.1.2/MPI 2 C++/ROMIO -
Indiana University version.

Gromacs was installed successfully from RPMs built on CentOS 4
(gromacs-3.3.3-1.x86_64.rpm, gromacs-mpi-3.3.3-1.x86_64.rpm, with
fftw3-3.0.1.1-4.x86_64.rpm); compilation from source code failed. Two mdrun
programs were installed as a result: mdrun and mdrun_mpi. The latter seems
to be the MPI version of mdrun.
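
One way to check whether mdrun_mpi is really linked against an MPI library,
and which one, is to filter the output of ldd. This is only a minimal sketch:
the helper name filter_mpi_libs and the library-name pattern are assumptions
for illustration, and a statically linked binary would show no matches at all.

```shell
# filter_mpi_libs: read `ldd` output on stdin and keep only the lines that
# name MPI-related shared libraries (the pattern is an illustrative guess
# covering common LAM/MPI, Open MPI and MPICH library names).
filter_mpi_libs() {
    grep -i -E 'lib(mpi|lam|open-(rte|pal))'
}

# Usage (not run here):
#   ldd /usr/bin/mdrun_mpi | filter_mpi_libs
```

If the libraries listed do not belong to the same LAM/MPI installation that
provides mpirun, that mismatch would be worth ruling out first.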

For testing purposes, I specified only 2 CPUs with LAM/MPI:
hostfile:
mymachinename cpu=2

%lamboot hostfile

%lamnodes
n0      mymachinename:2:origin,this_node

mpirun seems to work fine with the test program provided in the LAM/MPI
distribution.

In order to test Gromacs, I used the files in the tutorial package:
tutor/methanol.
A single-CPU run works fine (with mdrun), but I ran into problems when trying
to run in parallel (with mdrun_mpi).

These are the commands I ran:
%grompp -v -np 2 -shuffle -sort -o 2cpu.tpr

%mpirun -np 2 /usr/bin/mdrun_mpi -v -np 2 -s 2cpu.tpr >& run.log

In run.log, it does not look like 2 processors are being used (the MYRANK=0
line appears twice, each time with NNODES=1):
NNODES=1, MYRANK=0, HOSTNAME=mymachinename
NNODES=1, MYRANK=0, HOSTNAME=mymachinename
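
The same check can be made mechanically by counting the distinct MYRANK values
in the log. This is a minimal sketch; the helper name distinct_ranks is
hypothetical, and it assumes the log lines have the NNODES=..., MYRANK=...
form shown above. A genuine 2-process run would be expected to give 2, whereas
the log above gives 1.

```shell
# distinct_ranks LOGFILE: print how many different MYRANK values appear
# in the given log file.
distinct_ranks() {
    grep -o 'MYRANK=[0-9]*' "$1" | sort -u | wc -l | tr -d ' '
}

# Usage (not run here):
#   distinct_ranks run.log
```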

And toward the end of the log is the following error message:

-------------------------------------------------------
Program mdrun_mpi, VERSION 3.3.3
Source code file: init.c, line: 69

Fatal error:
run input file 2cpu.tpr was made for 2 nodes,
             while mdrun_mpi expected it to be for 1 nodes.
-------------------------------------------------------

-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

I COULD run the following commands:

%grompp -v -o topol.tpr
%mpirun -np 2 /usr/bin/mdrun_mpi -v [-multi] -np 2 -s topol.tpr >& run.log

But this essentially ran the same simulation twice, once on each CPU; it is
not real parallelization.

What did I do wrong? I would greatly appreciate any help!

Many thanks,
Hongxian