[gmx-users] Fw: About MPIRUN (Chuanjie Wu) Errors

chuanjiewu at 126.com chuanjiewu at 126.com
Fri May 27 03:01:44 CEST 2005


I booted LAM on our cluster with

lamboot -v hostfile

Then the following jobs run correctly on the distributed compute nodes:

$HOME/grotest/bin/grompp_mpi -np 9 -shuffle -sort -v -f full -o full -c after_pr -p speptide >& grompp.info &
$MPI_DIR/bin/mpirun n0-8 /home/test0/grotest/bin/mdrun_mpi -v -s full -o full -c after_full -g speptide.log >& pro.info &

However, if I use the following command instead:

$MPI_DIR/bin/mpirun -np 9 /home/test0/grotest/bin/mdrun_mpi -v -s full -o full -c after_full -g speptide.log >& pro.info &

the following error is reported:

The selected RPI failed to initialize during MPI_INIT.  This is a
fatal error; I must abort.

This occurred on host compute-0-0.local (n1).
The PID of failed process was 29396 (MPI_COMM_WORLD rank: 2)
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 4448 failed on node n0 ( with exit status 1.

The only difference between the two commands is that "n0-8" was replaced with "-np 9".
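A hypothetical sketch of why the two command lines can place processes differently: "n0-8" names nodes explicitly, so each of the nine ranks gets its own node, while "-np 9" leaves the scheduling to mpirun. The model below assumes that "-np N" fills each node's CPUs in node order before moving on and wraps around if needed. That is our reading of LAM's CPU-based scheduling, not a statement of its actual algorithm, so check the mpirun(1) man page for your LAM version. The function names and the per-node CPU counts (which would come from any `cpu=` entries in the hostfile) are made up for illustration.

```python
# Hypothetical model of rank-to-node placement for the two mpirun forms.
# ASSUMPTION: "-np N" fills each node's CPUs in node order, wrapping around
# when N exceeds the total CPU count. Verify against LAM's mpirun(1) page.

def place_by_node(nodes):
    """Like "mpirun n0-8": one rank per listed node, rank i -> nodes[i]."""
    return list(nodes)


def place_by_cpu(cpus_per_node, nprocs):
    """Like "mpirun -np N" under the assumption above: rank -> node index."""
    # Expand each node into one slot per CPU, e.g. [2, 2] -> [0, 0, 1, 1].
    slots = [node for node, ncpu in enumerate(cpus_per_node) for _ in range(ncpu)]
    # Assign ranks to slots in order, wrapping if nprocs > len(slots).
    return [slots[rank % len(slots)] for rank in range(nprocs)]


if __name__ == "__main__":
    cpus = [2] * 9  # HYPOTHETICAL: nine dual-CPU nodes, n0..n8
    print("n0-8 :", place_by_node(range(9)))
    print("-np 9:", place_by_cpu(cpus, 9))
```

With these made-up dual-CPU nodes, "n0-8" maps ranks 0..8 to nodes 0..8, while "-np 9" gives [0, 0, 1, 1, 2, 2, 3, 3, 4], i.e. two ranks share each node and rank 2 lands on n1. Depending on the RPI, ranks sharing a node may communicate differently from ranks on separate nodes, which may be worth checking; the model alone does not explain why the RPI failed to initialize.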
I have also tested the command with "n0-x", varying the number of nodes, and found that the parallel runs were no faster than the serial (non-parallel) jobs.
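To put a number on "no faster", one could compute speedup and parallel efficiency from the wall-clock times of runs with different process counts. A minimal sketch; the function name `scaling_table` and the timings are hypothetical placeholders, not results from this cluster.

```python
# Speedup S(n) = T(1) / T(n) and efficiency E(n) = S(n) / n,
# both relative to a single-process run.

def scaling_table(wall_times):
    """wall_times: {nprocs: seconds}; must include a 1-process run."""
    t1 = wall_times[1]
    rows = []
    for n in sorted(wall_times):
        s = t1 / wall_times[n]      # speedup relative to the serial run
        rows.append((n, s, s / n))  # (nprocs, speedup, efficiency)
    return rows


if __name__ == "__main__":
    # HYPOTHETICAL timings in seconds, not measurements from this cluster.
    times = {1: 1200.0, 2: 800.0, 4: 600.0}
    for n, s, e in scaling_table(times):
        print(f"{n:2d} procs: speedup {s:.2f}, efficiency {e:.2f}")
```

With the placeholder timings this reports speedups of 1.00, 1.50 and 2.00 with efficiencies of 1.00, 0.75 and 0.50. A speedup near 1 (efficiency near 1/n) means the extra processes are not helping.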

What could be causing these problems?

Best wishes,

Chuanjie Wu

More information about the gromacs.org_gmx-users mailing list