[gmx-users] parallel problems...
Florian Haberl
Florian.Haberl at chemie.uni-erlangen.de
Tue Feb 1 15:54:58 CET 2005
Hi,
> I have a problem running things on more than 4 CPUs on our brand new
> cluster. I use a PBS script which launches the command
> ----------------------------------------
> /usr/local/encap/mpich-126-gcc-64/bin/mpirun -machinefile machines -np 4
> /swc/gromacs/bin/mdrun_mpi -np 4 -v -s RW12_s20_a.tpr -o RW12_s20_a.trr
> -x RW12_s20_a.xtc -c RW12_s20_a.gro -e RW12_s20_a.edr -g RW12_s20_a.log
> ---------------------------------------------
> This runs nicely, but if I use 6 CPUs (or 8) the calculation crashes and
> an error is reported in the standard output file:
> ---------------------------------------------------------------------------
> rm_10214: p4_error: semget failed for setnum: 0
> p0_7618: (0.113281) net_recv failed for fd = 7
> p0_7618: p4_error: net_recv read, errno = : 104
> p0_7618: (4.117188) net_send: could not write to fd=4, errno = 32
> ---------------------------------------------------------------------------
> Moreover, there are some scattered mdrun_mpi processes that keep on
> running (but never on the master node).
I think your interconnect is simply too slow. I guess you can only use more
CPUs with a faster interconnect (like Myrinet or InfiniBand), or by switching
to quad-CPU systems.
I don't know whether future versions of GROMACS will scale better on ordinary
nodes (I hope so).
>
> I have tried to use mpiexec instead of mpirun
> ------------------------------------
> /usr/local/bin/mpiexec -verbose -comm lam -kill -mpich-p4-no-shmem
> $APPLICATION $RUNFLAGS
> --------------------------------------------
> Then I get a different error message in the error output file:
> ---------------------------------------------------------------------------
> mpiexec: Warning: parse_args: argument "-mpich-p4-[no-]shmem"
> ignored since communication library not MPICH/P4.
> NNODES=1, MYRANK=0, HOSTNAME=node24.beowulf.cluster
>
> :-) G R O M A C S (-:
>
> Good gRace! Old Maple Actually Chews Slate
>
> :-) VERSION 3.2.1 (-:
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2004, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> ... and so on, until:
>
> -[no]compact bool yes Write a compact log file
> -[no]multi bool no Do multiple simulations in parallel (only with
> -np > 1)
> -[no]glas bool no Do glass simulation with special long range
> corrections
> -[no]ionize bool no Do a simulation including the effect of an
> X-Ray bombardment on your system
>
> Fatal error: Could not open RW12_s20_a.log
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
> p4_error: latest msg from perror: Permission denied
> ---------------------------------------------------------------------------
>
> Notice that the node (node24.beowulf.cluster) tries to write to the file
> RW12_s20_a.log and fails. The normal thing would be to write to the file
> RW12_s20_a#.log (where # is the node number).
>
> Is there a way to use mpiexec instead of mpirun? I have some vague
> feeling that this may help...
One of them uses host-based communication, the other socket-based
communication (ssh keys have to be set up for that), but I don't think this
will improve the communication (or the latency) enough for the run to use
more CPUs with a good load (> 60 % on each CPU).
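In case it helps, here is a minimal sketch of the usual passwordless-ssh
setup (this assumes OpenSSH and a home directory shared across the nodes;
node24 is only used as an example host):
----------------------------------------
# generate a key pair without a passphrase (RSA assumed here)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key for logins on the nodes (shared home assumed)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# test: this should print the hostname without asking for a password
ssh node24 hostname
----------------------------------------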
Greetings,
Florian
--
-------------------------------------------------------------------------------
Florian Haberl
Computer-Chemie-Centrum
Universitaet Erlangen/Nuernberg
Naegelsbachstr. 25
D-91052 Erlangen
Mailto: florian.haberl AT chemie.uni-erlangen.de
-------------------------------------------------------------------------------