[gmx-users] parallel problems...

Florian Haberl Florian.Haberl at chemie.uni-erlangen.de
Tue Feb 1 15:54:58 CET 2005


Hi,

> I have a problem running things on more than 4 CPUs on our brand new
> cluster. I use a PBS script which launches the command
> ----------------------------------------
> /usr/local/encap/mpich-126-gcc-64/bin/mpirun -machinefile machines -np 4
> /swc/gromacs/bin/mdrun_mpi -np 4 -v -s RW12_s20_a.tpr -o RW12_s20_a.trr
> -x RW12_s20_a.xtc -c RW12_s20_a.gro -e RW12_s20_a.edr -g RW12_s20_a.log
> ---------------------------------------------
> This runs nicely, but if I use 6 CPUs (or 8) the calculation crashes and
> an error is reported in the standard output file:
> ---------------------------------------------------------------------------
> rm_10214:  p4_error: semget failed for setnum: 0
> p0_7618: (0.113281) net_recv failed for fd = 7
> p0_7618:  p4_error: net_recv read, errno = : 104
> p0_7618: (4.117188) net_send: could not write to fd=4, errno = 32
> ---------------------------------------------------------------------------
> Moreover, there are some scattered mdrun_mpi processes that keep on
> running (but never on the master node).

I think your interconnect is too slow. I guess you can only use more cpus
with a faster interconnect (like Myrinet or Infiniband), or if you switch to
quad systems.
I don't know whether future versions of gromacs will scale better on normal
nodes (I hope so).
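
If you want to see where the scaling actually breaks down, a rough benchmark
over node counts is easy to script. This is only a sketch: the grompp input
names (topol.mdp, conf.gro, topol.top) and the scale_np* output names are
placeholders for whatever went into RW12_s20_a.tpr, I'm assuming grompp sits
next to mdrun_mpi, and in the 3.x series the .tpr has to be regenerated with
grompp -np for each node count:
--------------------------------------------
for N in 2 4 6 8; do
  # rebuild the run input for N nodes (placeholder input file names)
  /swc/gromacs/bin/grompp -np $N -f topol.mdp -c conf.gro -p topol.top \
      -o scale_np$N.tpr
  /usr/local/encap/mpich-126-gcc-64/bin/mpirun -machinefile machines -np $N \
      /swc/gromacs/bin/mdrun_mpi -np $N -s scale_np$N.tpr -g scale_np$N.log \
      -o scale_np$N.trr -c scale_np$N.gro
  # the timing summary sits near the end of each log
  tail -n 30 scale_np$N.log
done
--------------------------------------------
The interesting number is how the time per step changes from 4 to 6 to 8
cpus; on plain ethernet it usually stops improving quite early.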

>
> I have tried to use mpiexec instead of mpirun:
> ------------------------------------
> /usr/local/bin/mpiexec -verbose -comm lam -kill -mpich-p4-no-shmem
> $APPLICATION $RUNFLAGS
> --------------------------------------------
> Then I get a different error message in the error output file:
> ---------------------------------------------------------------------------
> mpiexec: Warning: parse_args: argument "-mpich-p4-[no-]shmem"
> ignored since communication library not MPICH/P4.
> NNODES=1, MYRANK=0, HOSTNAME=node24.beowulf.cluster
>
>                          :-)  G  R  O  M  A  C  S  (-:
>
>                    Good gRace! Old Maple Actually Chews Slate
>
>                             :-)  VERSION 3.2.1  (-:
>
>       Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
>        Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>              Copyright (c) 2001-2004, The GROMACS development team,
>             check out http://www.gromacs.org for more information.
>
> ... and so on until:
>
> -[no]compact   bool    yes  Write a compact log file
>   -[no]multi   bool     no  Do multiple simulations in parallel (only with
>                             -np > 1)
>    -[no]glas   bool     no  Do glass simulation with special long range
>                             corrections
>  -[no]ionize   bool     no  Do a simulation including the effect of an
> X-Ray bombardment on your system
>
> Fatal error: Could not open RW12_s20_a.log
> [0] MPI Abort by user Aborting program !
> [0] Aborting program!
>     p4_error: latest msg from perror: Permission denied
> ---------------------------------------------------------------------------
>
> Notice that the node (node24.beowulf.cluster) tries to write to the file
> RW12_s20_a.log and fails. The normal thing would be to write to the file
> RW12_s20_a#.log (where # is the node number).
>
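
Just a guess on the "Permission denied" part: under PBS the job starts in
your home directory unless the script changes into $PBS_O_WORKDIR, so it may
be worth checking that mdrun is really started in a directory the node can
write to, e.g. in the PBS script before the mpirun/mpiexec line:
--------------------------------------------
cd $PBS_O_WORKDIR                  # directory the job was submitted from
touch write_test && rm write_test  # fails loudly if it is not writable
--------------------------------------------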

> Is there a way to use mpiexec instead of mpirun? I have some vague
> feeling that this may help...

One is host based, the other socket based (ssh keys have to be set up), but I
don't think that this will improve the communication (or latency) enough that
it will run on more cpus with a good load (> 60 % on each cpu).
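
For the ssh keys, a minimal sketch (assuming your home directory is shared
between the nodes, which is the usual setup on a beowulf cluster) would be:
--------------------------------------------
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""   # key without passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
export P4_RSHCOMMAND=ssh                   # tell mpich's p4 device to use ssh
ssh node24 hostname                        # should work without a password now
--------------------------------------------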


Greetings,

Florian

-- 
-------------------------------------------------------------------------------
 Florian Haberl                        Universitaet Erlangen/Nuernberg
 Computer-Chemie-Centrum               Naegelsbachstr. 25
                                       D-91052 Erlangen
 Mailto: florian.haberl AT chemie.uni-erlangen.de
-------------------------------------------------------------------------------



