Re: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
Mark Abraham
mark.abraham at anu.edu.au
Tue Jun 1 04:29:24 CEST 2010
----- Original Message -----
From: xhomes at sohu.com
Date: Tuesday, June 1, 2010 11:53
Subject: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
To: gmx-users <gmx-users at gromacs.org>
> Hi, everyone of gmx-users,
>
> I ran into a problem when using the ‘particle decomposition’
> option in an NPT MD simulation of the Engrailed Homeodomain
> (En) in a water box neutralized with Cl- ions. The run crashes
> with the error “Fatal error in PMPI_Bcast: Other MPI error,
> error stack: …..”. With the ‘domain decomposition’ option,
> however, everything is fine. I use GROMACS 4.0.5 and 4.0.7; the
> MPI library is mpich2-1.2.1p1. The box is cubic, 5.386 nm on a
> side. The MDP file is listed below:
> ########################################################
> title = En
> ;cpp = /lib/cpp
> ;include = -I../top
> define =
> integrator = md
> dt = 0.002
> nsteps = 3000000
> nstxout = 500
> nstvout = 500
> nstlog = 250
> nstenergy = 250
> nstxtcout = 500
> comm-mode = Linear
> nstcomm = 1
>
> ;xtc_grps = Protein
> energygrps = protein non-protein
>
> nstlist = 10
> ns_type = grid
> pbc = xyz ;default xyz
> ;periodic_molecules = yes ;default no
> rlist = 1.0
>
> coulombtype = PME
> rcoulomb = 1.0
> vdwtype = Cut-off
> rvdw = 1.4
> fourierspacing = 0.12
> fourier_nx = 0
> fourier_ny = 0
> fourier_nz = 0
> pme_order = 4
> ewald_rtol = 1e-5
> optimize_fft = yes
>
> tcoupl = v-rescale
> tc_grps = protein non-protein
> tau_t = 0.1 0.1
> ref_t = 298 298
> Pcoupl = Parrinello-Rahman
> pcoupltype = isotropic
> tau_p = 0.5
> compressibility = 4.5e-5
> ref_p = 1.0
>
> gen_vel = yes
> gen_temp = 298
> gen_seed = 173529
>
> constraints = hbonds
> lincs_order = 10
> ########################################################
>
> When I run the MD with “nohup mpiexec -np 2 mdrun_dmpi -s
> 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e
> 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.
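For comparison, the decomposition layout does not have to be left to mdrun's defaults: the 4.0.x mdrun accepts -dd for the domain decomposition cell grid and -npme for dedicated PME ranks. A sketch using your file names (the rank counts and grid below are only illustrative; mdrun will stop with a clear message if the cells become too small for rvdw = 1.4 and your constraints):

  nohup mpiexec -np 6 mdrun_dmpi -npme 2 -s 11_Trun.tpr \
        -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb \
        -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &

  # or force the cell grid explicitly, e.g. 2x2x1 cells on 4 ranks:
  nohup mpiexec -np 4 mdrun_dmpi -dd 2 2 1 -npme 0 -s 11_Trun.tpr \
        -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb \
        -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &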
>
> Since the system doesn’t support more than 2 processes with the
> ‘domain decomposition’ option, it took about 30 days to
> calculate a 6 ns trajectory. So I decided to use the ‘particle
Why no more than 2? What GROMACS version? Why are you using double precision with temperature coupling?
MPICH has known issues. Use OpenMPI. (A quick check of which MPI library mdrun_dmpi is actually linked against is sketched after the error output below.)
> decomposition’ option. The command line is “nohup mpiexec -np 6
> mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c
> 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, and I
> got this crash in the nohup file:
> ####################
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0,
> count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast(998).......................:
> MPIR_Bcast_scatter_ring_allgather(842):
> MPIR_Bcast_binomial(187)..............:
> MPIC_Send(41).........................:
> MPIC_Wait(513)........................:
> MPIDI_CH3I_Progress(150)..............:
> MPID_nem_mpich2_blocking_recv(948)....:
> MPID_nem_tcp_connpoll(1720)...........:
> state_commrdy_handler(1561)...........:
> MPID_nem_tcp_send_queued(127).........: writev to socket failed -
> Bad address
> rank 0 in job 25 cluster.cn_52655 caused
> collective abort of all ranks
> exit status of rank 0: killed by signal 9
> ####################
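Before digging further into GROMACS, it is worth confirming which MPI library mdrun_dmpi was actually compiled and linked against; a minimal check with standard tools (mpicc -show is the MPICH2 form; OpenMPI's wrapper uses -showme):

  # shared MPI library the binary is linked against
  ldd `which mdrun_dmpi` | grep -i mpi

  # how the MPI compiler wrapper used for the build was configured
  mpicc -show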
>
> And the end of the log file reads:
> ####################
> ……..
> ……..
> ……..
> ……..
>
> bQMMM = FALSE
>
> QMconstraints = 0
> QMMMscheme = 0
>
> scalefactor = 1
> qm_opts:
>
> ngQM = 0
> ####################
>
> I’ve searched the gmx-users mailing list and tried adjusting
> the MD parameters, but found no solution. The "mpiexec -np x"
> option doesn't work except when x=1. I did find that when the
> whole En protein is restrained with position restraints
> (define = -DPOSRES), the ‘particle decomposition’ option works.
> However, that is not the kind of MD I want to run.
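For reference, define = -DPOSRES only has an effect because the pdb2gmx-generated topology wraps the restraint include in the usual conditional block, roughly:

  ; in the topology written by pdb2gmx
  #ifdef POSRES
  #include "posre.itp"
  #endif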
>
> Could anyone help me with this problem? I would also like to
> know how I can speed up this kind of MD (a long simulation of a
> small system) with GROMACS. Thanks a lot!
>
> (Further information about the simulated system: it contains
> one En protein (54 residues, 629 atoms), 4848 SPC/E waters in
> total, and 7 Cl- ions to neutralize the system. The system was
> energy-minimized first. A 20 ps MD of the waters and ions was
> also performed before the EM.)
This should be bread-and-butter with either decomposition up to at least 16 processors, for a correctly compiled GROMACS with a useful MPI library.
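If rebuilding, the 4.0.x autoconf route against OpenMPI is short. A sketch, assuming OpenMPI's compiler wrappers are first on PATH and FFTW is already installed where configure can find it (the prefix, suffix, and --enable-double, if you really need double precision, are placeholders/options, not requirements):

  export CC=mpicc
  ./configure --enable-mpi --program-suffix=_mpi --prefix=$HOME/gromacs-4.0.7
  make mdrun
  make install-mdrun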
Mark