[gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.

xhomes at sohu.com
Tue Jun 1 03:51:35 CEST 2010


Hi everyone on gmx-users,

I ran into a problem when using the ‘particle decomposition’ option for an NPT MD simulation of the Engrailed homeodomain (En) in a Cl--neutralized water box. The run crashes with “Fatal error in PMPI_Bcast: Other MPI error, error stack: …..”, whereas with ‘domain decomposition’ everything is OK. I am using GROMACS 4.0.5 and 4.0.7, and the MPI library is mpich2-1.2.1p1. The cubic box is 5.386 nm on each side. The MDP file is listed below:

########################################################
title                    = En
;cpp                      = /lib/cpp
;include                  = -I../top
define                   = 
integrator               = md
dt                       = 0.002
nsteps                   = 3000000
nstxout                  = 500
nstvout                  = 500
nstlog                   = 250
nstenergy                = 250
nstxtcout                = 500
comm-mode                = Linear
nstcomm                  = 1

;xtc_grps                 = Protein
energygrps               = protein non-protein

nstlist                  = 10
ns_type                  = grid
pbc                      = xyz	;default xyz
;periodic_molecules       = yes	;default no
rlist                    = 1.0

coulombtype              = PME
rcoulomb                 = 1.0
vdwtype                  = Cut-off
rvdw                     = 1.4
fourierspacing           = 0.12
fourier_nx               = 0
fourier_ny               = 0
fourier_nz               = 0
pme_order                = 4
ewald_rtol               = 1e-5
optimize_fft             = yes

tcoupl                   = v-rescale
tc_grps                  = protein non-protein
tau_t                    = 0.1  0.1
ref_t                    = 298  298
Pcoupl                   = Parrinello-Rahman
pcoupltype               = isotropic
tau_p                    = 0.5
compressibility          = 4.5e-5
ref_p                    = 1.0

gen_vel                  = yes
gen_temp                 = 298
gen_seed                 = 173529

constraints              = hbonds
lincs_order              = 10
########################################################
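
(In case it is relevant: the .tpr was generated with grompp roughly as below; the .mdp, coordinate and topology file names here are placeholders, not necessarily my exact ones.)
####################
# sketch only - input file names are placeholders; the grompp binary may be
# named grompp or grompp_d depending on the build (my mdrun is mdrun_dmpi)
grompp -f 12_NTPmd.mdp -c equilibrated.gro -p topol.top -o 11_Trun.tpr
####################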

When I run the MD (domain decomposition) with “nohup mpiexec -np 2 mdrun_dmpi -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.

Since domain decomposition does not support more than 2 processes for this system, it took me about 30 days to compute a 6 ns trajectory, so I decided to try the ‘particle decomposition’ option. The command line is “nohup mpiexec -np 6 mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, and I get the following crash in the nohup output:
####################
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0, count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(998).......................: 
MPIR_Bcast_scatter_ring_allgather(842): 
MPIR_Bcast_binomial(187)..............: 
MPIC_Send(41).........................: 
MPIC_Wait(513)........................: 
MPIDI_CH3I_Progress(150)..............: 
MPID_nem_mpich2_blocking_recv(948)....: 
MPID_nem_tcp_connpoll(1720)...........: 
state_commrdy_handler(1561)...........: 
MPID_nem_tcp_send_queued(127).........: writev to socket failed - Bad address
rank 0 in job 25  cluster.cn_52655   caused collective abort of all ranks
exit status of rank 0: killed by signal 9
####################

The end of the log file reads as follows:
####################
……..
……..
……..
……..
   bQMMM                = FALSE
   QMconstraints        = 0
   QMMMscheme           = 0
   scalefactor          = 1
qm_opts:
   ngQM                 = 0
####################

I’ve searched the gmx-users mailing list and tried adjusting the MD parameters, but found no solution. With -pd, “mpiexec -np x” does not work except when x = 1. I did find that when the whole En protein is restrained with position restraints (define = -DPOSRES), the ‘particle decomposition’ option works; however, that is not the kind of MD I want to run.
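
(By position restraints I mean the usual block that pdb2gmx writes into the protein topology, which define = -DPOSRES switches on; roughly:)
####################
; standard pdb2gmx output in the protein topology - the -DPOSRES define in
; the .mdp activates it (posre.itp is pdb2gmx's default file name)
#ifdef POSRES
#include "posre.itp"
#endif
####################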
 
Could anyone help me with this problem? I would also like to know how to accelerate this kind of MD (a long simulation of a small system) with GROMACS. Thanks a lot!

(Further information about the simulated system: it contains one En protein (54 residues, 629 atoms), 4848 SPC/E waters in total, and 7 Cl- ions to neutralize the charge. The system was energy-minimized first, and a 20 ps MD of the waters and ions had also been performed before the EM.)
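
(The box was set up with the standard tools, roughly as sketched here; the file names and the editconf -d value are placeholders rather than my exact commands.)
####################
# sketch only - placeholder file names / distances
editconf -f en.pdb -o boxed.gro -bt cubic -d 1.0
genbox -cp boxed.gro -cs spc216.gro -p topol.top -o solvated.gro
grompp -f em.mdp -c solvated.gro -p topol.top -o ions.tpr
genion -s ions.tpr -o ionized.gro -p topol.top -nname CL- -nn 7
####################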



