[gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
xhomes at sohu.com
Tue Jun 1 03:51:35 CEST 2010
Hi everyone on gmx-users,
I have run into a problem when using the ‘particle decomposition’ option in an NPT MD simulation of Engrailed Homeodomain (En) in a Cl--neutralized water box. The run crashes with “Fatal error in PMPI_Bcast: Other MPI error, error stack: …..”. With the ‘domain decomposition’ option, however, everything is OK. I use GROMACS 4.0.5 and 4.0.7, and the MPI library is mpich2-1.2.1p1. The cubic box is 5.386 nm on a side. The MDP file is listed below:
########################################################
title = En
;cpp = /lib/cpp
;include = -I../top
define =
integrator = md
dt = 0.002
nsteps = 3000000
nstxout = 500
nstvout = 500
nstlog = 250
nstenergy = 250
nstxtcout = 500
comm-mode = Linear
nstcomm = 1
;xtc_grps = Protein
energygrps = protein non-protein
nstlist = 10
ns_type = grid
pbc = xyz ;default xyz
;periodic_molecules = yes ;default no
rlist = 1.0
coulombtype = PME
rcoulomb = 1.0
vdwtype = Cut-off
rvdw = 1.4
fourierspacing = 0.12
fourier_nx = 0
fourier_ny = 0
fourier_nz = 0
pme_order = 4
ewald_rtol = 1e-5
optimize_fft = yes
tcoupl = v-rescale
tc_grps = protein non-protein
tau_t = 0.1 0.1
ref_t = 298 298
Pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 0.5
compressibility = 4.5e-5
ref_p = 1.0
gen_vel = yes
gen_temp = 298
gen_seed = 173529
constraints = hbonds
lincs_order = 10
########################################################
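For completeness, the run input file used in the commands below was generated with grompp roughly as sketched here (the MDP, coordinate, and topology file names are only placeholders from memory; only 11_Trun.tpr matches the actual runs):
####################
grompp -f 11_Trun.mdp -c em.gro -p topol.top -o 11_Trun.tpr
####################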
When I run MD with “nohup mpiexec -np 2 mdrun_dmpi -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.
Since the system does not support more than 2 processes with the ‘domain decomposition’ option, it took me about 30 days to calculate a 6 ns trajectory, so I decided to try the ‘particle decomposition’ option. The command line is “nohup mpiexec -np 6 mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, and I get the following crash in the nohup output:
####################
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0, count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast(998).......................:
MPIR_Bcast_scatter_ring_allgather(842):
MPIR_Bcast_binomial(187)..............:
MPIC_Send(41).........................:
MPIC_Wait(513)........................:
MPIDI_CH3I_Progress(150)..............:
MPID_nem_mpich2_blocking_recv(948)....:
MPID_nem_tcp_connpoll(1720)...........:
state_commrdy_handler(1561)...........:
MPID_nem_tcp_send_queued(127).........: writev to socket failed - Bad address
rank 0 in job 25 cluster.cn_52655 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
####################
And the end of the log file is listed below:
####################
……..
……..
……..
……..
bQMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
qm_opts:
ngQM = 0
####################
I’ve searched the gmx-users mailing list and tried adjusting the MD parameters, but found no solution. With ‘-pd’, “mpiexec -np x” does not work for any x except x = 1. I did find that when the whole En protein is restrained with position restraints (define = -DPOSRES), the ‘particle decomposition’ option works (see the sketch below); however, that is not the kind of MD I want to run.
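For reference, the position-restraint test used only the standard GROMACS posres mechanism, roughly as sketched here (I am assuming the usual posre.itp written by pdb2gmx; the include may look different in another topology):
####################
; in the MDP file
define = -DPOSRES

; in the protein topology, as written by pdb2gmx
#ifdef POSRES
#include "posre.itp"
#endif
####################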
Could anyone help me with this problem? I would also like to know how I can accelerate this kind of MD (a long simulation of a small system) using GROMACS. Thanks a lot!
(Further information about the simulated system: it contains one En protein (54 residues, 629 atoms), 4848 SPC/E waters, and 7 Cl- ions to neutralize the system. The system was minimized first, and a 20 ps MD was also performed for the waters and ions before EM.)