Re: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.

xhomes at sohu.com xhomes at sohu.com
Tue Jun 1 13:57:25 CEST 2010


Hi, Mark,

Thanks for the reply!

It seems I got something mixed up. At the beginning, I used ‘constraints = all-bonds’ with ‘domain decomposition’. When I scale the simulation to more than 2 processes, an error like the one below occurs:

####################
Fatal error: There is no domain decomposition for 6 nodes that is compatible with the given box and a minimum cell size of 2.06375 nm
Change the number of nodes or mdrun option -rcon or -dds or your LINCS settings
Look in the log file for details on the domain decomposition
####################
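(For completeness, I read that message as pointing at the mdrun command line; I suppose one would try something like the line below, though the -rcon and -dds values here are only illustrative guesses on my part, not settings I have tested:)

    mpiexec -np 6 mdrun_dmpi -rcon 1.0 -dds 0.9 -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt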


I referred to the manual and found no answer. Then I switched to ‘particle decomposition’ and tried all kinds of things, including changing MPICH to LAM/MPI, changing Gromacs from V4.0.5 to V4.0.7, and adjusting the mdp file (e.g. ‘constraints = hbonds’ or no PME), but none of these took effect! I thought I had already tried ‘constraints = hbonds’ with ‘domain decomposition’, at least with LAM/MPI.
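(The mdp adjustments I mean were along these lines; this is only a sketch of the two variants I remember trying, not my exact files:)

    constraints   = hbonds      ; instead of all-bonds
    ; and, as a separate test with PME switched off:
    coulombtype   = Cut-off     ; instead of PME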

However, when I tried ‘constraints = hbonds’ with ‘domain decomposition’ under MPICH today, it scaled well to more than 2 processes! And now it also scales well under LAM/MPI with ‘constraints = hbonds’ and ‘domain decomposition’!


So it seems the key for ‘domain decomposition’ is ‘constraints = hbonds’.
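(My guess at why: with ‘constraints = all-bonds’, the coupled bond constraints together with my lincs_order = 10 presumably inflate the minimum cell size that domain decomposition demands, which would be where the 2.06375 nm in the error above comes from; with only the bonds to hydrogens constrained the cells can be much smaller. So the relevant mdp lines seem to be just these two:)

    constraints   = hbonds
    lincs_order   = 4        ; the default; my value of 10 makes the DD cells even larger, if I understand the manual correctly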


Of course, the simulation still crashes when using ‘particle decomposition’ with ‘constraints = hbonds’ or ‘constraints = all-bonds’, and I don’t know why.
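(Your suggestion to drop MPICH in favour of OpenMPI is probably the next thing for me to try against this particle decomposition crash; I assume the launch line would then look roughly like the one below, where ‘mdrun_ompi’ is just a placeholder name for an mdrun built against OpenMPI:)

    nohup mpirun -np 6 mdrun_ompi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &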


I use the double-precision version and the NTP ensemble to perform a PCA!

----- Original Message -----
From: xhomes at sohu.com
Date: Tuesday, June 1, 2010 11:53
Subject: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
To: gmx-users <gmx-users at gromacs.org>

> Hi, everyone of gmx-users,
>
> I met a problem when I use the ‘particle decomposition’ option
> in a NTP MD simulation of Engrailed Homeodomain (En) in CL-
> neutralized water box. It just crashed with an error “Fatal
> error in PMPI_Bcast: Other MPI error, error stack: …..”.
> However, I’ve tried the ‘domain decomposition’ and everything is
> ok! I use the Gromacs 4.05 and 4.07, the MPI lib is mpich2-
> 1.2.1p1. The system box size is 5.386(nm)3. The MDP file list as
> below:
> ########################################################
> title                    = En
> ;cpp                     = /lib/cpp
> ;include                 = -I../top
> define                   =
> integrator               = md
> dt                       = 0.002
> nsteps                   = 3000000
> nstxout                  = 500
> nstvout                  = 500
> nstlog                   = 250
> nstenergy                = 250
> nstxtcout                = 500
> comm-mode                = Linear
> nstcomm                  = 1
>
> ;xtc_grps                = Protein
> energygrps               = protein non-protein
>
> nstlist                  = 10
> ns_type                  = grid
> pbc                      = xyz        ;default xyz
> ;periodic_molecules      = yes        ;default no
> rlist                    = 1.0
>
> coulombtype              = PME
> rcoulomb                 = 1.0
> vdwtype                  = Cut-off
> rvdw                     = 1.4
> fourierspacing           = 0.12
> fourier_nx               = 0
> fourier_ny               = 0
> fourier_nz               = 0
> pme_order                = 4
> ewald_rtol               = 1e-5
> optimize_fft             = yes
>
> tcoupl                   = v-rescale
> tc_grps                  = protein non-protein
> tau_t                    = 0.1  0.1
> ref_t                    = 298  298
> Pcoupl                   = Parrinello-Rahman
> pcoupltype               = isotropic
> tau_p                    = 0.5
> compressibility          = 4.5e-5
> ref_p                    = 1.0
>
> gen_vel                  = yes
> gen_temp                 = 298
> gen_seed                 = 173529
>
> constraints              = hbonds
> lincs_order              = 10
> ########################################################
>
> When I conduct MD using “nohup mpiexec -np 2 mdrun_dmpi -s
> 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e
> 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.
>
> Since the system doesn’t support more than 2 processes under
> ‘domain decomposition’ option, it took me about 30 days to
> calculate a 6ns trajectory. Then I decide to use the ‘particle

Why no more than 2? What GROMACS version? Why are you using double precision with temperature coupling?

MPICH has known issues. Use OpenMPI.

> decomposition’ option. The command line is “nohup mpiexec -np 6
> mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c
> 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”. And I
> got the crash in the nohup file like below:
> ####################
> Fatal error in PMPI_Bcast: Other MPI error, error stack:
> PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0,
> count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> MPIR_Bcast(998).......................:
> MPIR_Bcast_scatter_ring_allgather(842):
> MPIR_Bcast_binomial(187)..............:
> MPIC_Send(41).........................:
> MPIC_Wait(513)........................:
> MPIDI_CH3I_Progress(150)..............:
> MPID_nem_mpich2_blocking_recv(948)....:
> MPID_nem_tcp_connpoll(1720)...........:
> state_commrdy_handler(1561)...........:
> MPID_nem_tcp_send_queued(127).........: writev to socket failed -
> Bad address
> rank 0 in job 25  cluster.cn_52655  caused
> collective abort of all ranks
> exit status of rank 0: killed by signal 9
> ####################
>
> And the ends of the log file list as below:
> ####################
> ……..
> ……..
> ……..
> ……..
>    bQMMM           = FALSE
>    QMconstraints   = 0
>    QMMMscheme      = 0
>    scalefactor     = 1
> qm_opts:
>    ngQM            = 0
> ####################
>
> I’ve search the gmx-users mail list and tried to adjust the md
> parameters, and no solution was found. The "mpiexec -np x"
> option doesn't work except when x=1. I did found that when the
> whole En protein is constrained using position restraints
> (define = -DPOSRES), the ‘particle decomposition’ option works.
> However this is not the kind of MD I want to conduct.
>
> Could anyone help me about this problem? And I also want to know
> how can I accelerate this kind of MD (long time simulation of
> small system) using Gromacs? Thinks a lot!
>
> (Further information about the simulated system: The system has
> one En protein (54 residues, 629 atoms), total 4848 spce waters,
> and 7 Cl- used to neutralize the system. The system has been
> minimized first. A 20ps MD is also performed for the waters and
> ions before EM.)

This should be bread-and-butter with either decomposition up to at least 16 processors, for a correctly compiled GROMACS with a useful MPI library.

Mark

