Re: Re: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.

Mark Abraham mark.abraham at anu.edu.au
Tue Jun 1 19:45:17 CEST 2010


----- Original Message -----
From: xhomes at sohu.com
Date: Tuesday, June 1, 2010 21:59
Subject:  Re: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
To: Discussion list for GROMACS users <gmx-users at gromacs.org>

> 
> Hi, Mark,
> Thanks for the reply! 
> It seems I got something messed up. At the beginning, I used ‘constraints = all-bonds’ and ‘domain decomposition’.
> When the simulation scales to more than 2 processes, an error like the one below occurs:

The "domain_decomposition" .mdp flag is an artefact of pre-GROMACS-4 development of DD. It does nothing. Forget about it. DD is enabled by default unless you use mdrun -pd.

> ####################
> Fatal error: There is no domain decomposition for 6 nodes that is compatible with the given box and a minimum cell size of 2.06375 nm
> Change the number of nodes or mdrun option -rcon or -dds or your LINCS settings
> Look in the log file for details on the domain decomposition
> ####################
>  

With DD and all-bonds, the coupled constraints create a minimum cell diameter that must be satisfied on all processors. Your system is too small for this to be true. The manual sections on DD mention this, though perhaps you wouldn't pick that up on a first reading.
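
If it helps, these are the knobs that set that minimum cell size - only a sketch of the relevant .mdp lines, not a recommendation for your system. As far as I recall, with P-LINCS the cell has to span roughly lincs_order+1 coupled constraint lengths, which is why all-bonds together with your lincs_order = 10 gives the ~2 nm limit in the error:

constraints     = hbonds      ; only bonds to hydrogen constrained - little coupling, small minimum cell
;constraints    = all-bonds   ; every bond constrained - chains of coupled constraints inflate the minimum cell
lincs_order     = 4           ; a lower order than 10 also reduces what P-LINCS demands of the cell

You can also poke at mdrun -rcon or -dds, as the error message itself suggests, but reducing the constraint coupling is the cleaner fix.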

> I referred to the manual and found no answer. Then I turned to ‘particle decomposition’ and tried
> all kinds of things, including changing MPICH to LAM/MPI, changing GROMACS from 4.0.5
> to 4.0.7, and adjusting the mdp file (e.g. ‘constraints = hbonds’ or no PME), and none of them
> worked! I thought I had tried ‘constraints = hbonds’ with ‘domain decomposition’, at least with LAM/MPI.

PD might fail for a similar reason, I suppose.

> However, when I tried ‘constraints = hbonds’ and ‘domain decomposition’ under MPICH today, it scaled well to more than 2 processes! And now it also scales well under LAM/MPI with ‘constraints = hbonds’ and ‘domain decomposition’!

Yep. Your constraints are not so tightly coupled now.

> So it seems the key is ‘constraints = hbonds’ for ‘domain decomposition’.

Knowing how your tools work is key :-) The problem with complex tools like GROMACS is knowing what's worth knowing :-)

>  
> Of course, the simulation still crashed when using ‘particle decomposition’ with ‘constraints = hbonds’ or ‘all-bonds’, and I don’t know why.

Again, your system is probably too small to be worth parallelising heavily once constraints are involved.

> I use the double-precision version and the NPT ensemble to perform a PCA!

I doubt that you need to collect data in double precision. Any supposed extra accuracy of integration is probably getting swamped by noise from temperature coupling. I suppose you may wish to run the analysis tool in double, but it'll read a single-precision trajectory just fine. Using single precision will make things more than a factor of two faster.
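
Something like this workflow, for instance - a sketch only, with assumed binary names (mdrun_mpi for a single-precision MPI build, and the stock g_covar/g_anaeig for the PCA; your install's suffixes may differ):

mpiexec -np 6 mdrun_mpi -s 11_Trun.tpr -o 12_NTPmd.trr -x 12_NTPmd.xtc -e 12_NTPmd_ener.edr -g 12_NTPmd.log -c 12_NTPmd.pdb -cpo 12_NTPstate.cpt

g_covar -s 11_Trun.tpr -f 12_NTPmd.xtc -o eigenval.xvg -v eigenvec.trr    # build and diagonalize the covariance matrix
g_anaeig -s 11_Trun.tpr -f 12_NTPmd.xtc -v eigenvec.trr -proj proj.xvg    # project the trajectory onto the eigenvectors

If you really want the analysis step in double, a _d-suffixed g_covar (if that's how your install names it) reads the single-precision .xtc just the same.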

Mark

> 
> ----- Original Message -----
> From: xhomes at sohu.com
> Date: Tuesday, June 1, 2010 11:53
> Subject: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs when using the ‘particle decomposition’ option.
> To: gmx-users <gmx-users at gromacs.org>


> 
> > Hi, everyone of gmx-users,
> > 
> > I met a problem when using the ‘particle decomposition’ option 
> > in an NPT MD simulation of the Engrailed homeodomain (En) in a 
> > water box neutralized with Cl- ions. It just crashed with the error 
> > “Fatal error in PMPI_Bcast: Other MPI error, error stack: …..”. 
> > However, I’ve tried ‘domain decomposition’ and everything is 
> > OK! I use GROMACS 4.0.5 and 4.0.7, and the MPI lib is mpich2-
> > 1.2.1p1. The system box size is (5.386 nm)^3. The MDP file is 
> > listed below:
> > ########################################################
> > title                    = En
> > ;cpp                      = /lib/cpp
> > ;include                  = -I../top
> > define                   = 
> > integrator               = md
> > dt                       = 0.002
> > nsteps                   = 3000000
> > nstxout                  = 500
> > nstvout                  = 500
> > nstlog                   = 250
> > nstenergy                = 250
> > nstxtcout                 = 500
> > comm-mode                = Linear
> > nstcomm                  = 1
> > 
> > ;xtc_grps                 = Protein
> > energygrps               = protein non-protein
> > 
> > nstlist                  = 10
> > ns_type                  = grid
> > pbc                      = xyz	;default xyz
> > ;periodic_molecules      = yes     ;default no
> > rlist                    = 1.0
> > 
> > coulombtype              = PME
> > rcoulomb                 = 1.0
> > vdwtype                  = Cut-off
> > rvdw                     = 1.4
> > fourierspacing           = 0.12
> > fourier_nx               = 0
> > fourier_ny               = 0
> > fourier_nz               = 0
> > pme_order                = 4
> > ewald_rtol               = 1e-5
> > optimize_fft             = yes
> > 
> > tcoupl                   = v-rescale
> > tc_grps                  = protein non-protein
> > tau_t                    = 0.1  0.1
> > ref_t                    = 298  298
> > Pcoupl                   = Parrinello-Rahman
> > pcoupltype               = isotropic
> > tau_p                    = 0.5
> > compressibility          = 4.5e-5
> > ref_p                    = 1.0
> > 
> > gen_vel                  = yes
> > gen_temp                 = 298
> > gen_seed                 = 173529
> > 
> > constraints              = hbonds
> > lincs_order              = 10
> > ########################################################
> > 
> > When I conduct MD using “nohup mpiexec -np 2 mdrun_dmpi -s 
> > 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 
> > 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.
> > 
> > Since the system doesn’t support more than 2 processes under the 
> > ‘domain decomposition’ option, it took me about 30 days to 
> > calculate a 6 ns trajectory. Then I decided to use the ‘particle
> 
> Why no more than 2? What GROMACS version? Why are you using double precision with temperature coupling?
> 
> MPICH has known issues. Use OpenMPI.
> 
> > decomposition’ option. The command line is “nohup mpiexec -np 6 
> > mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 
> > 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”. And I 
> > got the following crash in the nohup output:
> > ####################
> > Fatal error in PMPI_Bcast: Other MPI error, error stack:
> > PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0, 
> > count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> > MPIR_Bcast(998).......................: 
> > MPIR_Bcast_scatter_ring_allgather(842): 
> > MPIR_Bcast_binomial(187)..............: 
> > MPIC_Send(41).........................: 
> > MPIC_Wait(513)........................: 
> > MPIDI_CH3I_Progress(150)..............: 
> > MPID_nem_mpich2_blocking_recv(948)....: 
> > MPID_nem_tcp_connpoll(1720)...........: 
> > state_commrdy_handler(1561)...........: 
> > MPID_nem_tcp_send_queued(127).........: writev to socket failed -
> > Bad address
> > rank 0 in job 25  cluster.cn_52655   caused 
> > collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> > ####################
> > 
> > And the end of the log file is shown below:
> > ####################
> > ……..
> > ……..
> > ……..
> > ……..
> >    bQMMM            = FALSE
> >    QMconstraints    = 0
> >    QMMMscheme       = 0
> >    scalefactor      = 1
> > qm_opts:
> >    ngQM             = 0
> > ####################
> > 
> > I’ve searched the gmx-users mailing list and tried adjusting the MD 
> > parameters, but no solution was found. The "mpiexec -np x" 
> > option doesn't work except when x=1. I did find that when the 
> > whole En protein is constrained using position restraints 
> > (define = -DPOSRES), the ‘particle decomposition’ option works. 
> > However, this is not the kind of MD I want to conduct.
> >  
> > Could anyone help me with this problem? I would also like to know 
> > how I can accelerate this kind of MD (a long simulation of a 
> > small system) using GROMACS. Thanks a lot!
> > 
> > (Further information about the simulated system: it has 
> > one En protein (54 residues, 629 atoms), 4848 SPC/E waters in 
> > total, and 7 Cl- ions used to neutralize the system. The system 
> > was minimized first, and a 20 ps MD was also performed for the 
> > waters and ions before EM.)
> 
> This should be bread-and-butter with either decomposition up to at least 16 processors, for a correctly compiled GROMACS with a useful MPI library.
> 
> Mark