[gmx-users] Improving scaling - Gromacs 4.0 RC2

Carsten Kutzner ckutzne at gwdg.de
Thu Oct 2 09:49:16 CEST 2008


Hi Justin,

I have written a small gmx tool that systematically tries various PME/PP
balances for a given number of nodes and afterwards suggests the fastest
combination. Although I plan to extend it with more functionality, it is
already working, and I can send it to you if you'd like to try it.
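In the meantime you can also do such a sweep by hand: run a short
benchmark (a few thousand steps are enough, so your 20 ps test.tpr is
fine) for each candidate -npme value and compare the resulting
performance. A minimal sketch along those lines, reusing the command
line from your mail - the -npme candidates, the directory naming and
the output redirection are only placeholders, and you launch mdrun_mpi
however you normally do on your cluster:

#!/bin/sh
# Try a few PME node counts for a fixed total of 64 nodes; keep each
# run in its own directory so the md.log files can be compared later.
TPR=test.tpr
NP=64
for NPME in 16 20 24 28 32; do
    mkdir -p npme_$NPME
    cd npme_$NPME || exit 1
    mdrun_mpi -s ../$TPR -np $NP -npme $NPME -dlb auto > mdrun.out 2>&1
    cd ..
done
# Afterwards compare the ns/day numbers reported near the end of each
# npme_*/md.log.

This is essentially what the tool automates: it runs the combinations
for you and then points at the fastest one.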

The performance loss due to load imbalance has two causes:
1. Imbalance in the force calculation itself, which can be levelled out
by using -dlb yes or -dlb auto, as David suggested.
2. Imbalance between the short-range (PP) and the long-range (PME) force
calculation, which can be levelled out by choosing the optimal PME/PP
node ratio. This is what the script should do for you (see the example
command line after this list).
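For your 64-node runs that would mean something along the lines of the
following - the -npme value here is only an illustration, the sweep
above (or the tool) will tell you the real optimum:

mdrun_mpi -s test.tpr -np 64 -npme 24 -dlb yes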

You might also want to check the md.log file for more detailed
information about where your imbalance is coming from. My guess is that
with 32 PME nodes and interleaved communication the PP (short-range)
nodes have to wait for the PME (long-range) nodes, while with 16 PME
nodes it is the other way around.
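A quick way to pull those numbers out of the log (the exact wording of
the reporting lines can differ a bit between versions, so adjust the
patterns if nothing matches):

grep -i "load imbalance" md.log
grep -i "pme mesh/force" md.log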
The fact that you see less imbalance with pp_pme or cartesian
communication probably just means that the PME communication itself is
slower in those cases - the 'smaller' performance loss from imbalance
is a bit misleading here.

Carsten


On 01.10.2008, at 23:18, Justin A. Lemkul wrote:

>
> Hi,
>
> I've been playing around with the latest release candidate of  
> version 4.0, and I was hoping someone out there more knowledgeable  
> than me might tell me how to improve a bit on the performance I'm  
> seeing.  To clarify, the performance I'm seeing is a ton faster  
> than 3.3.x, but I still seem to be getting bogged down with the
> PME/PP balance.  I'm using mostly the default options with the new mdrun:
>
> mdrun_mpi -s test.tpr -np 64 -npme 32
>
> The system contains about 150,000 atoms - a membrane protein  
> surrounded by several hundred lipids and solvent (water).  The  
> protein parameters are GROMOS, lipids are Berger, and water is  
> SPC.  My .mdp file (adapted from a generic 3.3.x file that I always  
> used to use for such simulations) is attached at the end of this  
> mail.  It seems that my system runs fastest on 64 CPUs.  Almost
> all tests with 128 or 256 seem to run slower.  The nodes are
> dual-core 2.3 GHz Xserve G5s, connected by Infiniband.
>
> Here's a summary of some of the tests I've run:
>
> -np	-npme	-ddorder	ns/day	% performance loss from imbalance
> 64	16	interleave	5.760	19.6
> 64	32	interleave	9.600	40.9
> 64	32	pp_pme		5.252	3.9
> 64	32	cartesian	5.383	4.7
>
> All other mdrun command line options are defaults.
>
> I get ~10.3 ns/day with -np 256 -npme 64, but since -np 64 -npme 32
> gives almost the same performance, there seems to be no compelling
> reason to tie up that many nodes.
>
> Any hints on how to speed things up any more?  Is it possible?  Not  
> that I'm complaining...the same system under GMX 3.3.3 gives just  
> under 1 ns/day :)  I'm really curious about the 40.9% performance  
> loss I'm seeing with -np 64 -npme 32, even though it gives the best  
> overall performance in terms of ns/day.
>
> Thanks in advance for your attention, and any comments.
>
> -Justin
>
> =======test.mdp=========
> title		= NPT simulation for a membrane protein
> ; Run parameters
> integrator	= md
> dt		= 0.002
> nsteps		= 10000		; 20 ps
> nstcomm		= 1
> ; Output parameters
> nstxout		= 500
> nstvout		= 500
> nstfout		= 500
> nstlog		= 500
> nstenergy	= 500
> ; Bond parameters
> constraint_algorithm 	= lincs
> constraints		= all-bonds
> continuation 	= no		; starting up
> ; Twin-range cutoff scheme, parameters for Gromos96
> nstlist		= 5
> ns_type		= grid
> rlist		= 0.8
> rcoulomb	= 0.8
> rvdw		= 1.4
> ; PME electrostatics parameters
> coulombtype	= PME
> fourierspacing  = 0.24
> pme_order	= 4
> ewald_rtol	= 1e-5
> optimize_fft	= yes
> ; V-rescale temperature coupling is on in three groups
> Tcoupl	 	= V-rescale
> tc_grps		= Protein POPC SOL_NA+_CL-
> tau_t		= 0.1 0.1 0.1
> ref_t		= 310 310 310
> ; Pressure coupling is on
> Pcoupl		= Berendsen
> pcoupltype	= semiisotropic
> tau_p		= 2.0		
> compressibility	= 4.5e-5 4.5e-5
> ref_p		= 1.0 1.0
> ; Generate velocities is on
> gen_vel		= yes		
> gen_temp	= 310
> gen_seed	= 173529
> ; Periodic boundary conditions are on in all directions
> pbc		= xyz
> ; Long-range dispersion correction
> DispCorr	= EnerPres
>
> ========end test.mdp==========
>
> -- 
> ========================================
>
> Justin A. Lemkul
> Graduate Research Assistant
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>
> ========================================
> _______________________________________________
> gmx-users mailing list    gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at http://www.gromacs.org/search before  
> posting!
> Please don't post (un)subscribe requests to the list. Use the www  
> interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/mailing_lists/users.php



