[gmx-users] Improving scaling - Gromacs 4.0 RC2
Justin A. Lemkul
jalemkul at vt.edu
Thu Oct 2 13:37:18 CEST 2008
Carsten Kutzner wrote:
> Hi Justin,
>
> I have written a small gmx tool that systematically tries various
> PME/PP balances for a given number of nodes and then suggests the
> fastest combination. Although I plan to extend it with more
> functionality, it is already working, and I can send it to you if
> you'd like to try it.
>
I would like to try it; I think that it would help me get an empirical feel for
things.
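In the meantime, a crude by-hand scan along these lines is roughly what
I had pictured (just a sketch of the idea, not your tool; the binary
name, .tpr file, and node counts are from my own setup, and -deffnm is
only there to keep the outputs apart):

  #!/bin/bash
  # try a few PME node counts for a fixed total of 64 nodes
  for npme in 8 16 24 32; do
      mdrun_mpi -s test.tpr -np 64 -npme $npme -deffnm npme_${npme}
  done
  # then compare the ns/day reported on the "Performance" line of each log
  grep -H "Performance" npme_*.log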
> The performance loss due to load imbalance has two causes:
> 1. imbalance in the force calculation, which can be levelled out by
> using -dlb yes or -dlb auto, as David suggested
Indeed, using -dlb yes improves the % imbalance, but at the cost of throughput. If
we take my 64-core case, applying "-dlb yes" reduces my speed to around 5-6
ns/day (depending on the number of PME nodes: 16, 24, or 32).
It just seems peculiar to me that I can get great speed in terms of ns/day with
-np 64 -npme 32, yet the performance is reported as hampered by imbalance.
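For completeness, the pair of runs I'm comparing there differs only in
the -dlb setting (same .tpr as before; this is a sketch of the command
lines rather than a verbatim copy of my job script):

  mdrun_mpi -s test.tpr -np 64 -npme 32            # defaults: ~9.6 ns/day, 40.9% loss
  mdrun_mpi -s test.tpr -np 64 -npme 32 -dlb yes   # ~5-6 ns/day, better balance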
> 2. imbalance between the short-range and long-range force calculation,
> which can be levelled out by choosing the optimal PME/PP ratio. This is
> what the script should do for you.
>
> You might also want to check the md.log file for more detailed
> information about where your imbalance is coming from. My guess is
> that with 32 PME nodes and interleaved communication the PP (short
> range) nodes have to wait for the PME (long range) nodes, while with
> 16 PME nodes it is the other way around.
> The fact that you see less imbalance with pp_pme or cartesian
> communication probably only means that the PME communication is
> slower in this case - the 'smaller' performance loss from imbalance
> is a bit misleading here.
>
Ah, that makes a bit more sense. Thanks :)
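For what it's worth, this is roughly how I have been skimming md.log for
those numbers (the exact wording of the lines may differ between
versions, so treat the patterns as approximate):

  # pull the load-balance statistics out of the log
  grep -i -e "load imbalance" -e "pme mesh/force load" md.log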
-Justin
> Carsten
>
>
> On 01.10.2008 at 23:18, Justin A. Lemkul wrote:
>
>>
>> Hi,
>>
>> I've been playing around with the latest release candidate of version
>> 4.0, and I was hoping someone out there more knowledgeable than me
>> might tell me how to improve a bit on the performance I'm seeing. To
>> clarify, the performance I'm seeing is a ton faster than 3.3.x, but I
>> still seem to be getting bogged down with the PME/PP balance. I'm
>> using mostly the default options with the new mdrun:
>>
>> mdrun_mpi -s test.tpr -np 64 -npme 32
>>
>> The system contains about 150,000 atoms - a membrane protein
>> surrounded by several hundred lipids and solvent (water). The protein
>> parameters are GROMOS, lipids are Berger, and water is SPC. My .mdp
>> file (adapted from a generic 3.3.x file that I have always used for
>> such simulations) is attached at the end of this mail. It seems that
>> my system runs fastest on 64 CPUs; almost all tests with 128 or 256
>> run slower. The nodes are dual-core 2.3 GHz Xserve G5s, connected by
>> InfiniBand.
>>
>> Here's a summary of some of the tests I've run:
>>
>> -np  -npme  -ddorder    ns/day  % performance loss from imbalance
>>  64    16   interleave   5.760  19.6
>>  64    32   interleave   9.600  40.9
>>  64    32   pp_pme       5.252   3.9
>>  64    32   cartesian    5.383   4.7
>>
>> All other mdrun command line options are defaults.
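>> (Each row in the table above is just the base command with -npme and
>> -ddorder varied, e.g.
>>
>> mdrun_mpi -s test.tpr -np 64 -npme 32 -ddorder pp_pme
>>
>> sketched here from memory rather than copied from my job scripts.)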
>>
>> I get ~10.3 ns/day with -np 256 -npme 64, but since -np 64 -npme 32
>> gives almost the same performance, there is no compelling reason to
>> tie up that many nodes.
>>
>> Any hints on how to speed things up any more? Is it possible? Not
>> that I'm complaining...the same system under GMX 3.3.3 gives just
>> under 1 ns/day :) I'm really curious about the 40.9% performance loss
>> I'm seeing with -np 64 -npme 32, even though it gives the best overall
>> performance in terms of ns/day.
>>
>> Thanks in advance for your attention, and any comments.
>>
>> -Justin
>>
>> =======test.mdp=========
>> title = NPT simulation for a membrane protein
>> ; Run parameters
>> integrator = md
>> dt = 0.002
>> nsteps = 10000 ; 20 ps
>> nstcomm = 1
>> ; Output parameters
>> nstxout = 500
>> nstvout = 500
>> nstfout = 500
>> nstlog = 500
>> nstenergy = 500
>> ; Bond parameters
>> constraint_algorithm = lincs
>> constraints = all-bonds
>> continuation = no ; starting up
>> ; Twin-range cutoff scheme, parameters for Gromos96
>> nstlist = 5
>> ns_type = grid
>> rlist = 0.8
>> rcoulomb = 0.8
>> rvdw = 1.4
>> ; PME electrostatics parameters
>> coulombtype = PME
>> fourierspacing = 0.24
>> pme_order = 4
>> ewald_rtol = 1e-5
>> optimize_fft = yes
>> ; V-rescale temperature coupling is on in three groups
>> Tcoupl = V-rescale
>> tc_grps = Protein POPC SOL_NA+_CL-
>> tau_t = 0.1 0.1 0.1
>> ref_t = 310 310 310
>> ; Pressure coupling is on
>> Pcoupl = Berendsen
>> pcoupltype = semiisotropic
>> tau_p = 2.0
>> compressibility = 4.5e-5 4.5e-5
>> ref_p = 1.0 1.0
>> ; Generate velocities is on
>> gen_vel = yes
>> gen_temp = 310
>> gen_seed = 173529
>> ; Periodic boundary conditions are on in all directions
>> pbc = xyz
>> ; Long-range dispersion correction
>> DispCorr = EnerPres
>>
>> ========end test.mdp==========
>>
>
>
--
========================================
Justin A. Lemkul
Graduate Research Assistant
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================