[gmx-users] Improving scaling - Gromacs 4.0 RC2
Carsten Kutzner
ckutzne at gwdg.de
Thu Oct 2 09:49:16 CEST 2008
Hi Justin,
I have written a small gmx tool that systematically tries various PME/PP balances
for a given number of nodes and afterwards suggests the fastest combination.
Although I plan to extend it with more functionality, it is already working, and
I can send it to you if you would like to try it.
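The basic idea is simple enough to sketch in a few lines of shell. The following
is only a rough illustration of what the tool automates, not the tool itself:
the set of -npme values and the bench_npme*.log names are made up for the
example, the launcher invocation is copied from your command line below and may
need adjusting for your MPI setup, and the exact wording of the performance line
in md.log can differ between versions.

  #!/bin/sh
  # Rough sketch: run the same short benchmark with several PME node counts
  # and compare the ns/day reported at the end of each log file.
  for npme in 16 20 24 28 32; do
    mdrun_mpi -s test.tpr -np 64 -npme $npme -g bench_npme$npme.log
    grep "Performance" bench_npme$npme.log
  done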
The performance loss due to load imbalance has two causes:
1. imbalance in the force calculation, which can be levelled out by using
-dlb yes or -dlb auto, as David suggested
2. imbalance between the short-range (PP) and long-range (PME) force
calculation, which can be levelled out by choosing the optimal PME/PP ratio.
This is what the tool should do for you (an example command line combining
both options is shown below).
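For example, on 64 nodes you could combine both options in one command line
like this (the -npme value of 28 is only an illustration, not a recommendation
for your system):

  mdrun_mpi -s test.tpr -np 64 -npme 28 -dlb auto -ddorder interleave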
You might also want to check the md.log file for more detailed information
about where your imbalance is coming from. My guess is that with 32 PME nodes
and interleaved communication the PP (short-range) nodes have to wait on the
PME (long-range) nodes, while with 16 PME nodes it is the other way around.
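A quick way to pull out the relevant lines is something like the following;
the exact wording of these log messages may vary between versions, so searching
for 'imbalance' is the safest bet:

  grep -i "imbalance" md.log
  grep -i "pme mesh/force" md.log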
The fact that you see less imbalance with pp_pme or cartesian communication
probably only means that the PME communication is slower in those cases - the
'smaller' performance loss from imbalance is a bit misleading here.
Carsten
On 01.10.2008, at 23:18, Justin A. Lemkul wrote:
>
> Hi,
>
> I've been playing around with the latest release candidate of
> version 4.0, and I was hoping someone out there more knowledgeable
> than me might tell me how to improve a bit on the performance I'm
> seeing. To clarify, the performance I'm seeing is a ton faster
> than 3.3.x, but I still seem to be getting bogged down with the PME/
> PP balance. I'm using mostly the default options with the new mdrun:
>
> mdrun_mpi -s test.tpr -np 64 -npme 32
>
> The system contains about 150,000 atoms - a membrane protein
> surrounded by several hundred lipids and solvent (water). The
> protein parameters are GROMOS, lipids are Berger, and water is
> SPC. My .mdp file (adapted from a generic 3.3.x file that I always
> used to use for such simulations) is attached at the end of this
> mail. It seems that my system runs fastest on 64 CPUs. Almost
> all tests with 128 or 256 seem to run slower. The nodes are dual-
> core 2.3 GHz Xserve G5, connected by Infiniband.
>
> Here's a summary of some of the tests I've run:
>
> -np  -npme  -ddorder     ns/day   % performance loss from imbalance
>  64   16    interleave   5.760    19.6
>  64   32    interleave   9.600    40.9
>  64   32    pp_pme       5.252     3.9
>  64   32    cartesian    5.383     4.7
>
> All other mdrun command line options are defaults.
>
> I get ~10.3 ns/day with -np 256 -npme 64, but since -np 64 -npme 32
> seems to give almost that same performance there seems to be no
> compelling reason to tie up that many nodes.
>
> Any hints on how to speed things up any more? Is it possible? Not
> that I'm complaining...the same system under GMX 3.3.3 gives just
> under 1 ns/day :) I'm really curious about the 40.9% performance
> loss I'm seeing with -np 64 -npme 32, even though it gives the best
> overall performance in terms of ns/day.
>
> Thanks in advance for your attention, and any comments.
>
> -Justin
>
> =======test.mdp=========
> title = NPT simulation for a membrane protein
> ; Run parameters
> integrator = md
> dt = 0.002
> nsteps = 10000 ; 20 ps
> nstcomm = 1
> ; Output parameters
> nstxout = 500
> nstvout = 500
> nstfout = 500
> nstlog = 500
> nstenergy = 500
> ; Bond parameters
> constraint_algorithm = lincs
> constraints = all-bonds
> continuation = no ; starting up
> ; Twin-range cutoff scheme, parameters for Gromos96
> nstlist = 5
> ns_type = grid
> rlist = 0.8
> rcoulomb = 0.8
> rvdw = 1.4
> ; PME electrostatics parameters
> coulombtype = PME
> fourierspacing = 0.24
> pme_order = 4
> ewald_rtol = 1e-5
> optimize_fft = yes
> ; V-rescale temperature coupling is on in three groups
> Tcoupl = V-rescale
> tc_grps = Protein POPC SOL_NA+_CL-
> tau_t = 0.1 0.1 0.1
> ref_t = 310 310 310
> ; Pressure coupling is on
> Pcoupl = Berendsen
> pcoupltype = semiisotropic
> tau_p = 2.0
> compressibility = 4.5e-5 4.5e-5
> ref_p = 1.0 1.0
> ; Generate velocities is on
> gen_vel = yes
> gen_temp = 310
> gen_seed = 173529
> ; Periodic boundary conditions are on in all directions
> pbc = xyz
> ; Long-range dispersion correction
> DispCorr = EnerPres
>
> ========end test.mdp==========
>
> --
> ========================================
>
> Justin A. Lemkul
> Graduate Research Assistant
> Department of Biochemistry
> Virginia Tech
> Blacksburg, VA
> jalemkul[at]vt.edu | (540) 231-9080
> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>
> ========================================