[gmx-users] very strange domain composition statistics
Jennifer Williams
Jennifer.Williams at ed.ac.uk
Fri Jul 31 19:01:04 CEST 2009
Quoting Jennifer Williams <Jennifer.Williams at ed.ac.uk>:
> Hi,
> Thanks for your input. Sorry, I should have mentioned that I am using
> the latest version of GROMACS (4.0.5).
> This morning I noticed that the strange domain decomposition
> statistics are only produced when my simulations run on certain nodes.
> Below I have pasted the statistics for the SAME .tpr file run on two
> different sets of nodes (using 6 nodes each time). The first looks OK
> to me, while the second produces crazy numbers.
> I have opened a call with computer support at my university to ask
> whether there is some difference between the nodes. For now I specify
> that my simulations should run only on nodes that I have checked are OK.
> I don't know if this has something to do with the way I compiled
> GROMACS or with the architecture. I did a standard installation and it
> seemed to run smoothly, with no error messages. I have attached my
> config.log.
> If you have any ideas please let me know,
> Thanks
> D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> av. #atoms communicated per step for force: 2 x 1967.2
> av. #atoms communicated per step for LINCS: 2 x 20.4
> Average load imbalance: 30.3 %
> Part of the total run time spent waiting due to load imbalance: 5.8 %
> Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 9 %
> NOTE: 5.8 % performance was lost due to load imbalance
> in the domain decomposition.
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> Computing:       Nodes    Number     G-Cycles   Seconds       %
> -----------------------------------------------------------------------
> Domain decomp.       6    100001      852.216     341.9     3.0
> Comm. coord.         6   1000001      594.411     238.5     2.1
> Neighbor search      6    100001     2271.294     911.2     7.9
> Force                6   1000001     5560.129    2230.5    19.3
> Wait + Comm. F       6   1000001     1216.171     487.9     4.2
> PME mesh             6   1000001    15071.432    6046.1    52.2
> Write traj.          6      1001        2.577       1.0     0.0
> Update               6   1000001      264.545     106.1     0.9
> Constraints          6   1000001      923.418     370.4     3.2
> Comm. energies       6   1000001      942.036     377.9     3.3
> Rest                 6               1167.866     468.5     4.0
> -----------------------------------------------------------------------
> Total                6              28866.096   11580.0   100.0
> -----------------------------------------------------------------------
> Parallel run - timing based on wallclock.
>                NODE (s)   Real (s)        (%)
>        Time:   1930.000   1930.000      100.0
>                   32:10
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:     51.737      6.922     44.767      0.536
> Finished mdrun on node 0 Fri Jul 31 13:05:01 2009
> D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> av. #atoms communicated per step for force: 2 x 1969.0
> av. #atoms communicated per step for LINCS: 2 x 15.7
> Average load imbalance: 500.0 %
> Part of the total run time spent waiting due to load imbalance: 5822746112.0 %
> Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 9 %
> NOTE: 5822746112.0 % performance was lost due to load imbalance
> in the domain decomposition.
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> Computing:       Nodes    Number         G-Cycles   Seconds       %
> -----------------------------------------------------------------------
> Write traj.          6      1001  18443320128.890   43061.8   100.0
> Update               6   1000001          314.175       0.0     0.0
> Rest                 6             9223372036.855   21534.9    50.0
> -----------------------------------------------------------------------
> Total                6            18443397799.336   43062.0   100.0
> -----------------------------------------------------------------------
> NOTE: 306 % of the run time was spent communicating energies,
> you might want to use the -nosum option of mdrun
> Parallel run - timing based on wallclock.
>                NODE (s)   Real (s)        (%)
>        Time:   7177.000   7177.000      100.0
>                 1h59:37
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:     13.907      1.861     12.038      1.994
> Finished mdrun on node 0 Thu Jul 30 01:47:05 2009
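The wallclock-based performance lines of the two runs above can be cross-checked by hand, which helps separate the numbers that are still trustworthy from the ones that are not. The sketch below is only an illustration: it assumes the run length is nsteps * dt = 1000000 * 0.001 ps = 1 ns, as set in the .mdp file further down the thread, and the function name `performance` is made up for this example.

```python
# Sanity check of the wallclock-based performance lines quoted above.
# Assumption: both runs cover nsteps * dt = 1,000,000 * 0.001 ps = 1 ns.

SECONDS_PER_DAY = 86400.0
SECONDS_PER_HOUR = 3600.0
NODES = 6  # number of nodes reported in the cycle accounting tables


def performance(wall_seconds, nsteps=1_000_000, dt_ps=0.001):
    """Return (ns/day, hour/ns) for a run of nsteps steps of dt_ps each."""
    simulated_ns = nsteps * dt_ps / 1000.0  # ps -> ns
    ns_per_day = simulated_ns * SECONDS_PER_DAY / wall_seconds
    hours_per_ns = wall_seconds / SECONDS_PER_HOUR / simulated_ns
    return ns_per_day, hours_per_ns


good = performance(1930.0)  # run with sensible statistics
bad = performance(7177.0)   # run with the nonsensical statistics

print("good: %.3f ns/day, %.3f hour/ns" % good)
print("bad:  %.3f ns/day, %.3f hour/ns" % bad)

# The "Seconds" totals in the cycle tables are the wall time summed over nodes.
print("good total seconds:", NODES * 1930.0)
print("bad total seconds: ", NODES * 7177.0)
```

Both performance lines and both "Total" seconds entries (11580.0 and 43062.0) come out as printed in the logs, so the wallclock timing looks internally consistent on both sets of nodes. Only the cycle-based columns are off in the second run.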
> Quoting Berk Hess <gmx3 at hotmail.com>:
>>> Date: Fri, 31 Jul 2009 07:49:49 +0200
>>> From: spoel at xray.bmc.uu.se
>>> To: gmx-users at gromacs.org
>>> Subject: Re: [gmx-users] very strange domain composition statistics
>>> Mark Abraham wrote:
>>>> Jennifer Williams wrote:
>>>>> Hi,
>>>>> I am having some problems when running in parallel. Although my jobs
>>>>> run to completion, I am getting some worrying domain decomposition
>>>>> statistics, in particular the average load imbalance and the
>>>>> performance loss due to load imbalance (see below):
>>>> Please report your GROMACS version number. If it's not the latest
>>>> (4.0.5), then you should probably update and see if it's a problem that
>>>> may have been fixed between those releases. You might also try it
>>>> without your freeze groups, especially if they dominate the system.
>>> In addition, you do not specify how many processors you used, the
>>> division over processors that mdrun makes, or the expected
>>> performance. From the numbers below it seems like you used 1 or 2
>>> processors at most. The large number is definitely erroneous though.
>> According to the cycle count table, the number of processors seems to be 6.
>> It seems that the cycle counters have overflowed.
>> On what kind of architecture, and with what compilers, are you
>> running this?
>> Berk
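The overflow suspicion can be checked against the limits of 64-bit integers. The sketch below is only an illustration: it assumes the G-Cycles column is a raw cycle count divided by 1e9, and it says nothing about which counter misbehaved or why.

```python
# Compare the suspicious G-Cycles entries with the limits of 64-bit counters.
# Assumption: the G-Cycles column is a raw cycle count divided by 1e9.

GIGA = 1e9

signed_max = 2**63 - 1   # largest signed 64-bit integer
unsigned_range = 2**64   # number of distinct unsigned 64-bit values

signed_max_g = "%.3f" % (signed_max / GIGA)
unsigned_range_g = "%.3f" % (unsigned_range / GIGA)
print("2^63 - 1 in G-cycles:", signed_max_g)
print("2^64     in G-cycles:", unsigned_range_g)

# Values copied from the quoted logs.
reported = {
    "Rest (both bad runs)": 9223372036.855,
    "Total (Jul 30 run)": 18443397799.336,
    "Total (Jul 29 run)": 18446422611.669,
}

# Express each reported value as a fraction of the full 64-bit range.
fractions = {}
for label, gcycles in reported.items():
    fractions[label] = gcycles * GIGA / unsigned_range
    print("%-22s %.3f G-cycles = %.5f x 2^64" % (label, gcycles, fractions[label]))
```

The "Rest" entry of 9223372036.855 matches 2^63 / 1e9 to the printed precision, and both totals sit just below 2^64 / 1e9. That is the pattern one would expect from 64-bit counters that wrapped around or went negative, rather than from genuinely long timings.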
>>>>> D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>>>> av. #atoms communicated per step for force: 2 x 1974.8
>>>>> av. #atoms communicated per step for LINCS: 2 x 15.2
>>>>> Average load imbalance: 500.0 %
>>>>> Part of the total run time spent waiting due to load imbalance: 4246403072.0 %
>>>>> Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 9 %
>>>>> NOTE: 4246403072.0 % performance was lost due to load imbalance
>>>>> in the domain decomposition.
>>>>> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>>> Computing:       Nodes    Number         G-Cycles   Seconds       %
>>>>> -----------------------------------------------------------------------
>>>>> Write traj.          6      1001  18443320139.164   42130.9   100.0
>>>>> Update               6   1000001  18442922984.491   42130.0   100.0
>>>>> Rest                 6             9223372036.855   21069.4    50.0
>>>>> -----------------------------------------------------------------------
>>>>> Total                6            18446422611.669   42138.0   100.0
>>>>> -----------------------------------------------------------------------
>>>>> NOTE: 305 % of the run time was spent communicating energies,
>>>>> you might want to use the -nosum option of mdrun
>>>>> Parallel run - timing based on wallclock.
>>>>>                NODE (s)   Real (s)        (%)
>>>>>        Time:   7023.000   7023.000      100.0
>>>>>                 1h57:03
>>>>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>>>> Performance:     14.214      1.902     12.302      1.951
>>>>> Finished mdrun on node 0 Wed Jul 29 23:47:18 2009
>>>>> Below is my .mdp file. I am using PME, but since I do not have much
>>>>> of a feel for how to set the options under "Spacing for the PME/PPPM
>>>>> FFT grid", I left these at their default values. Could this be where
>>>>> the trouble lies?
>>>>> My cut-off cannot be larger than 0.9 nm as my unit cell is only 18.2 Å
>>>>> in one direction.
>>>>> How do I choose values for PME/PPPM, i.e. what kind of values should
>>>>> I use for nx, ny and nz?
>>>> See manual section 3.17.5
>>>>> I read that they should be divisible by npme to get the best
>>>>> performance. Is npme the pme_order in the .mdp file? If not, where do I
>>>>> set this parameter?
>>>> No, -npme is a command line parameter to mdrun. Roughly speaking, things
>>>> that have a material effect on the physics are specified in the .mdp
>>>> file, and things that either require external file(names) to be supplied
>>>> or which only affect the implementation of the physics are specified on
>>>> the command line.
>>>> Mark
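For the nx, ny and nz question, a rough feel for the numbers can be had from the box length and fourierspacing alone. The sketch below is only an illustration under stated assumptions: it treats fourierspacing as an upper bound on the grid spacing, it uses the one box length given in the thread (18.2 Å = 1.82 nm), and it takes the "divisible by npme" rule of thumb from the question at face value. The helper name and the list of -npme values are made up for the example, and the grid that grompp actually picks may be larger than this lower bound.

```python
import math


def min_grid_points(box_length_nm, fourierspacing_nm=0.12):
    """Smallest grid count whose spacing does not exceed fourierspacing_nm."""
    return math.ceil(box_length_nm / fourierspacing_nm)


short_edge_nm = 1.82  # 18.2 Angstrom, the only box length given in the thread
n_min = min_grid_points(short_edge_nm)
print("minimum grid points along the short edge:", n_min)

# Hypothetical -npme values, only to illustrate the divisibility check.
divisible = {npme: n_min % npme == 0 for npme in (1, 2, 3, 4)}
for npme, ok in divisible.items():
    print("npme = %d: grid divisible = %s" % (npme, ok))
```

With the default spacing of 0.12 nm the short edge needs at least 16 grid points, which divides evenly by 1, 2 or 4 PME nodes but not by 3.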
>>>>> Much appreciated,
>>>>> Jenny
>>>>> ; Preprocessor information: use cpp syntax.
>>>>> ; e.g.: -I/home/joe/doe -I/home/mary/hoe
>>>>> include = -I../top
>>>>> ; e.g.: -DI_Want_Cookies -DMe_Too
>>>>> define =
>>>>> integrator = md
>>>>> ; Start time and timestep in ps
>>>>> tinit = 0
>>>>> dt = 0.001
>>>>> nsteps = 1000000
>>>>> ; For exact run continuation or redoing part of a run
>>>>> ; Part index is updated automatically on checkpointing (keeps files separate)
>>>>> simulation_part = 1
>>>>> init_step = 0
>>>>> ; mode for center of mass motion removal
>>>>> comm-mode = linear
>>>>> ; number of steps for center of mass motion removal
>>>>> nstcomm = 1
>>>>> ; group(s) for center of mass motion removal
>>>>> comm-grps =
>>>>> ; Friction coefficient (amu/ps) and random seed
>>>>> bd-fric = 0
>>>>> ld-seed = 1993
>>>>> ; Force tolerance and initial step-size
>>>>> emtol =
>>>>> emstep =
>>>>> ; Max number of iterations in relax_shells
>>>>> niter =
>>>>> ; Step size (ps^2) for minimization of flexible constraints
>>>>> fcstep =
>>>>> ; Frequency of steepest descents steps when doing CG
>>>>> nstcgsteep =
>>>>> nbfgscorr =
>>>>> rtpi =
>>>>> ; Output frequency for coords (x), velocities (v) and forces (f)
>>>>> nstxout = 1000
>>>>> nstvout = 1000
>>>>> nstfout = 0
>>>>> ; Output frequency for energies to log file and energy file
>>>>> nstlog = 1000
>>>>> nstenergy = 1000
>>>>> ; Output frequency and precision for xtc file
>>>>> nstxtcout = 1000
>>>>> xtc-precision = 1000
>>>>> ; This selects the subset of atoms for the xtc file. You can
>>>>> ; select multiple groups. By default all atoms will be written.
>>>>> xtc-grps =
>>>>> ; Selection of energy groups
>>>>> energygrps =
>>>>> ; nblist update frequency
>>>>> nstlist =
>>>>> ; ns algorithm (simple or grid)
>>>>> ns_type = grid
>>>>> ; Periodic boundary conditions: xyz, no, xy
>>>>> pbc = xyz
>>>>> periodic_molecules = yes
>>>>> ; nblist cut-off
>>>>> rlist = 0.9
>>>>> ; Method for doing electrostatics
>>>>> coulombtype = PME
>>>>> rcoulomb-switch = 0
>>>>> rcoulomb = 0.9
>>>>> ; Relative dielectric constant for the medium and the reaction field
>>>>> epsilon_r =
>>>>> epsilon_rf =
>>>>> ; Method for doing Van der Waals
>>>>> vdw-type = Cut-off
>>>>> ; cut-off lengths
>>>>> rvdw-switch = 0
>>>>> rvdw = 0.9
>>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>>> DispCorr = No
>>>>> ; Extension of the potential lookup tables beyond the cut-off
>>>>> table-extension =
>>>>> ; Seperate tables between energy group pairs
>>>>> energygrp_table =
>>>>> ; Spacing for the PME/PPPM FFT grid
>>>>> fourierspacing = 0.12
>>>>> ; FFT grid size, when a value is 0 fourierspacing will be used
>>>>> fourier_nx = 0
>>>>> fourier_ny = 0
>>>>> fourier_nz = 0
>>>>> ; EWALD/PME/PPPM parameters
>>>>> pme_order =
>>>>> ewald_rtol = 1e-05
>>>>> ewald_geometry = 3d
>>>>> epsilon_surface = 0
>>>>> optimize_fft = yes
>>>>> ; Temperature coupling
>>>>> tcoupl = nose-hoover
>>>>> ; Groups to couple separately
>>>>> tc-grps = System
>>>>> ; Time constant (ps) and reference temperature (K)
>>>>> tau_t = 0.1
>>>>> ref_t = 150
>>>>> ; Pressure coupling
>>>>> Pcoupl = No
>>>>> Pcoupltype =
>>>>> ; Time constant (ps), compressibility (1/bar) and reference P (bar)
>>>>> tau-p =
>>>>> compressibility =
>>>>> ref-p =
>>>>> ; Scaling of reference coordinates, No, All or COM
>>>>> refcoord_scaling = no
>>>>> ; Random seed for Andersen thermostat
>>>>> andersen_seed =
>>>>> gen_vel = yes
>>>>> gen_temp = 150
>>>>> gen_seed = 173529
>>>>> constraints = none
>>>>> ; Type of constraint algorithm
>>>>> constraint-algorithm = Lincs
>>>>> ; Do not constrain the start configuration
>>>>> continuation = no
>>>>> ; Use successive overrelaxation to reduce the number of shake iterations
>>>>> Shake-SOR = no
>>>>> ; Relative tolerance of shake
>>>>> shake-tol = 0.0001
>>>>> ; Highest order in the expansion of the constraint coupling matrix
>>>>> lincs-order = 4
>>>>> ; Number of iterations in the final step of LINCS. 1 is fine for
>>>>> ; normal simulations, but use 2 to conserve energy in NVE runs.
>>>>> ; For energy minimization with constraints it should be 4 to 8.
>>>>> lincs-iter = 1
>>>>> ; Lincs will write a warning to the stderr if in one step a bond
>>>>> ; rotates over more degrees than
>>>>> lincs-warnangle = 30
>>>>> ; Convert harmonic bonds to morse potentials
>>>>> morse = no
>>>>> ; Pairs of energy groups for which all non-bonded interactions are excluded
>>>>> energygrp_excl =
>>>>> ; WALLS
>>>>> ; Number of walls, type, atom types, densities and box-z scale factor for Ewald
>>>>> nwall = 0
>>>>> wall_type = 9-3
>>>>> wall_r_linpot = -1
>>>>> wall_atomtype =
>>>>> wall_density =
>>>>> wall_ewald_zfac = 3
>>>>> ; Non-equilibrium MD stuff
>>>>> acc-grps =
>>>>> accelerate =
>>>>> freezegrps = SI_O
>>>>> freezedim = Y Y Y
>>>>> cos-acceleration = 0
>>>>> deform =
