[gmx-users] very strange domain composition statistics

Mark Abraham Mark.Abraham at anu.edu.au
Mon Aug 3 01:49:10 CEST 2009


Jennifer Williams wrote:
> Quoting Jennifer Williams <Jennifer.Williams at ed.ac.uk>:
> 
>>
>> Hi,
>>
>> Thanks for your input. Sorry, I should have mentioned that I am using
>> the latest version of GROMACS (4.0.5).
>>
>> This morning I noticed that the strange domain decomposition statistics
>> are only produced when my simulations run on certain nodes. Below I
>> have pasted the domain decomposition statistics for the SAME .tpr file
>> run on 2 different nodes (each time using 6 nodes); the first looks OK
>> to me while the second produces crazy numbers.

Indeed, one looks quite normal.

>> I have opened a call with computer support at my university to ask if
>> there is some difference between the nodes, and for now I specify that
>> my simulations should run only on certain nodes which I have checked
>> are OK.
>>
>> I don't know if this has something to do with the way I compiled
>> GROMACS or with the architecture. I did a standard installation and it
>> seemed to run smoothly, with no error messages. I have attached my
>> config.log.

It seems that some nodes have a suitable source of timing routines and 
some do not. It ought to have nothing to do with how GROMACS was 
compiled if the cluster is homogeneous. If the cluster's nodes are not 
homogeneous, then the admins should know about it and should have 
documented how to deal with the resulting issues.
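
As a quick sanity check, the garbage totals in the second accounting 
table you pasted are exactly what a wrapped 64-bit cycle counter would 
produce, which is the overflow Berk suspected further down the thread. 
A minimal sketch (illustrative Python, not GROMACS code), using the 
numbers from that log:

# Illustrative sketch only: compare the reported "G-Cycles" totals from
# the broken run below against 2^63 and 2^64.
reported_gcycles = {
    "Rest":  9223372036.855,     # from the second accounting table
    "Total": 18443397799.336,
}

for name, gcyc in reported_gcycles.items():
    cycles = gcyc * 1e9                      # G-cycles -> cycles
    print(f"{name:6s}: {cycles / 2**63:.4f} x 2^63, "
          f"{cycles / 2**64:.4f} x 2^64")

# Rest  : 1.0000 x 2^63, 0.5000 x 2^64
# Total : 1.9996 x 2^63, 0.9998 x 2^64

The "Rest" entry is exactly 2^63 cycles and the total is within 0.02 % 
of 2^64, i.e. the counters wrapped (or went negative) rather than 
measuring anything, which is consistent with an unreliable 
cycle-counter source on those nodes.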

Mark

>> If you have any ideas please let me know,
>>
>> Thanks
>>
>>
>>
>>     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>
>>  av. #atoms communicated per step for force:  2 x 1967.2
>>  av. #atoms communicated per step for LINCS:  2 x 20.4
>>
>>  Average load imbalance: 30.3 %
>>  Part of the total run time spent waiting due to load imbalance: 5.8 %
>>  Steps where the load balancing was limited by -rdd, -rcon and/or 
>> -dds: X 9 %
>>
>> NOTE: 5.8 % performance was lost due to load imbalance
>>       in the domain decomposition.
>>
>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>>  Domain decomp.         6     100001      852.216      341.9     3.0
>>  Comm. coord.           6    1000001      594.411      238.5     2.1
>>  Neighbor search        6     100001     2271.294      911.2     7.9
>>  Force                  6    1000001     5560.129     2230.5    19.3
>>  Wait + Comm. F         6    1000001     1216.171      487.9     4.2
>>  PME mesh               6    1000001    15071.432     6046.1    52.2
>>  Write traj.            6       1001        2.577        1.0     0.0
>>  Update                 6    1000001      264.545      106.1     0.9
>>  Constraints            6    1000001      923.418      370.4     3.2
>>  Comm. energies         6    1000001      942.036      377.9     3.3
>>  Rest                   6                1167.866      468.5     4.0
>> -----------------------------------------------------------------------
>>  Total                  6               28866.096    11580.0   100.0
>> -----------------------------------------------------------------------
>>
>>         Parallel run - timing based on wallclock.
>>
>>                NODE (s)   Real (s)      (%)
>>        Time:   1930.000   1930.000    100.0
>>                        32:10
>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>> Performance:     51.737      6.922     44.767      0.536
>> Finished mdrun on node 0 Fri Jul 31 13:05:01 2009
>>
>>
>>     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>
>>  av. #atoms communicated per step for force:  2 x 1969.0
>>  av. #atoms communicated per step for LINCS:  2 x 15.7
>>
>>  Average load imbalance: 500.0 %
>>  Part of the total run time spent waiting due to load imbalance:
>> 5822746112.0 %
>>  Steps where the load balancing was limited by -rdd, -rcon and/or 
>> -dds: X 9 %
>>
>> NOTE: 5822746112.0 % performance was lost due to load imbalance
>>       in the domain decomposition.
>>
>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>>  Write traj.            6       1001 18443320128.890    43061.8   100.0
>>  Update                 6    1000001      314.175        0.0     0.0
>>  Rest                   6            9223372036.855    21534.9    50.0
>> -----------------------------------------------------------------------
>>  Total                  6            18443397799.336    43062.0   100.0
>> -----------------------------------------------------------------------
>>
>> NOTE: 306 % of the run time was spent communicating energies,
>>       you might want to use the -nosum option of mdrun
>>
>>         Parallel run - timing based on wallclock.
>>
>>                NODE (s)   Real (s)      (%)
>>        Time:   7177.000   7177.000    100.0
>>                        1h59:37
>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>> Performance:     13.907      1.861     12.038      1.994
>> Finished mdrun on node 0 Thu Jul 30 01:47:05 2009
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Quoting Berk Hess <gmx3 at hotmail.com>:
>>
>>>
>>>
>>>
>>>> Date: Fri, 31 Jul 2009 07:49:49 +0200
>>>> From: spoel at xray.bmc.uu.se
>>>> To: gmx-users at gromacs.org
>>>> Subject: Re: [gmx-users] very strange domain composition statistics
>>>>
>>>> Mark Abraham wrote:
>>>>> Jennifer Williams wrote:
>>>>>> Hi ,
>>>>>>
>>>>>> I am having some problems when running in parallel. Although my jobs
>>>>>> run to completion, I am getting some worrying domain decomposition
>>>>>> statistics, in particular the average load imbalance and the
>>>>>> performance loss due to load imbalance; see below:
>>>>>
>>>>> Please report your GROMACS version number. If it's not the latest
>>>>> (4.0.5), then you should probably update and see if it's a problem 
>>>>> that
>>>>> may have been fixed between those releases. You might also try it
>>>>> without your freeze groups, especially if they dominate the system.
>>>>
>>>> In addition, you do not specify how many processors you used, nor the
>>>> division over processors that mdrun makes, nor the expected
>>>> performance. From the numbers below it seems like you used 1 or 2
>>>> processors at most. The large number is definitely erroneous, though.
>>>>
>>>
>>> For the cycle count table the number of processors seems to be 6.
>>>
>>> It seems that the cycle counters have overflowed.
>>> On what kind of architecture, and with what kind of compilers, are
>>> you running this?
>>>
>>> Berk
>>>
>>>>>
>>>>>> D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>>>>>
>>>>>>  av. #atoms communicated per step for force:  2 x 1974.8
>>>>>>  av. #atoms communicated per step for LINCS:  2 x 15.2
>>>>>>
>>>>>>  Average load imbalance: 500.0 %
>>>>>>  Part of the total run time spent waiting due to load imbalance:
>>>>>> 4246403072.0 %
>>>>>>  Steps where the load balancing was limited by -rdd, -rcon and/or
>>>>>> -dds: X 9 %
>>>>>>
>>>>>> NOTE: 4246403072.0 % performance was lost due to load imbalance
>>>>>>       in the domain decomposition.
>>>>>>
>>>>>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>>>>
>>>>>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>>>>>> -----------------------------------------------------------------------
>>>>>>  Write traj.            6       1001 18443320139.164    42130.9   100.0
>>>>>>  Update                 6    1000001 18442922984.491    42130.0   100.0
>>>>>>  Rest                   6            9223372036.855    21069.4    50.0
>>>>>> -----------------------------------------------------------------------
>>>>>>  Total                  6            18446422611.669    42138.0   100.0
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> NOTE: 305 % of the run time was spent communicating energies,
>>>>>>       you might want to use the -nosum option of mdrun
>>>>>>
>>>>>>         Parallel run - timing based on wallclock.
>>>>>>
>>>>>>                NODE (s)   Real (s)      (%)
>>>>>>        Time:   7023.000   7023.000    100.0
>>>>>>                        1h57:03
>>>>>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>>>>> Performance:     14.214      1.902     12.302      1.951
>>>>>> Finished mdrun on node 0 Wed Jul 29 23:47:18 2009
>>>>>>
>>>>>>
>>>>>>
>>>>>> Below is my .mdp file. I am using PME but, not having much of a feel
>>>>>> for how to set the options under "Spacing for the PME/PPPM FFT grid",
>>>>>> I left these as the default values. Could this be where the trouble
>>>>>> lies?
>>>>>>
>>>>>> My cut-off cannot be larger than 0.9 nm as my unit cell is only 18.2 A
>>>>>> in one direction.
>>>>>>
>>>>>> How do I choose values for PME/PPPM? I.e. what kind of values should I
>>>>>> use for nx, ny and nz?
>>>>>
>>>>> See manual section 3.17.5
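
Roughly, with fourier_nx/ny/nz left at 0, grompp derives the grid from 
fourierspacing: each box length is divided by the spacing, rounded up, 
and then rounded up further to an FFT-friendly size. A rough sketch of 
that idea (illustration only, not the exact grompp algorithm; the 2.5 nm 
box edges below are made up, only the 1.82 nm one comes from your 
18.2 A cell):

import math

def small_prime_factors_only(n, primes=(2, 3, 5, 7)):
    # FFTs are fastest when the grid size has only small prime factors.
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def fft_grid_size(box_length_nm, spacing_nm=0.12):
    # At least box_length / fourierspacing points in each dimension,
    # rounded up to an FFT-friendly size.
    n = math.ceil(box_length_nm / spacing_nm - 1e-6)
    while not small_prime_factors_only(n):
        n += 1
    return n

box = (1.82, 2.5, 2.5)                  # nm; only 1.82 comes from the question
print([fft_grid_size(b) for b in box])  # -> [16, 21, 21]
print(min(box) / 2)                     # -> 0.91 (minimum-image limit, nm)

The last line is the minimum-image limit: the cut-offs have to stay 
below half the smallest box dimension, so with an 18.2 A edge, 0.9 nm 
is indeed about as large as rlist/rcoulomb/rvdw can go for this box.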
>>>>>
>>>>>> I read that they should be divisible by npme to get the best
>>>>>> performance. Is npme the pme_order in the .mdp file? If not, where do
>>>>>> I set this parameter?
>>>>>
>>>>> No, -npme is a command line parameter to mdrun. Roughly speaking, 
>>>>> things
>>>>> that have a material effect on the physics are specified in the .mdp
>>>>> file, and things that either require external file(names) to be 
>>>>> supplied
>>>>> or which only affect the implementation of the physics are 
>>>>> specified on
>>>>> the command line.
>>>>>
>>>>> Mark
>>>>>
>>>>>> Much appreciated,
>>>>>>
>>>>>> Jenny
>>>>>>
>>>>>>
>>>>>>
>>>>>> ; VARIOUS PREPROCESSING OPTIONS
>>>>>> ; Preprocessor information: use cpp syntax.
>>>>>> ; e.g.: -I/home/joe/doe -I/home/mary/hoe
>>>>>> include                  = -I../top
>>>>>> ; e.g.: -DI_Want_Cookies -DMe_Too
>>>>>> define                   =
>>>>>>
>>>>>> ; RUN CONTROL PARAMETERS
>>>>>> integrator               = md
>>>>>> ; Start time and timestep in ps
>>>>>> tinit                    = 0
>>>>>> dt                       = 0.001
>>>>>> nsteps                   = 1000000
>>>>>> ; For exact run continuation or redoing part of a run
>>>>>> ; Part index is updated automatically on checkpointing (keeps files separate)
>>>>>> simulation_part          = 1
>>>>>> init_step                = 0
>>>>>> ; mode for center of mass motion removal
>>>>>> comm-mode                = linear
>>>>>> ; number of steps for center of mass motion removal
>>>>>> nstcomm                  = 1
>>>>>> ; group(s) for center of mass motion removal
>>>>>> comm-grps                =
>>>>>>
>>>>>> ; LANGEVIN DYNAMICS OPTIONS
>>>>>> ; Friction coefficient (amu/ps) and random seed
>>>>>> bd-fric                  = 0
>>>>>> ld-seed                  = 1993
>>>>>>
>>>>>> ; ENERGY MINIMIZATION OPTIONS
>>>>>> ; Force tolerance and initial step-size
>>>>>> emtol                    =
>>>>>> emstep                   =
>>>>>> ; Max number of iterations in relax_shells
>>>>>> niter                    =
>>>>>> ; Step size (ps^2) for minimization of flexible constraints
>>>>>> fcstep                   =
>>>>>> ; Frequency of steepest descents steps when doing CG
>>>>>> nstcgsteep               =
>>>>>> nbfgscorr                =
>>>>>>
>>>>>> ; TEST PARTICLE INSERTION OPTIONS
>>>>>> rtpi                     =
>>>>>>
>>>>>> ; OUTPUT CONTROL OPTIONS
>>>>>> ; Output frequency for coords (x), velocities (v) and forces (f)
>>>>>> nstxout                  = 1000
>>>>>> nstvout                  = 1000
>>>>>> nstfout                  = 0
>>>>>> ; Output frequency for energies to log file and energy file
>>>>>> nstlog                   = 1000
>>>>>> nstenergy                = 1000
>>>>>> ; Output frequency and precision for xtc file
>>>>>> nstxtcout                = 1000
>>>>>> xtc-precision            = 1000
>>>>>> ; This selects the subset of atoms for the xtc file. You can
>>>>>> ; select multiple groups. By default all atoms will be written.
>>>>>> xtc-grps                 =
>>>>>> ; Selection of energy groups
>>>>>> energygrps               =
>>>>>>
>>>>>> ; NEIGHBORSEARCHING PARAMETERS
>>>>>> ; nblist update frequency
>>>>>> nstlist                  =
>>>>>> ; ns algorithm (simple or grid)
>>>>>> ns_type                  = grid
>>>>>> ; Periodic boundary conditions: xyz, no, xy
>>>>>> pbc                      = xyz
>>>>>> periodic_molecules       = yes
>>>>>> ; nblist cut-off
>>>>>> rlist                    = 0.9
>>>>>>
>>>>>> ; OPTIONS FOR ELECTROSTATICS AND VDW
>>>>>> ; Method for doing electrostatics
>>>>>> coulombtype              = PME
>>>>>> rcoulomb-switch          = 0
>>>>>> rcoulomb                 = 0.9
>>>>>> ; Relative dielectric constant for the medium and the reaction field
>>>>>> epsilon_r                =
>>>>>> epsilon_rf               =
>>>>>>
>>>>>> ; Method for doing Van der Waals
>>>>>> vdw-type                 = Cut-off
>>>>>> ; cut-off lengths
>>>>>> rvdw-switch              = 0
>>>>>> rvdw                     = 0.9
>>>>>> ; Apply long range dispersion corrections for Energy and Pressure
>>>>>> DispCorr                 = No
>>>>>> ; Extension of the potential lookup tables beyond the cut-off
>>>>>> table-extension          =
>>>>>> ; Separate tables between energy group pairs
>>>>>> energygrp_table          =
>>>>>>
>>>>>>
>>>>>> ; Spacing for the PME/PPPM FFT grid
>>>>>> fourierspacing           = 0.12
>>>>>> ; FFT grid size, when a value is 0 fourierspacing will be used
>>>>>> fourier_nx               = 0
>>>>>> fourier_ny               = 0
>>>>>> fourier_nz               = 0
>>>>>> ; EWALD/PME/PPPM parameters
>>>>>> pme_order                =
>>>>>> ewald_rtol               = 1e-05
>>>>>> ewald_geometry           = 3d
>>>>>> epsilon_surface          = 0
>>>>>> optimize_fft             = yes
>>>>>>
>>>>>>
>>>>>>
>>>>>> ; OPTIONS FOR WEAK COUPLING ALGORITHMS
>>>>>> ; Temperature coupling
>>>>>> tcoupl                   = nose-hoover
>>>>>> ; Groups to couple separately
>>>>>> tc-grps                  = System
>>>>>> ; Time constant (ps) and reference temperature (K)
>>>>>> tau_t                    = 0.1
>>>>>> ref_t                    = 150
>>>>>>
>>>>>> ; Pressure coupling
>>>>>> Pcoupl                   = No
>>>>>> Pcoupltype               =
>>>>>> ; Time constant (ps), compressibility (1/bar) and reference P (bar)
>>>>>> tau-p                    =
>>>>>> compressibility          =
>>>>>> ref-p                    =
>>>>>> ; Scaling of reference coordinates, No, All or COM
>>>>>> refcoord_scaling         = no
>>>>>> ; Random seed for Andersen thermostat
>>>>>> andersen_seed            =
>>>>>>
>>>>>> ; GENERATE VELOCITIES FOR STARTUP RUN
>>>>>> gen_vel                  = yes
>>>>>> gen_temp                 = 150
>>>>>> gen_seed                 = 173529
>>>>>>
>>>>>> ; OPTIONS FOR BONDS
>>>>>> constraints              = none
>>>>>> ; Type of constraint algorithm
>>>>>> constraint-algorithm     = Lincs
>>>>>> ; Do not constrain the start configuration
>>>>>> continuation             = no
>>>>>> ; Use successive overrelaxation to reduce the number of shake iterations
>>>>>> Shake-SOR                = no
>>>>>> ; Relative tolerance of shake
>>>>>> shake-tol                = 0.0001
>>>>>> ; Highest order in the expansion of the constraint coupling matrix
>>>>>> lincs-order              = 4
>>>>>> ; Number of iterations in the final step of LINCS. 1 is fine for
>>>>>> ; normal simulations, but use 2 to conserve energy in NVE runs.
>>>>>> ; For energy minimization with constraints it should be 4 to 8.
>>>>>> lincs-iter               = 1
>>>>>> ; Lincs will write a warning to the stderr if in one step a bond
>>>>>> ; rotates over more degrees than
>>>>>> lincs-warnangle          = 30
>>>>>> ; Convert harmonic bonds to morse potentials
>>>>>> morse                    = no
>>>>>>
>>>>>> ; ENERGY GROUP EXCLUSIONS
>>>>>> ; Pairs of energy groups for which all non-bonded interactions are excluded
>>>>>> energygrp_excl           =
>>>>>>
>>>>>> ; WALLS
>>>>>> ; Number of walls, type, atom types, densities and box-z scale factor for Ewald
>>>>>> nwall                    = 0
>>>>>> wall_type                = 9-3
>>>>>> wall_r_linpot            = -1
>>>>>> wall_atomtype            =
>>>>>> wall_density             =
>>>>>> wall_ewald_zfac          = 3
>>>>>>
>>>>>>
>>>>>> ; Non-equilibrium MD stuff
>>>>>> acc-grps                 =
>>>>>> accelerate               =
>>>>>> freezegrps               = SI_O
>>>>>> freezedim                = Y Y Y
>>>>>> cos-acceleration         = 0
>>>>>> deform                   =
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> -- 
>>>> David van der Spoel, Ph.D., Professor of Biology
>>>> Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala 
>>>> University.
>>>> Box 596, 75124 Uppsala, Sweden. Phone:    +46184714205. Fax: 
>>>> +4618511755.
>>>> spoel at xray.bmc.uu.se    spoel at gromacs.org   http://folding.bmc.uu.se
>>>
>>
>>
>>
>> Dr. Jennifer Williams
>> Institute for Materials and Processes
>> School of Engineering
>> University of Edinburgh
>> Sanderson Building
>> The King's Buildings
>> Mayfield Road
>> Edinburgh, EH9 3JL, United Kingdom
>> Phone: ++44 (0)131 650 4 861
>>
>>


