[gmx-users] mdrun -nosum still complains that 15 % of the run time was spent communicating energies

Chris Neale chris.neale at utoronto.ca
Mon Jul 20 21:44:13 CEST 2009


I have recently been running simulations on a larger number of
processors and am confused by the note about -nosum that appears at
the end of the .log file. In this case I did pass the -nosum option
to mdrun, and I still get the note (GROMACS 4.0.4).

My command was:
mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') \
  -machinefile $PBS_NODEFILE \
  /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun \
  -deffnm test -nosum -npme 128
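
For reference, the value passed to -np above is just the number of lines in the PBS node file. A minimal sketch of the same count without the gawk filter, using a hypothetical temporary file (nodefile, with made-up host names) in place of $PBS_NODEFILE:

```shell
# Hypothetical stand-in for $PBS_NODEFILE: one line per MPI slot.
nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$nodefile"

# Reading the file on stdin makes wc print only the count (no file name),
# so no gawk '{print $1}' step is needed.
nranks=$(wc -l < "$nodefile")
nranks=$((nranks))   # strip the padding that some wc implementations add

echo "would start $nranks MPI ranks"
rm -f "$nodefile"
```

This only illustrates how the rank count is derived; it does not change what mdrun is asked to do.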


To confirm that mdrun is actually receiving -nosum, this is the option
listing it prints to stderr (note that -[no]sum is reported as "no"):
Option       Type   Value   Description
-[no]h       bool   no      Print help info and quit
-nice        int    0       Set the nicelevel
-deffnm      string test    Set the default filename for all file options
-[no]xvgr    bool   yes     Add specific codes (legends etc.) in the output
                            xvg files for the xmgrace program
-[no]pd      bool   no      Use particle decompostion
-dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
-npme        int    128     Number of separate nodes to be used for PME, -1
                            is guess
-ddorder     enum   interleave  DD node order: interleave, pp_pme or 
-[no]ddcheck bool   yes     Check for all bonded interactions with DD
-rdd         real   0       The maximum distance for bonded interactions 
                            DD (nm), 0 is determine from initial coordinates
-rcon        real   0       Maximum distance for P-LINCS (nm), 0 is estimate
-dlb         enum   auto    Dynamic load balancing (with DD): auto, no or yes
-dds         real   0.8     Minimum allowed dlb scaling of the DD cell size
-[no]sum     bool   no      Sum the energies at every step
-[no]v       bool   no      Be loud and noisy
-[no]compact bool   yes     Write a compact log file
-[no]seppot  bool   no      Write separate V and dVdl terms for each
                            interaction type and node to the log file(s)
-pforce      real   -1      Print all forces larger than this (kJ/mol nm)
-[no]reprod  bool   no      Try to avoid optimizations that affect binary
-cpt         real   15      Checkpoint interval (minutes)
-[no]append  bool   no      Append to previous output files when continuing
                            from checkpoint
-[no]addpart bool   yes     Add the simulation part number to all output
                            files when continuing from checkpoint
-maxh        real   -1      Terminate after 0.99 times this time (hours)
-multi       int    0       Do multiple simulations in parallel
-replex      int    0       Attempt replica exchange every # steps
-reseed      int    -1      Seed for replica exchange, -1 is generate a seed
-[no]glas    bool   no      Do glass simulation with special long range
-[no]ionize  bool   no      Do a simulation including the effect of an X-Ray
                            bombardment on your system


And the message at the end of the .log file is:
    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 3376415.3
 av. #atoms communicated per step for LINCS:  2 x 192096.6

 Average load imbalance: 11.7 %
 Part of the total run time spent waiting due to load imbalance: 7.9 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
 Average PME mesh/force load: 0.620
 Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %

NOTE: 7.9 % performance was lost due to load imbalance
      in the domain decomposition.

NOTE: 10.0 % performance was lost because the PME nodes
      had less work to do than the PP nodes.
      You might want to decrease the number of PME nodes
      or decrease the cut-off and the grid spacing.

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
 Domain decomp.       256         51      337.551      131.2     0.7
 Send X to PME        256        501       59.454       23.1     0.1
 Comm. coord.         256        501      289.936      112.7     0.6
 Neighbor search      256         51     1250.088      485.9     2.8
 Force                256        501    16105.584     6259.9    35.4
 Wait + Comm. F       256        501     2441.390      948.9     5.4
 PME mesh             128        501     5552.336     2158.1    12.2
 Wait + Comm. X/F     128        501     9586.486     3726.1    21.1
 Wait + Recv. PME F   256        501      459.752      178.7     1.0
 Write traj.          256          2      223.993       87.1     0.5
 Update               256        501      777.618      302.2     1.7
 Constraints          256       1002     1223.093      475.4     2.7
 Comm. energies       256         51     7011.309     2725.1    15.4
 Rest                 256                 127.710       49.6     0.3
 Total                384               45446.299    17664.0   100.0

NOTE: 15 % of the run time was spent communicating energies,
      you might want to use the -nosum option of mdrun

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     46.000     46.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:  13778.036    728.080      1.882     12.752
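
The 15 % in the NOTE above appears to correspond to the "Comm. energies" row of the cycle accounting table: its G-Cycles divided by the total. A quick check with awk, using the two numbers copied from that table (the variable names are only for illustration):

```shell
# G-Cycles copied from the "Comm. energies" and "Total" rows of the table.
comm_cycles=7011.309
total_cycles=45446.299

# LC_ALL=C keeps the decimal point independent of the local locale.
pct=$(LC_ALL=C awk -v c="$comm_cycles" -v t="$total_cycles" \
      'BEGIN { printf "%.1f", 100 * c / t }')

echo "Comm. energies: $pct % of all cycles"
```

This reproduces the 15.4 shown in the last column of the table. The same table also lists "Comm. energies" with a count of 51 over 501 steps, the same count as "Neighbor search" and "Domain decomp.", so the energies do not seem to have been communicated at every step.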



