[gmx-users] mdrun -nosum still complains that 15 % of the run time was spent communicating energies

Mon Jul 20 21:44:13 CEST 2009

Hello,

I have recently been running simulations on a larger number of processors 
and am confused by the message about -nosum that appears at the end of 
the .log file. In this case I passed the -nosum option to mdrun, yet I 
still get this warning (gromacs 4.0.4).

My command was:
mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') \
  -machinefile $PBS_NODEFILE \
  /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun \
  -deffnm test -nosum -npme 128

#########

To confirm that mdrun is receiving -nosum, here is what it prints to stderr:
...
Option       Type   Value   Description
------------------------------------------------------
-[no]h       bool   no      Print help info and quit
-nice        int    0       Set the nicelevel
-deffnm      string test    Set the default filename for all file options
-[no]xvgr    bool   yes     Add specific codes (legends etc.) in the output
                            xvg files for the xmgrace program
-[no]pd      bool   no      Use particle decompostion
-dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
-npme        int    128     Number of separate nodes to be used for PME, -1
                            is guess
-ddorder     enum   interleave  DD node order: interleave, pp_pme or cartesian
-[no]ddcheck bool   yes     Check for all bonded interactions with DD
-rdd         real   0       The maximum distance for bonded interactions with
                            DD (nm), 0 is determine from initial coordinates
-rcon        real   0       Maximum distance for P-LINCS (nm), 0 is estimate
-dlb         enum   auto    Dynamic load balancing (with DD): auto, no or yes
-dds         real   0.8     Minimum allowed dlb scaling of the DD cell size
-[no]sum     bool   no      Sum the energies at every step
-[no]v       bool   no      Be loud and noisy
-[no]compact bool   yes     Write a compact log file
-[no]seppot  bool   no      Write separate V and dVdl terms for each
                            interaction type and node to the log file(s)
-pforce      real   -1      Print all forces larger than this (kJ/mol nm)
-[no]reprod  bool   no      Try to avoid optimizations that affect binary
                            reproducibility
-cpt         real   15      Checkpoint interval (minutes)
-[no]append  bool   no      Append to previous output files when continuing
                            from checkpoint
-[no]addpart bool   yes     Add the simulation part number to all output
                            files when continuing from checkpoint
-maxh        real   -1      Terminate after 0.99 times this time (hours)
-multi       int    0       Do multiple simulations in parallel
-replex      int    0       Attempt replica exchange every # steps
-reseed      int    -1      Seed for replica exchange, -1 is generate a seed
-[no]glas    bool   no      Do glass simulation with special long range
                            corrections
-[no]ionize  bool   no      Do a simulation including the effect of an X-Ray
                            bombardment on your system
...

########

And the message at the end of the .log file is:
...
    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 3376415.3
 av. #atoms communicated per step for LINCS:  2 x 192096.6

 Average load imbalance: 11.7 %
 Part of the total run time spent waiting due to load imbalance: 7.9 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
 Average PME mesh/force load: 0.620
 Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %

NOTE: 7.9 % performance was lost due to load imbalance
      in the domain decomposition.

NOTE: 10.0 % performance was lost because the PME nodes
      had less work to do than the PP nodes.
      You might want to decrease the number of PME nodes
      or decrease the cut-off and the grid spacing.

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.       256         51      337.551      131.2     0.7
 Send X to PME        256        501       59.454       23.1     0.1
 Comm. coord.         256        501      289.936      112.7     0.6
 Neighbor search      256         51     1250.088      485.9     2.8
 Force                256        501    16105.584     6259.9    35.4
 Wait + Comm. F       256        501     2441.390      948.9     5.4
 PME mesh             128        501     5552.336     2158.1    12.2
 Wait + Comm. X/F     128        501     9586.486     3726.1    21.1
 Wait + Recv. PME F   256        501      459.752      178.7     1.0
 Write traj.          256          2      223.993       87.1     0.5
 Update               256        501      777.618      302.2     1.7
 Constraints          256       1002     1223.093      475.4     2.7
 Comm. energies       256         51     7011.309     2725.1    15.4
 Rest                 256                 127.710       49.6     0.3
-----------------------------------------------------------------------
 Total                384               45446.299    17664.0   100.0
-----------------------------------------------------------------------

NOTE: 15 % of the run time was spent communicating energies,
      you might want to use the -nosum option of mdrun

        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     46.000     46.000    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:  13778.036    728.080      1.882     12.752

########
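
As a sanity check on the numbers in the cycle accounting table, here is a minimal Python sketch that recomputes the share of "Comm. energies" from the G-Cycles column and compares its call count with the number of force evaluations. The values are copied by hand from the log above; the function and variable names are my own and are not part of gromacs. It only redoes the arithmetic and does not explain why -nosum has no effect.

```python
# Values copied by hand from the "REAL CYCLE AND TIME ACCOUNTING" table above.
# All names below are illustrative; none of this is a gromacs API.
COMM_ENERGIES_GCYCLES = 7011.309
COMM_ENERGIES_CALLS = 51
FORCE_GCYCLES = 16105.584
FORCE_CALLS = 501
TOTAL_GCYCLES = 45446.299


def percent_of_total(gcycles, total=TOTAL_GCYCLES):
    """Return the share of the total cycle count, in percent."""
    return 100.0 * gcycles / total


def gcycles_per_call(gcycles, calls):
    """Return the average G-cycles charged per counted call."""
    return gcycles / calls


comm_pct = percent_of_total(COMM_ENERGIES_GCYCLES)
comm_per_call = gcycles_per_call(COMM_ENERGIES_GCYCLES, COMM_ENERGIES_CALLS)
force_per_call = gcycles_per_call(FORCE_GCYCLES, FORCE_CALLS)

print(f"Comm. energies share of total cycles: {comm_pct:.1f} %")
print(f"Force evaluations per energy communication: "
      f"{FORCE_CALLS / COMM_ENERGIES_CALLS:.1f}")
print(f"G-cycles per call, Comm. energies / Force: "
      f"{comm_per_call / force_per_call:.1f}")
```

The recomputed share agrees with the 15.4 % in the table, and "Comm. energies" is counted 51 times against 501 force evaluations, the same count as the "Domain decomp." and "Neighbor search" rows. So energies are not being communicated at every step, yet each communication is charged several times the cycles of an average force evaluation.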

Thanks,
Chris