[gmx-users] mdrun -nosum still complains that 15 % of the run time was spent communicating energies
Chris Neale
chris.neale at utoronto.ca
Mon Jul 20 21:44:13 CEST 2009
Hello,
I have recently been running simulations on a large number of processors
and am confused by the message regarding -nosum that appears at the end
of the .log file. In this case I passed the -nosum option to mdrun, yet
I still get the warning (GROMACS 4.0.4).
My command was:
mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile
$PBS_NODEFILE /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun
-deffnm test -nosum -npme 128
#########
To confirm that mdrun is receiving -nosum, this is what I get on stderr:
...
Option        Type   Value       Description
------------------------------------------------------
-[no]h        bool   no          Print help info and quit
-nice         int    0           Set the nicelevel
-deffnm       string test        Set the default filename for all file options
-[no]xvgr     bool   yes         Add specific codes (legends etc.) in the
                                 output xvg files for the xmgrace program
-[no]pd       bool   no          Use particle decomposition
-dd           vector 0 0 0       Domain decomposition grid, 0 is optimize
-npme         int    128         Number of separate nodes to be used for PME,
                                 -1 is guess
-ddorder      enum   interleave  DD node order: interleave, pp_pme or
                                 cartesian
-[no]ddcheck  bool   yes         Check for all bonded interactions with DD
-rdd          real   0           The maximum distance for bonded interactions
                                 with DD (nm), 0 is determined from initial
                                 coordinates
-rcon         real   0           Maximum distance for P-LINCS (nm), 0 is
                                 estimate
-dlb          enum   auto        Dynamic load balancing (with DD): auto, no
                                 or yes
-dds          real   0.8         Minimum allowed dlb scaling of the DD cell
                                 size
-[no]sum      bool   no          Sum the energies at every step
-[no]v        bool   no          Be loud and noisy
-[no]compact  bool   yes         Write a compact log file
-[no]seppot   bool   no          Write separate V and dVdl terms for each
                                 interaction type and node to the log file(s)
-pforce       real   -1          Print all forces larger than this (kJ/mol nm)
-[no]reprod   bool   no          Try to avoid optimizations that affect binary
                                 reproducibility
-cpt          real   15          Checkpoint interval (minutes)
-[no]append   bool   no          Append to previous output files when
                                 continuing from checkpoint
-[no]addpart  bool   yes         Add the simulation part number to all output
                                 files when continuing from checkpoint
-maxh         real   -1          Terminate after 0.99 times this time (hours)
-multi        int    0           Do multiple simulations in parallel
-replex       int    0           Attempt replica exchange every # steps
-reseed       int    -1          Seed for replica exchange, -1 is generate a
                                 seed
-[no]glas     bool   no          Do glass simulation with special long range
                                 corrections
-[no]ionize   bool   no          Do a simulation including the effect of an
                                 X-Ray bombardment on your system
...
########
And the message at the end of the .log file is:
...
D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 3376415.3
av. #atoms communicated per step for LINCS: 2 x 192096.6
Average load imbalance: 11.7 %
Part of the total run time spent waiting due to load imbalance: 7.9 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
Average PME mesh/force load: 0.620
Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %
NOTE: 7.9 % performance was lost due to load imbalance
in the domain decomposition.
NOTE: 10.0 % performance was lost because the PME nodes
had less work to do than the PP nodes.
You might want to decrease the number of PME nodes
or decrease the cut-off and the grid spacing.
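(A rough back-of-envelope reading of that note, my own arithmetic rather than any official GROMACS formula: the PME nodes finish in about 0.62 of the time the PP nodes take, so scaling the 128 PME nodes by that load ratio suggests roughly 80 would have had enough work.)

```python
# Back-of-envelope only: scale the PME node count by the reported
# "Average PME mesh/force load". My own estimate, not a GROMACS rule.
npme = 128               # PME nodes used in this run (-npme 128)
pme_force_load = 0.620   # from the log line above

suggested = round(npme * pme_force_load)
print(suggested)  # 79
```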
R E A L C Y C L E A N D T I M E A C C O U N T I N G
 Computing:          Nodes  Number    G-Cycles   Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        256      51     337.551     131.2   0.7
 Send X to PME         256     501      59.454      23.1   0.1
 Comm. coord.          256     501     289.936     112.7   0.6
 Neighbor search       256      51    1250.088     485.9   2.8
 Force                 256     501   16105.584    6259.9  35.4
 Wait + Comm. F        256     501    2441.390     948.9   5.4
 PME mesh              128     501    5552.336    2158.1  12.2
 Wait + Comm. X/F      128     501    9586.486    3726.1  21.1
 Wait + Recv. PME F    256     501     459.752     178.7   1.0
 Write traj.           256       2     223.993      87.1   0.5
 Update                256     501     777.618     302.2   1.7
 Constraints           256    1002    1223.093     475.4   2.7
 Comm. energies        256      51    7011.309    2725.1  15.4
 Rest                  256             127.710      49.6   0.3
-----------------------------------------------------------------------
 Total                 384           45446.299   17664.0 100.0
-----------------------------------------------------------------------
NOTE: 15 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
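(For what it's worth, the 15 % in the note does match the accounting table: the Seconds column shows 2725.1 s of "Comm. energies" out of 17664.0 s total.)

```python
# Reproduce the 15 % figure from the accounting table's Seconds column.
comm_energies_s = 2725.1   # "Comm. energies" row
total_s = 17664.0          # "Total" row

share = 100 * comm_energies_s / total_s
print(f"{share:.1f} %")  # 15.4 %
```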
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)       (%)
       Time:     46.000     46.000     100.0
               (Mnbf/s)   (GFlops)  (ns/day)  (hour/ns)
Performance:  13778.036    728.080     1.882    12.752
########
Thanks,
Chris