[gmx-users] Re: mdrun -nosum still complains that 15 % of the run time was spent communicating energies
Chris Neale
chris.neale at utoronto.ca
Mon Jul 20 23:06:10 CEST 2009
I have now tested with and without -nosum and it appears that the option
is working (51 vs. 501 in the Number column for Comm. energies), but the
total time spent communicating energies hardly went down at all. That
seems strange to me. Does anybody know whether this is normal?
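To put rough numbers on that, using the Comm. energies rows of the two
timing tables below:

\[
\frac{2973.9\,\mathrm{s} - 2702.1\,\mathrm{s}}{2973.9\,\mathrm{s}} \approx 9\,\%
\quad\text{less time, despite}\quad
\frac{501}{51} \approx 10\times
\quad\text{fewer energy communications.}
\]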
At the very least, I suggest adding a check to mdrun so that it does not
print the -nosum usage note when the user did in fact run that simulation
with -nosum; a rough sketch of what I mean follows.
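This is only an illustration with made-up names (print_comm_energy_note,
bSumEner) and an example 5 % threshold, not the actual GROMACS 4.0.4 source:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: guard the note on the same flag that -nosum sets,
 * so the hint is never printed when per-step energy summation is already
 * disabled. Names and threshold are made up for illustration.           */
static void print_comm_energy_note(FILE  *fplog,
                                   double frac_comm_energies, /* fraction of run time, 0..1        */
                                   bool   bSumEner)           /* true with -sum, false with -nosum */
{
    if (frac_comm_energies > 0.05 && bSumEner) /* 0.05 is just an example threshold */
    {
        fprintf(fplog,
                "NOTE: %.0f %% of the run time was spent communicating energies,\n"
                "      you might want to use the -nosum option of mdrun\n",
                100.0 * frac_comm_energies);
    }
    /* With -nosum (bSumEner == false) the note is simply suppressed. */
}

int main(void)
{
    print_comm_energy_note(stdout, 0.16, true);  /* prints the note          */
    print_comm_energy_note(stdout, 0.16, false); /* -nosum run: stays silent */
    return 0;
}
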
Without using -nosum:
R E A L C Y C L E A N D T I M E A C C O U N T I N G
Computing:         Nodes  Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
...
Write traj.          256       2      233.218       93.7    0.5
Update               256     501      777.511      312.5    1.7
Constraints          256    1002     1203.894      483.9    2.7
Comm. energies       256     501     7397.995     2973.9   16.5
Rest                 256              128.058       51.5    0.3
-----------------------------------------------------------------------
Total                384            44897.468    18048.0  100.0
-----------------------------------------------------------------------
NOTE: 16 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
              NODE (s)   Real (s)      (%)
      Time:     47.000     47.000    100.0
                    (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   13485.788    712.634      1.842     13.029
Finished mdrun on node 0 Mon Jul 20 12:53:41 2009
#########
And using -nosum:
R E A L C Y C L E A N D T I M E A C C O U N T I N G
Computing:         Nodes  Number     G-Cycles    Seconds      %
-----------------------------------------------------------------------
...
Write traj.          256       2      213.521       83.3    0.5
Update               256     501      776.606      303.0    1.8
Constraints          256    1002     1200.285      468.2    2.7
Comm. energies       256      51     6926.667     2702.1   15.6
Rest                 256              127.503       49.7    0.3
-----------------------------------------------------------------------
Total                384            44296.670    17280.0  100.0
-----------------------------------------------------------------------
NOTE: 16 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
              NODE (s)   Real (s)      (%)
      Time:     45.000     45.000    100.0
                    (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   14084.547    744.277      1.924     12.475
#########
Thanks,
Chris.
Chris Neale wrote:
> Hello,
>
> I have been running simulations on a larger number of processors
> recently and am confused about the message regarding -nosum that
> occurs at the end of the .log file. In this case, I have included the
> -nosum option to mdrun and I still get this warning (gromacs 4.0.4).
>
> My command was:
> mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') \
>        -machinefile $PBS_NODEFILE \
>        /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun \
>        -deffnm test -nosum -npme 128
>
> #########
>
> To confirm that I am indeed passing -nosum to mdrun, stderr shows:
> ...
> Option       Type   Value       Description
> ------------------------------------------------------
> -[no]h       bool   no          Print help info and quit
> -nice        int    0           Set the nicelevel
> -deffnm      string test        Set the default filename for all file options
> -[no]xvgr    bool   yes         Add specific codes (legends etc.) in the output
>                                 xvg files for the xmgrace program
> -[no]pd      bool   no          Use particle decomposition
> -dd          vector 0 0 0       Domain decomposition grid, 0 is optimize
> -npme        int    128         Number of separate nodes to be used for PME,
>                                 -1 is guess
> -ddorder     enum   interleave  DD node order: interleave, pp_pme or cartesian
> -[no]ddcheck bool   yes         Check for all bonded interactions with DD
> -rdd         real   0           The maximum distance for bonded interactions
>                                 with DD (nm), 0 is determine from initial
>                                 coordinates
> -rcon        real   0           Maximum distance for P-LINCS (nm), 0 is estimate
> -dlb         enum   auto        Dynamic load balancing (with DD): auto, no or yes
> -dds         real   0.8         Minimum allowed dlb scaling of the DD cell size
> -[no]sum     bool   no          Sum the energies at every step
> -[no]v       bool   no          Be loud and noisy
> -[no]compact bool   yes         Write a compact log file
> -[no]seppot  bool   no          Write separate V and dVdl terms for each
>                                 interaction type and node to the log file(s)
> -pforce      real   -1          Print all forces larger than this (kJ/mol nm)
> -[no]reprod  bool   no          Try to avoid optimizations that affect binary
>                                 reproducibility
> -cpt         real   15          Checkpoint interval (minutes)
> -[no]append  bool   no          Append to previous output files when continuing
>                                 from checkpoint
> -[no]addpart bool   yes         Add the simulation part number to all output
>                                 files when continuing from checkpoint
> -maxh        real   -1          Terminate after 0.99 times this time (hours)
> -multi       int    0           Do multiple simulations in parallel
> -replex      int    0           Attempt replica exchange every # steps
> -reseed      int    -1          Seed for replica exchange, -1 is generate a seed
> -[no]glas    bool   no          Do glass simulation with special long range
>                                 corrections
> -[no]ionize  bool   no          Do a simulation including the effect of an X-Ray
>                                 bombardment on your system
> ...
>
> ########
>
> And the message at the end of the .log file is:
> ...
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 3376415.3
> av. #atoms communicated per step for LINCS: 2 x 192096.6
>
> Average load imbalance: 11.7 %
> Part of the total run time spent waiting due to load imbalance: 7.9 %
> Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
> X 0 % Y 0 % Z 0 %
> Average PME mesh/force load: 0.620
> Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %
>
> NOTE: 7.9 % performance was lost due to load imbalance
> in the domain decomposition.
>
> NOTE: 10.0 % performance was lost because the PME nodes
> had less work to do than the PP nodes.
> You might want to decrease the number of PME nodes
> or decrease the cut-off and the grid spacing.
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> Computing:         Nodes  Number     G-Cycles    Seconds      %
> -----------------------------------------------------------------------
> Domain decomp.       256      51      337.551      131.2    0.7
> Send X to PME        256     501       59.454       23.1    0.1
> Comm. coord.         256     501      289.936      112.7    0.6
> Neighbor search      256      51     1250.088      485.9    2.8
> Force                256     501    16105.584     6259.9   35.4
> Wait + Comm. F       256     501     2441.390      948.9    5.4
> PME mesh             128     501     5552.336     2158.1   12.2
> Wait + Comm. X/F     128     501     9586.486     3726.1   21.1
> Wait + Recv. PME F   256     501      459.752      178.7    1.0
> Write traj.          256       2      223.993       87.1    0.5
> Update               256     501      777.618      302.2    1.7
> Constraints          256    1002     1223.093      475.4    2.7
> Comm. energies       256      51     7011.309     2725.1   15.4
> Rest                 256              127.710       49.6    0.3
> -----------------------------------------------------------------------
> Total                384            45446.299    17664.0  100.0
> -----------------------------------------------------------------------
>
> NOTE: 15 % of the run time was spent communicating energies,
> you might want to use the -nosum option of mdrun
>
>
> Parallel run - timing based on wallclock.
>
>               NODE (s)   Real (s)      (%)
>       Time:     46.000     46.000    100.0
>                     (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:   13778.036    728.080      1.882     12.752
>
> ########
>
> Thanks,
> Chris