[gmx-users] Problem with domain decomposition
Mark Abraham
Mark.Abraham at anu.edu.au
Sun Sep 27 04:53:29 CEST 2009
Stephane Abel wrote:
> Hi gromacs users and experts
>
> I am running simulations on 8 CPUs of a solvated peptide (8 AA) in a
> truncated octahedron box (5150) with SPC water, using GMX 4.0.5. To
> reach long time scales I am splitting the simulation into 24 h chunks
> (25 ns/day) using checkpoints. During the last part of my simulation I
> noticed that the run was 2.6 times slower (sim_last) than the
> preceding run (sim_prev). I found this message at the end of the log
> file of sim_last:
>
> ---- Log of sim_last -------------
>
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 35969.0
> av. #atoms communicated per step for LINCS: 2 x 58.1
>
> Average load imbalance: 4.6 %
> Part of the total run time spent waiting due to load imbalance: 1.5 %
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> Computing:         Nodes    Number      G-Cycles    Seconds      %
> -----------------------------------------------------------------------
> Domain decomp.         8   1025540     19176.963     6392.1    0.9
> Comm. coord.           8   5127698     12300.804     4100.1    0.6
> Neighbor search        8   1025541    183144.975    61046.2    8.9
> Force                  8   5127698    263336.032    87775.6   12.8
> Wait + Comm. F         8   5127698     23995.139     7998.1    1.2
> PME mesh               8   5127698    265259.767    88416.8   12.9
> Write traj.            8      5184    154247.417    51414.0    7.5
> Update                 8   5127698     13123.384     4374.3    0.6
> Constraints            8   5127698     16635.925     5545.1    0.8
> Comm. energies         8   5127698   1084187.361   361383.0   52.8
> Rest                   8              17552.589     5850.7    0.9
> -----------------------------------------------------------------------
> Total                  8            2052960.356   684296.0  100.0
> -----------------------------------------------------------------------
>
> NOTE: 53 % of the run time was spent communicating energies,
> you might want to use the -nosum option of mdrun
>
>
> Parallel run - timing based on wallclock.
>
> NODE (s) Real (s) (%)
> Time: 85537.000 85537.000 100.0
> 23h45:37
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 144.887 10.126 10.359 2.317
> Finished mdrun on node 0 Fri Sep 25 14:19:07 2009
>
> ----------------- Log sim_prev
>
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 35971.8
> av. #atoms communicated per step for LINCS: 2 x 59.7
>
> Average load imbalance: 4.6 %
> Part of the total run time spent waiting due to load imbalance: 1.5 %
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> Computing:         Nodes    Number      G-Cycles    Seconds      %
> -----------------------------------------------------------------------
> Domain decomp.         8   2500000     47859.929    15952.7    2.4
> Comm. coord.           8  12500000     38434.207    12810.9    1.9
> Neighbor search        8   2500001    445996.846   148659.9   22.4
> Force                  8  12500000    637253.269   212409.6   32.1
> Wait + Comm. F         8  12500000     58421.254    19473.0    2.9
> PME mesh               8  12500000    637267.326   212414.2   32.1
> Write traj.            8     12501        80.674       26.9    0.0
> Update                 8  12500000     32011.697    10670.2    1.6
> Constraints            8  12500000     40061.175    13353.2    2.0
> Comm. energies         8  12500000      8407.505     2802.4    0.4
> Rest                   8              41890.865    13963.1    2.1
> -----------------------------------------------------------------------
> Total                  8            1987684.746   662536.0  100.0
> -----------------------------------------------------------------------
>
> Parallel run - timing based on wallclock.
>
> NODE (s) Real (s) (%)
> Time: 82817.000 82817.000 100.0
> 23h00:17
> (Mnbf/s) (GFlops) (ns/day) (hour/ns)
> Performance: 364.799 25.495 26.082 0.920
>
> My simulation is running on a supercomputer whose characteristics you
> can see here: http://www.cines.fr/spip.php?article520. I don't know
> where the problem lies (hardware? software?). Any advice would be
> appreciated.
sim_last executed many fewer integration steps compared with sim_prev,
presumably because of this issue. One theory is that your 8 processors
had different relative locality during the two runs, and that this had a
severe consequence for network performance (both Write traj. and Comm.
energies are high, and these require communication to the master node,
and then I/O). Ensuring you get allocated all 8 cores in one node should
alleviate any such issue.
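How you request that depends on your batch system. As a rough
illustration only (this PBS-style syntax is an assumption on my part
and may not match the CINES machine, so check the site documentation),
a job request that keeps all eight MPI ranks on a single node could
look like:

  #PBS -l select=1:ncpus=8:mpiprocs=8
  #PBS -l walltime=24:00:00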
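The -nosum note in your sim_last log is also worth acting on: with
-nosum, mdrun does not sum the energies over the nodes at every step,
which should cut down the 53 % of run time spent in Comm. energies.
A minimal sketch of the continuation command (the binary, .tpr,
checkpoint and output names here are placeholders, not your actual
files):

  mpirun -np 8 mdrun -s peptide.tpr -cpi state.cpt -nosum -deffnm sim_next
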
Mark