[gmx-users] Problem with domain decomposition

Mark Abraham Mark.Abraham at anu.edu.au
Sun Sep 27 04:53:29 CEST 2009


Stephane Abel wrote:
> Hi GROMACS users and experts,
> 
> I am running simulations of a solvated peptide (8 AA) in a truncated 
> octahedron box (5150) with SPC water, on 8 CPUs with GROMACS 4.0.5. To 
> cover a long total simulation time, I am splitting the run into 24 h 
> chunks (about 25 ns/day) using checkpoints.
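> 
> Each chunk is restarted from the checkpoint written by the previous 
> one with something like the command below (the MPI launcher, binary 
> name and file names are only placeholders for what is in my actual 
> job script):
> 
>    # restart from checkpoint, stop cleanly just before 24 h wall time
>    mpirun -np 8 mdrun_mpi -deffnm md -cpi md.cpt -maxh 24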
> 
> During the last chunk (sim_last), I noticed that the run was about 2.6 
> times slower than the preceding one (sim_prev), and I found this 
> message at the end of the log file of sim_last:
> 
> ---- Log of sim_last -------------
> 
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> 
> av. #atoms communicated per step for force:  2 x 35969.0
> av. #atoms communicated per step for LINCS:  2 x 58.1
> 
> Average load imbalance: 4.6 %
> Part of the total run time spent waiting due to load imbalance: 1.5 %
> 
> 
>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
> Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
> Domain decomp.         8    1025540    19176.963     6392.1     0.9
> Comm. coord.           8    5127698    12300.804     4100.1     0.6
> Neighbor search        8    1025541   183144.975    61046.2     8.9
> Force                  8    5127698   263336.032    87775.6    12.8
> Wait + Comm. F         8    5127698    23995.139     7998.1     1.2
> PME mesh               8    5127698   265259.767    88416.8    12.9
> Write traj.            8       5184   154247.417    51414.0     7.5
> Update                 8    5127698    13123.384     4374.3     0.6
> Constraints            8    5127698    16635.925     5545.1     0.8
> Comm. energies         8    5127698  1084187.361   361383.0    52.8
> Rest                   8               17552.589     5850.7     0.9
> -----------------------------------------------------------------------
> Total                  8             2052960.356   684296.0   100.0
> -----------------------------------------------------------------------
> 
> NOTE: 53 % of the run time was spent communicating energies,
>      you might want to use the -nosum option of mdrun
> 
> 
>        Parallel run - timing based on wallclock.
> 
>               NODE (s)   Real (s)      (%)
>       Time:  85537.000  85537.000    100.0
>                       23h45:37
>               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    144.887     10.126     10.359      2.317
> Finished mdrun on node 0 Fri Sep 25 14:19:07 2009
> 
> ----------------- Log sim_prev
> 
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> 
> av. #atoms communicated per step for force:  2 x 35971.8
> av. #atoms communicated per step for LINCS:  2 x 59.7
> 
> Average load imbalance: 4.6 %
> Part of the total run time spent waiting due to load imbalance: 1.5 %
> 
> 
>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
> Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
> Domain decomp.         8    2500000    47859.929    15952.7     2.4
> Comm. coord.           8   12500000    38434.207    12810.9     1.9
> Neighbor search        8    2500001   445996.846   148659.9    22.4
> Force                  8   12500000   637253.269   212409.6    32.1
> Wait + Comm. F         8   12500000    58421.254    19473.0     2.9
> PME mesh               8   12500000   637267.326   212414.2    32.1
> Write traj.            8      12501       80.674       26.9     0.0
> Update                 8   12500000    32011.697    10670.2     1.6
> Constraints            8   12500000    40061.175    13353.2     2.0
> Comm. energies         8   12500000     8407.505     2802.4     0.4
> Rest                   8               41890.865    13963.1     2.1
> -----------------------------------------------------------------------
> Total                  8             1987684.746   662536.0   100.0
> -----------------------------------------------------------------------
> 
>        Parallel run - timing based on wallclock.
> 
>               NODE (s)   Real (s)      (%)
>       Time:  82817.000  82817.000    100.0
>                       23h00:17
>               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    364.799     25.495     26.082      0.920
> 
> My simulations run on a supercomputer whose characteristics you can 
> see here: http://www.cines.fr/spip.php?article520 . I don't know where 
> the problem comes from (hardware? software?). Any advice would be 
> appreciated.

sim_last executed many fewer integration steps than sim_prev (about 5.1 
million versus 12.5 million in roughly the same wall time), presumably 
because of this issue. One theory is that your 8 processors had 
different relative locality during the two runs, and that this had a 
severe consequence for network performance (both Write traj. and Comm. 
energies are high, and these require communication to the master node, 
and then I/O). Ensuring you get allocated all 8 cores on one node should 
alleviate any such issue.
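
For example, if the machine uses a PBS/Torque-style batch system (an 
assumption on my part; the directives differ for other schedulers, and 
the mdrun binary and file names below are placeholders), the job request 
could look something like:

   #PBS -l nodes=1:ppn=8        # all 8 cores on a single node
   #PBS -l walltime=24:00:00
   # placeholder binary/file names; -nosum as suggested by the log NOTE
   mpirun -np 8 mdrun_mpi -deffnm md -cpi md.cpt -maxh 24 -nosum

The -nosum option suggested by the NOTE in your sim_last log should also 
reduce the time spent in Comm. energies, although it would only mask, 
not fix, a slow interconnect.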

Mark


