[gmx-users] Pull code slows down mdrun -- is this expected?

Christopher Neale chris.neale at alum.utoronto.ca
Thu May 19 18:10:10 CEST 2016


Dear Szilárd:

I think that my ranks were set up similarly for the GPU and non-GPU runs; the exact commands are below. One thing that I had already checked, but have not pasted here, is that all runs reported the same DD and PME grids in the log file (4x1x1 in every case), and that the nstlist/rlist modification was always the same (promoted to nstlist=40 / rlist=1.253).

Below I provide the run command and the end of the log files for:

(1) no pull-code with GPUs -- v5.1.2

(2) 1 pull-code cylinder restraint with GPUs -- v5.1.2

(3) 128 pull-code cylinder restraints with GPUs -- v5.1.2

(4) no pull-code with CPUs -- v5.1.2

(5) 1 pull-code cylinder restraint with CPUs -- v5.1.2

(6) 128 pull-code cylinder restraints with CPUs -- v5.1.2

(7) 128 pull-code cylinder restraints with CPUs -- Gromacs-2016-beta1
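
For anyone who wants to compare log tails like these side by side rather than by eye, here is a minimal Python sketch that pulls the per-row wall times and the ns/day figure out of the cycle-accounting text. It is not part of GROMACS: the function name `parse_log_tail` and the regular expressions are assumptions made for illustration, written against the table layout shown in this post, and they may need adjusting for other GROMACS versions. The sample lines are copied from the log of run (2).

```python
import re

# A few lines copied from the log tail of run (2), 1 cylinder restraint with GPUs.
SAMPLE_LOG = """\
 Force                  4    6      10681      10.529        605.019  29.3
 COM pull force         4    6      10681       2.233        128.302   6.2
 Rest                                           0.705         40.524   2.0
Performance:       51.372        0.467
"""

# Hypothetical pattern for a cycle-accounting row:
# name, ranks, threads, call count, wall time (s), giga-cycles, percent.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<ranks>\d+)\s+(?P<threads>\d+)\s+(?P<calls>\d+)"
    r"\s+(?P<wall>[\d.]+)\s+(?P<gcycles>[\d.]+)\s+(?P<pct>[\d.]+)\s*$"
)
# Hypothetical pattern for the final "Performance:" line (ns/day, hour/ns).
PERF = re.compile(r"^Performance:\s+(?P<ns_day>[\d.]+)\s+(?P<hour_ns>[\d.]+)")


def parse_log_tail(text):
    """Return ({row name: {calls, wall_s, percent}}, ns_day) for one log tail."""
    rows = {}
    ns_day = None
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = {
                "calls": int(m["calls"]),
                "wall_s": float(m["wall"]),
                "percent": float(m["pct"]),
            }
            continue
        m = PERF.match(line)
        if m:
            ns_day = float(m["ns_day"])
    return rows, ns_day


rows, ns_day = parse_log_tail(SAMPLE_LOG)
print(rows["COM pull force"], ns_day)
```

Rows without rank, thread and call-count columns (such as "Rest" and "Total") are skipped on purpose, so only rows with a call count end up in the result.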

########################################################################
(1) no pull-code with GPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           364.832000         364.832     0.0
 Pair Search distance check            1620.797488       14587.177     0.0
 NxN Ewald Elec. + LJ [F]            483561.291648    37717780.749    95.2
 NxN Ewald Elec. + LJ [V&F]            4927.352384      635628.458     1.6
 1,4 nonbonded interactions             509.305472       45837.492     0.1
 Calc Weights                          1226.587986       44157.167     0.1
 Spread Q Bspline                     26167.210368       52334.421     0.1
 Gather F Bspline                     26167.210368      157003.262     0.4
 3D-FFT                               87418.239194      699345.914     1.8
 Solve PME                               30.828304        1973.011     0.0
 Reset In Box                            10.256532          30.770     0.0
 CG-CoM                                  10.292394          30.877     0.0
 Bonds                                   71.507072        4218.917     0.0
 Propers                                631.889024      144702.586     0.4
 Impropers                                2.918656         607.080     0.0
 Virial                                  41.123922         740.231     0.0
 Update                                 408.862662       12674.743     0.0
 Stop-CM                                  4.159992          41.600     0.0
 Calc-Ekin                               81.837084        2209.601     0.0
 Lincs                                  249.823000       14989.380     0.0
 Lincs-Mat                             2355.109984        9420.440     0.0
 Constraint-V                           961.241869        7689.935     0.0
 Constraint-Vir                          35.598117         854.355     0.0
 Settle                                 153.879318       49703.020     0.1
-----------------------------------------------------------------------------
 Total                                                39616926.018   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 28471.3
 av. #atoms communicated per step for LINCS:  2 x 2103.2

 Average load imbalance: 0.5 %
 Part of the total run time spent waiting due to load imbalance: 0.2 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6        286       1.093         62.827   3.0
 DD comm. load          4    6        285       0.002          0.123   0.0
 DD comm. bounds        4    6        286       0.004          0.251   0.0
 Neighbor search        4    6        286       0.581         33.383   1.6
 Launch GPU ops.        4    6      22802       0.962         55.297   2.7
 Comm. coord.           4    6      11115       0.839         48.251   2.3
 Force                  4    6      11401      11.241        646.160  31.3
 Wait + Comm. F         4    6      11401       0.803         46.166   2.2
 PME mesh               4    6      11401      12.488        717.838  34.8
 Wait GPU nonlocal      4    6      11401       0.080          4.614   0.2
 Wait GPU local         4    6      11401       0.030          1.697   0.1
 NB X/F buffer ops.     4    6      45032       0.474         27.265   1.3
 Write traj.            4    6          3       0.009          0.518   0.0
 Update                 4    6      22802       1.973        113.431   5.5
 Constraints            4    6      22802       4.520        259.834  12.6
 Comm. energies         4    6       1141       0.020          1.168   0.1
 Rest                                           0.754         43.343   2.1
-----------------------------------------------------------------------------
 Total                                         35.876       2062.165 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6      22802       1.619         93.076   4.5
 PME spread/gather      4    6      22802       5.835        335.390  16.3
 PME 3D-FFT             4    6      22802       3.061        175.947   8.5
 PME 3D-FFT Comm.       4    6      22802       1.229         70.636   3.4
 PME solve Elec         4    6      11401       0.714         41.044   2.0
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      859.895       35.876     2396.9
                 (ns/day)    (hour/ns)
Performance:       54.914        0.437
Finished mdrun on rank 0 Tue May 17 11:57:59 2016


########################################################################
(2) 1 pull-code cylinder restraint with GPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           341.792000         341.792     0.0
 Pair Search distance check            1523.886864       13714.982     0.0
 NxN Ewald Elec. + LJ [F]            453747.480576    35392303.485    95.2
 NxN Ewald Elec. + LJ [V&F]            4634.734080      597880.696     1.6
 1,4 nonbonded interactions             477.141632       42942.747     0.1
 Calc Weights                          1149.126066       41368.538     0.1
 Spread Q Bspline                     24514.689408       49029.379     0.1
 Gather F Bspline                     24514.689408      147088.136     0.4
 3D-FFT                               81897.571514      655180.572     1.8
 Solve PME                               28.881424        1848.411     0.0
 Reset In Box                             9.611016          28.833     0.0
 CG-CoM                                   9.646878          28.941     0.0
 Bonds                                   66.991232        3952.483     0.0
 Propers                                591.983744      135564.277     0.4
 Impropers                                2.734336         568.742     0.0
 Virial                                  38.528898         693.520     0.0
 Update                                 383.042022       11874.303     0.0
 Stop-CM                                  3.873096          38.731     0.0
 Calc-Ekin                               76.672956        2070.170     0.0
 Lincs                                  234.398906       14063.934     0.0
 Lincs-Mat                             2209.202496        8836.810     0.0
 Constraint-V                           901.536603        7212.293     0.0
 Constraint-Vir                          33.383999         801.216     0.0
 Settle                                 144.260292       46596.074     0.1
-----------------------------------------------------------------------------
 Total                                                37174029.066   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 28747.6
 av. #atoms communicated per step for LINCS:  2 x 2141.1

 Average load imbalance: 0.7 %
 Part of the total run time spent waiting due to load imbalance: 0.2 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6        268       1.036         59.528   2.9
 DD comm. load          4    6        267       0.001          0.085   0.0
 DD comm. bounds        4    6        268       0.002          0.123   0.0
 Neighbor search        4    6        268       0.546         31.386   1.5
 Launch GPU ops.        4    6      21362       0.912         52.395   2.5
 Comm. coord.           4    6      10413       0.784         45.045   2.2
 Force                  4    6      10681      10.529        605.019  29.3
 Wait + Comm. F         4    6      10681       0.762         43.803   2.1
 PME mesh               4    6      10681      11.761        675.817  32.7
 Wait GPU nonlocal      4    6      10681       0.075          4.306   0.2
 Wait GPU local         4    6      10681       0.027          1.573   0.1
 NB X/F buffer ops.     4    6      42188       0.435         24.972   1.2
 COM pull force         4    6      10681       2.233        128.302   6.2
 Write traj.            4    6          3       0.009          0.496   0.0
 Update                 4    6      21362       1.844        105.992   5.1
 Constraints            4    6      21362       4.252        244.328  11.8
 Comm. energies         4    6       1069       0.015          0.875   0.0
 Rest                                           0.705         40.524   2.0
-----------------------------------------------------------------------------
 Total                                         35.928       2064.569 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6      21362       1.561         89.718   4.3
 PME spread/gather      4    6      21362       5.478        314.785  15.2
 PME 3D-FFT             4    6      21362       2.862        164.444   8.0
 PME 3D-FFT Comm.       4    6      21362       1.162         66.773   3.2
 PME solve Elec         4    6      10681       0.669         38.447   1.9
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      860.923       35.928     2396.3
                 (ns/day)    (hour/ns)
Performance:       51.372        0.467
Finished mdrun on rank 0 Wed May 18 09:49:04 2016


########################################################################
(3) 128 pull-code cylinder restraints with GPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                            60.192000          60.192     0.0
 Pair Search distance check             272.833312        2455.500     0.0
 NxN Ewald Elec. + LJ [F]             77397.539264     6037008.063    95.0
 NxN Ewald Elec. + LJ [V&F]             831.930304      107319.009     1.7
 1,4 nonbonded interactions              84.028032        7562.523     0.1
 Calc Weights                           202.369266        7285.294     0.1
 Spread Q Bspline                      4317.211008        8634.422     0.1
 Gather F Bspline                      4317.211008       25903.266     0.4
 3D-FFT                               14422.744314      115381.955     1.8
 Solve PME                                5.086224         325.518     0.0
 Reset In Box                             1.721376           5.164     0.0
 CG-CoM                                   1.757238           5.272     0.0
 Bonds                                   11.797632         696.060     0.0
 Propers                                104.252544       23873.833     0.4
 Impropers                                0.481536         100.159     0.0
 Virial                                   6.811938         122.615     0.0
 Update                                  67.456422        2091.149     0.0
 Stop-CM                                  0.717240           7.172     0.0
 Calc-Ekin                               13.555836         366.008     0.0
 Lincs                                   41.348386        2480.903     0.0
 Lincs-Mat                              389.821664        1559.287     0.0
 Constraint-V                           158.857573        1270.861     0.0
 Constraint-Vir                           5.902560         141.661     0.0
 Settle                                  25.400962        8204.511     0.1
-----------------------------------------------------------------------------
 Total                                                 6352860.396   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 28748.6
 av. #atoms communicated per step for LINCS:  2 x 2147.1

 Average load imbalance: 1.7 %
 Part of the total run time spent waiting due to load imbalance: 0.2 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6         48       0.250         14.358   0.7
 DD comm. load          4    6         47       0.002          0.098   0.0
 DD comm. bounds        4    6         48       0.001          0.051   0.0
 Neighbor search        4    6         48       0.138          7.930   0.4
 Launch GPU ops.        4    6       3762       0.209         11.990   0.6
 Comm. coord.           4    6       1833       0.187         10.771   0.5
 Force                  4    6       1881       3.479        199.905   9.4
 Wait + Comm. F         4    6       1881       0.149          8.587   0.4
 PME mesh               4    6       1881       3.791        217.825  10.3
 Wait GPU nonlocal      4    6       1881       0.019          1.094   0.1
 Wait GPU local         4    6       1881       0.005          0.309   0.0
 NB X/F buffer ops.     4    6       7428       0.113          6.514   0.3
 COM pull force         4    6       1881      26.652       1531.533  72.3
 Write traj.            4    6          2       0.008          0.435   0.0
 Update                 4    6       3762       0.562         32.308   1.5
 Constraints            4    6       3762       1.152         66.171   3.1
 Comm. energies         4    6        189       0.009          0.521   0.0
 Rest                                           0.144          8.271   0.4
-----------------------------------------------------------------------------
 Total                                         36.869       2118.672 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6       3762       0.527         30.259   1.4
 PME spread/gather      4    6       3762       1.893        108.802   5.1
 PME 3D-FFT             4    6       3762       0.956         54.953   2.6
 PME 3D-FFT Comm.       4    6       3762       0.267         15.330   0.7
 PME solve Elec         4    6       1881       0.142          8.149   0.4
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      883.599       36.869     2396.6
                 (ns/day)    (hour/ns)
Performance:        8.816        2.722
Finished mdrun on rank 0 Tue May 17 12:01:03 2016
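
As a reading aid for the three GPU runs above, the short Python sketch below divides the "COM pull force" wall time by its call count and compares ns/day against the run without the pull code. All numbers are copied from the logs in this post; the dictionary layout and the helper name `pull_ms_per_call` are hypothetical, and the sketch assumes the reported wall time can simply be divided by the call count. Under that assumption the pull code costs roughly 0.2 ms per call with 1 restraint and roughly 14 ms per call with 128 restraints, about 68 times more, which is less than the 128-fold increase in the number of restraints.

```python
# Values copied from the GPU log tails above (runs 1, 2 and 3).
runs = {
    "no pull": {"ns_day": 54.914, "pull_calls": 0, "pull_wall_s": 0.0},
    "1 restraint": {"ns_day": 51.372, "pull_calls": 10681, "pull_wall_s": 2.233},
    "128 restraints": {"ns_day": 8.816, "pull_calls": 1881, "pull_wall_s": 26.652},
}


def pull_ms_per_call(run):
    """Wall time of the 'COM pull force' row per call, in milliseconds."""
    if run["pull_calls"] == 0:
        return 0.0  # the pull code was not active in this run
    return 1000.0 * run["pull_wall_s"] / run["pull_calls"]


baseline = runs["no pull"]["ns_day"]
for name, run in runs.items():
    relative = run["ns_day"] / baseline  # throughput relative to the no-pull run
    print(f"{name:>15}: {pull_ms_per_call(run):7.3f} ms/call, "
          f"{run['ns_day']:7.3f} ns/day ({relative:.0%} of no-pull)")
```

Because every run was capped at the same wall time (-maxh 0.01), the step counts differ between runs, so per-call times and ns/day are the comparable quantities rather than the raw totals.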

########################################################################
(4) no pull-code with CPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           124.832000         124.832     0.0
 Pair Search distance check            2741.405650       24672.651     0.3
 NxN Ewald Elec. + LJ [F]             88748.544456     6922386.468    91.8
 NxN Ewald Elec. + LJ [V&F]             919.460288      118610.377     1.6
 NxN Ewald Elec. [F]                    759.291832       46316.802     0.6
 NxN Ewald Elec. [V&F]                    7.821440         657.001     0.0
 1,4 nonbonded interactions             174.265472       15683.892     0.2
 Calc Weights                           419.692986       15108.947     0.2
 Spread Q Bspline                      8953.450368       17906.901     0.2
 Gather F Bspline                      8953.450368       53720.702     0.7
 3D-FFT                               29911.284194      239290.274     3.2
 Solve PME                               10.548304         675.091     0.0
 Reset In Box                             5.630334          16.891     0.0
 CG-CoM                                   5.666196          16.999     0.0
 Bonds                                   24.467072        1443.557     0.0
 Propers                                216.209024       49511.866     0.7
 Impropers                                0.998656         207.720     0.0
 Virial                                  14.092422         253.664     0.0
 Update                                 139.897662        4336.828     0.1
 Stop-CM                                  1.470342          14.703     0.0
 Calc-Ekin                               56.016444        1512.444     0.0
 Lincs                                   85.537242        5132.235     0.1
 Lincs-Mat                              805.565504        3222.262     0.0
 Constraint-V                           329.043485        2632.348     0.0
 Constraint-Vir                          12.202321         292.856     0.0
 Settle                                  52.670362       17012.527     0.2
-----------------------------------------------------------------------------
 Total                                                 7540760.838   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 27844.0
 av. #atoms communicated per step for LINCS:  2 x 2116.4

 Average load imbalance: 0.6 %
 Part of the total run time spent waiting due to load imbalance: 0.4 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6        157       0.553         31.778   1.5
 DD comm. load          4    6        156       0.001          0.048   0.0
 DD comm. bounds        4    6        157       0.001          0.072   0.0
 Neighbor search        4    6        157       0.942         54.112   2.6
 Comm. coord.           4    6       3744       0.307         17.632   0.9
 Force                  4    6       3901      26.106       1500.166  72.4
 Wait + Comm. F         4    6       3901       0.334         19.200   0.9
 PME mesh               4    6       3901       5.275        303.154  14.6
 NB X/F buffer ops.     4    6      11389       0.215         12.369   0.6
 Write traj.            4    6          2       0.007          0.426   0.0
 Update                 4    6       7802       0.625         35.930   1.7
 Constraints            4    6       7802       1.454         83.564   4.0
 Comm. energies         4    6        781       0.010          0.590   0.0
 Rest                                           0.217         12.484   0.6
-----------------------------------------------------------------------------
 Total                                         36.049       2071.527 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6       7802       0.766         44.033   2.1
 PME spread/gather      4    6       7802       2.644        151.952   7.3
 PME 3D-FFT             4    6       7802       1.190         68.385   3.3
 PME 3D-FFT Comm.       4    6       7802       0.481         27.663   1.3
 PME solve Elec         4    6       3901       0.183         10.497   0.5
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      864.421       36.049     2397.9
                 (ns/day)    (hour/ns)
Performance:       18.699        1.283
Finished mdrun on rank 0 Tue May 17 12:03:43 2016

########################################################################
(5) 1 pull-code cylinder restraint with CPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           124.832000         124.832     0.0
 Pair Search distance check            2737.700316       24639.303     0.3
 NxN Ewald Elec. + LJ [F]             88646.835696     6914453.184    91.8
 NxN Ewald Elec. + LJ [V&F]             918.378464      118470.822     1.6
 NxN Ewald Elec. [F]                    760.931424       46416.817     0.6
 NxN Ewald Elec. [V&F]                    7.948128         667.643     0.0
 1,4 nonbonded interactions             174.265472       15683.892     0.2
 Calc Weights                           419.692986       15108.947     0.2
 Spread Q Bspline                      8953.450368       17906.901     0.2
 Gather F Bspline                      8953.450368       53720.702     0.7
 3D-FFT                               29911.284194      239290.274     3.2
 Solve PME                               10.548304         675.091     0.0
 Reset In Box                             5.630334          16.891     0.0
 CG-CoM                                   5.666196          16.999     0.0
 Bonds                                   24.467072        1443.557     0.0
 Propers                                216.209024       49511.866     0.7
 Impropers                                0.998656         207.720     0.0
 Virial                                  14.092422         253.664     0.0
 Update                                 139.897662        4336.828     0.1
 Stop-CM                                  1.470342          14.703     0.0
 Calc-Ekin                               56.016444        1512.444     0.0
 Lincs                                   85.716402        5142.984     0.1
 Lincs-Mat                              808.400608        3233.602     0.0
 Constraint-V                           329.341421        2634.731     0.0
 Constraint-Vir                          12.208318         293.000     0.0
 Settle                                  52.650234       17006.026     0.2
-----------------------------------------------------------------------------
 Total                                                 7532783.424   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 27887.4
 av. #atoms communicated per step for LINCS:  2 x 2139.2

 Average load imbalance: 0.7 %
 Part of the total run time spent waiting due to load imbalance: 0.5 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6        157       0.566         32.549   1.6
 DD comm. load          4    6        156       0.001          0.037   0.0
 DD comm. bounds        4    6        157       0.001          0.081   0.0
 Neighbor search        4    6        157       0.945         54.284   2.6
 Comm. coord.           4    6       3744       0.309         17.739   0.9
 Force                  4    6       3901      25.991       1493.580  72.3
 Wait + Comm. F         4    6       3901       0.338         19.445   0.9
 PME mesh               4    6       3901       4.483        257.627  12.5
 NB X/F buffer ops.     4    6      11389       0.217         12.483   0.6
 COM pull force         4    6       3901       0.781         44.862   2.2
 Write traj.            4    6          2       0.007          0.398   0.0
 Update                 4    6       7802       0.625         35.897   1.7
 Constraints            4    6       7802       1.463         84.079   4.1
 Comm. energies         4    6        781       0.010          0.568   0.0
 Rest                                           0.218         12.538   0.6
-----------------------------------------------------------------------------
 Total                                         35.955       2066.166 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6       7802       0.770         44.225   2.1
 PME spread/gather      4    6       7802       1.998        114.795   5.6
 PME 3D-FFT             4    6       7802       1.100         63.227   3.1
 PME 3D-FFT Comm.       4    6       7802       0.422         24.265   1.2
 PME solve Elec         4    6       3901       0.183         10.499   0.5
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      861.932       35.955     2397.2
                 (ns/day)    (hour/ns)
Performance:       18.748        1.280
Finished mdrun on rank 0 Tue May 17 12:18:35 2016

########################################################################
(6) 128 pull-code cylinder restraints with CPUs

/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                            38.432000          38.432     0.0
 Pair Search distance check             840.167764        7561.510     0.3
 NxN Ewald Elec. + LJ [F]             26708.049824     2083227.886    91.6
 NxN Ewald Elec. + LJ [V&F]             292.006328       37668.816     1.7
 NxN Ewald Elec. [F]                    232.720624       14195.958     0.6
 NxN Ewald Elec. [V&F]                    2.515480         211.300     0.0
 1,4 nonbonded interactions              53.651072        4828.596     0.2
 Calc Weights                           129.210786        4651.588     0.2
 Spread Q Bspline                      2756.496768        5512.994     0.2
 Gather F Bspline                      2756.496768       16538.981     0.7
 3D-FFT                                9208.780394       73670.243     3.2
 Solve PME                                3.247504         207.840     0.0
 Reset In Box                             1.757238           5.272     0.0
 CG-CoM                                   1.793100           5.379     0.0
 Bonds                                    7.532672         444.428     0.0
 Propers                                 66.564224       15243.207     0.7
 Impropers                                0.307456          63.951     0.0
 Virial                                   4.361082          78.499     0.0
 Update                                  43.070262        1335.178     0.1
 Stop-CM                                  0.502068           5.021     0.0
 Calc-Ekin                               17.285484         466.708     0.0
 Lincs                                   26.408450        1584.507     0.1
 Lincs-Mat                              248.955744         995.823     0.0
 Constraint-V                           101.508771         812.070     0.0
 Constraint-Vir                           3.782199          90.773     0.0
 Settle                                  16.244652        5247.023     0.2
-----------------------------------------------------------------------------
 Total                                                 2274691.984   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 27901.5
 av. #atoms communicated per step for LINCS:  2 x 2174.5

 Average load imbalance: 0.5 %
 Part of the total run time spent waiting due to load imbalance: 0.2 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6         49       0.259         14.878   0.7
 DD comm. load          4    6         48       0.001          0.074   0.0
 DD comm. bounds        4    6         49       0.001          0.050   0.0
 Neighbor search        4    6         49       0.386         22.175   1.1
 Comm. coord.           4    6       1152       0.130          7.454   0.4
 Force                  4    6       1201      12.970        745.342  35.5
 Wait + Comm. F         4    6       1201       0.112          6.455   0.3
 PME mesh               4    6       1201       1.628         93.562   4.5
 NB X/F buffer ops.     4    6       3505       0.084          4.824   0.2
 COM pull force         4    6       1201      19.986       1148.495  54.7
 Write traj.            4    6          2       0.008          0.432   0.0
 Update                 4    6       2402       0.270         15.533   0.7
 Constraints            4    6       2402       0.634         36.446   1.7
 Comm. energies         4    6        241       0.012          0.671   0.0
 Rest                                           0.084          4.808   0.2
-----------------------------------------------------------------------------
 Total                                         36.565       2101.199 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6       2402       0.335         19.241   0.9
 PME spread/gather      4    6       2402       0.705         40.526   1.9
 PME 3D-FFT             4    6       2402       0.375         21.538   1.0
 PME 3D-FFT Comm.       4    6       2402       0.148          8.522   0.4
 PME solve Elec         4    6       1201       0.061          3.501   0.2
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      838.962       36.565     2294.4
                 (ns/day)    (hour/ns)
Performance:        5.676        4.229
Finished mdrun on rank 0 Tue May 17 12:04:46 2016

########################################################################
(7) 128 pull-code cylinder restraints with CPUs -- Gromacs-2016-beta1

/home/cneale/exec/GROMACS/exec/gromacs-2016-beta1/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                            38.432000          38.432     0.0
 Pair Search distance check             840.219076        7561.972     0.3
 NxN Ewald Elec. + LJ [F]             26707.387856     2083176.253    91.6
 NxN Ewald Elec. + LJ [V&F]             292.001792       37668.231     1.7
 NxN Ewald Elec. [F]                    227.934000       13903.974     0.6
 NxN Ewald Elec. [V&F]                    2.463104         206.901     0.0
 1,4 nonbonded interactions              53.651072        4828.596     0.2
 Calc Weights                           129.210786        4651.588     0.2
 Spread Q Bspline                      2756.496768        5512.994     0.2
 Gather F Bspline                      2756.496768       16538.981     0.7
 3D-FFT                                9208.780394       73670.243     3.2
 Solve PME                                3.247504         207.840     0.0
 Reset In Box                             1.757238           5.272     0.0
 CG-CoM                                   1.793100           5.379     0.0
 Bonds                                    7.532672         444.428     0.0
 Propers                                 66.564224       15243.207     0.7
 Impropers                                0.307456          63.951     0.0
 Virial                                   4.361082          78.499     0.0
 Update                                  43.070262        1335.178     0.1
 Stop-CM                                  0.502068           5.021     0.0
 Calc-Ekin                               17.285484         466.708     0.0
 Lincs                                   26.446614        1586.797     0.1
 Lincs-Mat                              249.084384         996.338     0.0
 Constraint-V                           101.492027         811.936     0.0
 Constraint-Vir                           3.779269          90.702     0.0
 Settle                                  16.213628        5237.002     0.2
-----------------------------------------------------------------------------
 Total                                                 2274336.423   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 27913.4
 av. #atoms communicated per step for LINCS:  2 x 2158.1

 Average load imbalance: 0.5 %
 Part of the total run time spent waiting due to load imbalance: 0.2 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks, each using 6 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    6         49       0.258         14.812   0.7
 DD comm. load          4    6         48       0.001          0.085   0.0
 DD comm. bounds        4    6         49       0.001          0.062   0.0
 Neighbor search        4    6         49       0.385         22.119   1.1
 Comm. coord.           4    6       1152       0.126          7.236   0.3
 Force                  4    6       1201      13.030        748.785  35.6
 Wait + Comm. F         4    6       1201       0.109          6.240   0.3
 PME mesh               4    6       1201       1.576         90.561   4.3
 NB X/F buffer ops.     4    6       3505       0.084          4.818   0.2
 COM pull force         4    6       1201      20.193       1160.368  55.2
 Write traj.            4    6          2       0.008          0.448   0.0
 Update                 4    6       2402       0.249         14.334   0.7
 Constraints            4    6       2402       0.470         27.004   1.3
 Comm. energies         4    6        241       0.010          0.593   0.0
 Rest                                           0.077          4.410   0.2
-----------------------------------------------------------------------------
 Total                                         36.577       2101.875 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    6       2402       0.332         19.072   0.9
 PME spread/gather      4    6       2402       0.677         38.903   1.9
 PME 3D-FFT             4    6       2402       0.370         21.254   1.0
 PME 3D-FFT Comm.       4    6       2402       0.135          7.768   0.4
 PME solve Elec         4    6       1201       0.058          3.339   0.2
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      835.374       36.577     2283.9
                 (ns/day)    (hour/ns)
Performance:        5.674        4.230
Finished mdrun on rank 0 Tue May 17 12:56:30 2016
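
For anyone who wants to pull the same comparison out of their own md.log files, here is a minimal Python sketch. It was not part of the runs above. The regular expression and the names ROW, EXCERPTS and wall_times are illustrative assumptions about the cycle accounting layout shown in (6) and (7), and only the rows copied below are parsed.

```python
import re

# Rows copied verbatim from the cycle accounting tables of runs (6) and (7).
EXCERPTS = {
    "5.1.2 CPU, 128 restraints": """\
 Force                  4    6       1201      12.970        745.342  35.5
 COM pull force         4    6       1201      19.986       1148.495  54.7
 Total                                         36.565       2101.199 100.0
""",
    "2016-beta1 CPU, 128 restraints": """\
 Force                  4    6       1201      13.030        748.785  35.6
 COM pull force         4    6       1201      20.193       1160.368  55.2
 Total                                         36.577       2101.875 100.0
""",
}

# Assumed row layout: name, optional (ranks, threads, call count),
# then wall time (s), giga-cycles and percentage.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s+(?:\d+\s+\d+\s+\d+\s+)?"
    r"(?P<wall>\d+\.\d+)\s+(?P<cycles>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s*$"
)


def wall_times(text):
    """Map each row name to its wall time in seconds."""
    times = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            times[match.group("name")] = float(match.group("wall"))
    return times


shares = {}
for label, excerpt in EXCERPTS.items():
    times = wall_times(excerpt)
    # Share of the total wall time spent in the pull code.
    shares[label] = 100.0 * times["COM pull force"] / times["Total"]
    print(f"{label}: COM pull force = {shares[label]:.1f} % of wall time")
```

On these excerpts the pull code takes a little over half of the wall time in both versions, consistent with the percentage column in the logs.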



