[gmx-users] Pull code slows down mdrun -- is this expected?
Christopher Neale
chris.neale at alum.utoronto.ca
Thu May 19 18:10:10 CEST 2016
Dear Szilárd:
I think my ranks were set up the same way for the GPU and non-GPU runs; you can see exactly how I ran each job below. I also checked, but have not pasted here, that every run reported the same DD and PME grid in its log file (4x1x1 in all cases) and the same nstlist/rlist adjustment (increased to nstlist=40 / rlist=1.253).
Below are the run command and the end of the log file for each of these runs:
(1) no pull-code with GPUs -- v5.1.2
(2) 1 pull-code cylinder restraint with GPUs -- v5.1.2
(3) 128 pull-code cylinder restraints with GPUs -- v5.1.2
(4) no pull-code with CPUs -- v5.1.2
(5) 1 pull-code cylinder restraint with CPUs -- v5.1.2
(6) 128 pull-code cylinder restraints with CPUs -- v5.1.2
(7) 128 pull-code cylinder restraints with CPUs -- Gromacs-2016-beta1
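To put the logs below in perspective, here is a small Python sketch that tabulates the ns/day reported on the "Performance:" line of runs (1) to (6). It computes two ratios: the slowdown relative to the no-pull run on the same hardware, and the GPU-over-CPU speedup at each restraint count. The numbers are copied by hand from the logs. The dictionary layout and function names are illustrative only.

```python
# ns/day copied from the "Performance:" line of runs (1)-(6) (GROMACS 5.1.2).
# Keys are (hardware, number of pull-code cylinder restraints).
PERF_NS_PER_DAY = {
    ("GPU", 0): 54.914,
    ("GPU", 1): 51.372,
    ("GPU", 128): 8.816,
    ("CPU", 0): 18.699,
    ("CPU", 1): 18.748,
    ("CPU", 128): 5.676,
}


def slowdown(hw: str, n_restraints: int) -> float:
    """How many times slower a run is than the no-pull run on the same hardware."""
    return PERF_NS_PER_DAY[(hw, 0)] / PERF_NS_PER_DAY[(hw, n_restraints)]


def gpu_speedup(n_restraints: int) -> float:
    """GPU ns/day divided by CPU ns/day at the same restraint count."""
    return PERF_NS_PER_DAY[("GPU", n_restraints)] / PERF_NS_PER_DAY[("CPU", n_restraints)]


if __name__ == "__main__":
    for hw in ("GPU", "CPU"):
        for n in (1, 128):
            print(f"{hw}, {n:3d} restraint(s): {slowdown(hw, n):.2f}x slower than no pull")
    for n in (0, 1, 128):
        print(f"GPU/CPU speedup with {n:3d} restraint(s): {gpu_speedup(n):.2f}x")
```

With these numbers, 128 restraints make the GPU run about 6.2x slower and the CPU run about 3.3x slower. As a result, the GPU advantage shrinks from about 2.94x to about 1.55x. The single-restraint CPU run is essentially unchanged, with a ratio of about 1.00.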
########################################################################
(1) no pull-code with GPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                      364.832000         364.832     0.0
Pair Search distance check       1620.797488       14587.177     0.0
NxN Ewald Elec. + LJ [F]       483561.291648    37717780.749    95.2
NxN Ewald Elec. + LJ [V&F]       4927.352384      635628.458     1.6
1,4 nonbonded interactions        509.305472       45837.492     0.1
Calc Weights                     1226.587986       44157.167     0.1
Spread Q Bspline                26167.210368       52334.421     0.1
Gather F Bspline                26167.210368      157003.262     0.4
3D-FFT                          87418.239194      699345.914     1.8
Solve PME                          30.828304        1973.011     0.0
Reset In Box                       10.256532          30.770     0.0
CG-CoM                             10.292394          30.877     0.0
Bonds                              71.507072        4218.917     0.0
Propers                           631.889024      144702.586     0.4
Impropers                           2.918656         607.080     0.0
Virial                             41.123922         740.231     0.0
Update                            408.862662       12674.743     0.0
Stop-CM                             4.159992          41.600     0.0
Calc-Ekin                          81.837084        2209.601     0.0
Lincs                             249.823000       14989.380     0.0
Lincs-Mat                        2355.109984        9420.440     0.0
Constraint-V                      961.241869        7689.935     0.0
Constraint-Vir                     35.598117         854.355     0.0
Settle                            153.879318       49703.020     0.1
-----------------------------------------------------------------------------
Total                                           39616926.018   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 28471.3
av. #atoms communicated per step for LINCS: 2 x 2103.2
Average load imbalance: 0.5 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6     286     1.093      62.827    3.0
DD comm. load           4       6     285     0.002       0.123    0.0
DD comm. bounds         4       6     286     0.004       0.251    0.0
Neighbor search         4       6     286     0.581      33.383    1.6
Launch GPU ops.         4       6   22802     0.962      55.297    2.7
Comm. coord.            4       6   11115     0.839      48.251    2.3
Force                   4       6   11401    11.241     646.160   31.3
Wait + Comm. F          4       6   11401     0.803      46.166    2.2
PME mesh                4       6   11401    12.488     717.838   34.8
Wait GPU nonlocal       4       6   11401     0.080       4.614    0.2
Wait GPU local          4       6   11401     0.030       1.697    0.1
NB X/F buffer ops.      4       6   45032     0.474      27.265    1.3
Write traj.             4       6       3     0.009       0.518    0.0
Update                  4       6   22802     1.973     113.431    5.5
Constraints             4       6   22802     4.520     259.834   12.6
Comm. energies          4       6    1141     0.020       1.168    0.1
Rest                                          0.754      43.343    2.1
-----------------------------------------------------------------------------
Total                                        35.876    2062.165  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6   22802     1.619      93.076    4.5
PME spread/gather       4       6   22802     5.835     335.390   16.3
PME 3D-FFT              4       6   22802     3.061     175.947    8.5
PME 3D-FFT Comm.        4       6   22802     1.229      70.636    3.4
PME solve Elec          4       6   11401     0.714      41.044    2.0
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             859.895      35.876      2396.9
                 (ns/day)   (hour/ns)
Performance:       54.914       0.437
Finished mdrun on rank 0 Tue May 17 11:57:59 2016
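When you compare many runs, it is easier to extract the throughput from each log programmatically than to read it by eye. The Python sketch below assumes only that the log contains a line of the form `Performance: <ns/day> <hour/ns>`, as in the output above. The function name `parse_performance` is hypothetical. The sketch also checks that the two columns agree, because hours per ns is simply 24 divided by ns per day.

```python
import re
from typing import Tuple

# Matches the line printed under the "(ns/day) (hour/ns)" header of an mdrun log.
_PERF_RE = re.compile(r"^\s*Performance:\s+([0-9.]+)\s+([0-9.]+)\s*$", re.MULTILINE)


def parse_performance(log_text: str) -> Tuple[float, float]:
    """Return (ns/day, hour/ns) from the last 'Performance:' line in log_text."""
    matches = _PERF_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Performance:' line found")
    ns_per_day, hour_per_ns = matches[-1]
    return float(ns_per_day), float(hour_per_ns)


# Excerpt copied from run (1) above.
excerpt = """\
 (ns/day) (hour/ns)
Performance: 54.914 0.437
"""
ns_day, hr_ns = parse_performance(excerpt)
# The two columns are redundant: hour/ns = 24 / (ns/day), up to rounding.
assert abs(24.0 / ns_day - hr_ns) < 1e-3
print(ns_day, hr_ns)
```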
########################################################################
(2) 1 pull-code cylinder restraint with GPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                      341.792000         341.792     0.0
Pair Search distance check       1523.886864       13714.982     0.0
NxN Ewald Elec. + LJ [F]       453747.480576    35392303.485    95.2
NxN Ewald Elec. + LJ [V&F]       4634.734080      597880.696     1.6
1,4 nonbonded interactions        477.141632       42942.747     0.1
Calc Weights                     1149.126066       41368.538     0.1
Spread Q Bspline                24514.689408       49029.379     0.1
Gather F Bspline                24514.689408      147088.136     0.4
3D-FFT                          81897.571514      655180.572     1.8
Solve PME                          28.881424        1848.411     0.0
Reset In Box                        9.611016          28.833     0.0
CG-CoM                              9.646878          28.941     0.0
Bonds                              66.991232        3952.483     0.0
Propers                           591.983744      135564.277     0.4
Impropers                           2.734336         568.742     0.0
Virial                             38.528898         693.520     0.0
Update                            383.042022       11874.303     0.0
Stop-CM                             3.873096          38.731     0.0
Calc-Ekin                          76.672956        2070.170     0.0
Lincs                             234.398906       14063.934     0.0
Lincs-Mat                        2209.202496        8836.810     0.0
Constraint-V                      901.536603        7212.293     0.0
Constraint-Vir                     33.383999         801.216     0.0
Settle                            144.260292       46596.074     0.1
-----------------------------------------------------------------------------
Total                                           37174029.066   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 28747.6
av. #atoms communicated per step for LINCS: 2 x 2141.1
Average load imbalance: 0.7 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6     268     1.036      59.528    2.9
DD comm. load           4       6     267     0.001       0.085    0.0
DD comm. bounds         4       6     268     0.002       0.123    0.0
Neighbor search         4       6     268     0.546      31.386    1.5
Launch GPU ops.         4       6   21362     0.912      52.395    2.5
Comm. coord.            4       6   10413     0.784      45.045    2.2
Force                   4       6   10681    10.529     605.019   29.3
Wait + Comm. F          4       6   10681     0.762      43.803    2.1
PME mesh                4       6   10681    11.761     675.817   32.7
Wait GPU nonlocal       4       6   10681     0.075       4.306    0.2
Wait GPU local          4       6   10681     0.027       1.573    0.1
NB X/F buffer ops.      4       6   42188     0.435      24.972    1.2
COM pull force          4       6   10681     2.233     128.302    6.2
Write traj.             4       6       3     0.009       0.496    0.0
Update                  4       6   21362     1.844     105.992    5.1
Constraints             4       6   21362     4.252     244.328   11.8
Comm. energies          4       6    1069     0.015       0.875    0.0
Rest                                          0.705      40.524    2.0
-----------------------------------------------------------------------------
Total                                        35.928    2064.569  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6   21362     1.561      89.718    4.3
PME spread/gather       4       6   21362     5.478     314.785   15.2
PME 3D-FFT              4       6   21362     2.862     164.444    8.0
PME 3D-FFT Comm.        4       6   21362     1.162      66.773    3.2
PME solve Elec          4       6   10681     0.669      38.447    1.9
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             860.923      35.928      2396.3
                 (ns/day)   (hour/ns)
Performance:       51.372       0.467
Finished mdrun on rank 0 Wed May 18 09:49:04 2016
########################################################################
(3) 128 pull-code cylinder restraints with GPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/gpu_serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6 -gpu_id 0123

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                       60.192000          60.192     0.0
Pair Search distance check        272.833312        2455.500     0.0
NxN Ewald Elec. + LJ [F]        77397.539264     6037008.063    95.0
NxN Ewald Elec. + LJ [V&F]        831.930304      107319.009     1.7
1,4 nonbonded interactions         84.028032        7562.523     0.1
Calc Weights                      202.369266        7285.294     0.1
Spread Q Bspline                 4317.211008        8634.422     0.1
Gather F Bspline                 4317.211008       25903.266     0.4
3D-FFT                          14422.744314      115381.955     1.8
Solve PME                           5.086224         325.518     0.0
Reset In Box                        1.721376           5.164     0.0
CG-CoM                              1.757238           5.272     0.0
Bonds                              11.797632         696.060     0.0
Propers                           104.252544       23873.833     0.4
Impropers                           0.481536         100.159     0.0
Virial                              6.811938         122.615     0.0
Update                             67.456422        2091.149     0.0
Stop-CM                             0.717240           7.172     0.0
Calc-Ekin                          13.555836         366.008     0.0
Lincs                              41.348386        2480.903     0.0
Lincs-Mat                         389.821664        1559.287     0.0
Constraint-V                      158.857573        1270.861     0.0
Constraint-Vir                      5.902560         141.661     0.0
Settle                             25.400962        8204.511     0.1
-----------------------------------------------------------------------------
Total                                            6352860.396   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 28748.6
av. #atoms communicated per step for LINCS: 2 x 2147.1
Average load imbalance: 1.7 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6      48     0.250      14.358    0.7
DD comm. load           4       6      47     0.002       0.098    0.0
DD comm. bounds         4       6      48     0.001       0.051    0.0
Neighbor search         4       6      48     0.138       7.930    0.4
Launch GPU ops.         4       6    3762     0.209      11.990    0.6
Comm. coord.            4       6    1833     0.187      10.771    0.5
Force                   4       6    1881     3.479     199.905    9.4
Wait + Comm. F          4       6    1881     0.149       8.587    0.4
PME mesh                4       6    1881     3.791     217.825   10.3
Wait GPU nonlocal       4       6    1881     0.019       1.094    0.1
Wait GPU local          4       6    1881     0.005       0.309    0.0
NB X/F buffer ops.      4       6    7428     0.113       6.514    0.3
COM pull force          4       6    1881    26.652    1531.533   72.3
Write traj.             4       6       2     0.008       0.435    0.0
Update                  4       6    3762     0.562      32.308    1.5
Constraints             4       6    3762     1.152      66.171    3.1
Comm. energies          4       6     189     0.009       0.521    0.0
Rest                                          0.144       8.271    0.4
-----------------------------------------------------------------------------
Total                                        36.869    2118.672  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6    3762     0.527      30.259    1.4
PME spread/gather       4       6    3762     1.893     108.802    5.1
PME 3D-FFT              4       6    3762     0.956      54.953    2.6
PME 3D-FFT Comm.        4       6    3762     0.267      15.330    0.7
PME solve Elec          4       6    1881     0.142       8.149    0.4
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             883.599      36.869      2396.6
                 (ns/day)   (hour/ns)
Performance:        8.816       2.722
Finished mdrun on rank 0 Tue May 17 12:01:03 2016
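The cycle-accounting rows can be converted into a per-step cost. The Python sketch below splits a row such as the `COM pull force` line, which has the label first and then six numeric columns, and divides wall time by call count. It assumes the pull code is called once per MD step, as the matching `Force` and `COM pull force` call counts in runs (2) and (3) suggest. `CycleRow`, `parse_row` and `ms_per_call` are illustrative names, not GROMACS tools.

```python
from dataclasses import dataclass


@dataclass
class CycleRow:
    """One row of the REAL CYCLE AND TIME ACCOUNTING table."""
    label: str
    ranks: int
    threads: int
    calls: int
    wall_s: float
    gcycles: float
    percent: float


def parse_row(line: str) -> CycleRow:
    # The label can contain spaces, so take the six numeric fields from the right.
    parts = line.split()
    label = " ".join(parts[:-6])
    ranks, threads, calls = (int(p) for p in parts[-6:-3])
    wall, gcyc, pct = (float(p) for p in parts[-3:])
    return CycleRow(label, ranks, threads, calls, wall, gcyc, pct)


def ms_per_call(row: CycleRow) -> float:
    """Average wall-clock milliseconds per call of this row."""
    return 1000.0 * row.wall_s / row.calls


# "COM pull force" rows copied from runs (2) and (3).
one = parse_row("COM pull force 4 6 10681 2.233 128.302 6.2")
many = parse_row("COM pull force 4 6 1881 26.652 1531.533 72.3")

print(f"1 restraint:    {ms_per_call(one):.3f} ms per step")
print(f"128 restraints: {ms_per_call(many):.3f} ms per step "
      f"({ms_per_call(many) / 128:.3f} ms per restraint)")
```

For these two GPU runs, this gives roughly 0.21 ms per step with one restraint and roughly 14.2 ms per step with 128 restraints. That is about 68 times the single-restraint cost, or about 0.11 ms per restraint per step.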
########################################################################
(4) no pull-code with CPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                      124.832000         124.832     0.0
Pair Search distance check       2741.405650       24672.651     0.3
NxN Ewald Elec. + LJ [F]        88748.544456     6922386.468    91.8
NxN Ewald Elec. + LJ [V&F]        919.460288      118610.377     1.6
NxN Ewald Elec. [F]               759.291832       46316.802     0.6
NxN Ewald Elec. [V&F]               7.821440         657.001     0.0
1,4 nonbonded interactions        174.265472       15683.892     0.2
Calc Weights                      419.692986       15108.947     0.2
Spread Q Bspline                 8953.450368       17906.901     0.2
Gather F Bspline                 8953.450368       53720.702     0.7
3D-FFT                          29911.284194      239290.274     3.2
Solve PME                          10.548304         675.091     0.0
Reset In Box                        5.630334          16.891     0.0
CG-CoM                              5.666196          16.999     0.0
Bonds                              24.467072        1443.557     0.0
Propers                           216.209024       49511.866     0.7
Impropers                           0.998656         207.720     0.0
Virial                             14.092422         253.664     0.0
Update                            139.897662        4336.828     0.1
Stop-CM                             1.470342          14.703     0.0
Calc-Ekin                          56.016444        1512.444     0.0
Lincs                              85.537242        5132.235     0.1
Lincs-Mat                         805.565504        3222.262     0.0
Constraint-V                      329.043485        2632.348     0.0
Constraint-Vir                     12.202321         292.856     0.0
Settle                             52.670362       17012.527     0.2
-----------------------------------------------------------------------------
Total                                            7540760.838   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 27844.0
av. #atoms communicated per step for LINCS: 2 x 2116.4
Average load imbalance: 0.6 %
Part of the total run time spent waiting due to load imbalance: 0.4 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6     157     0.553      31.778    1.5
DD comm. load           4       6     156     0.001       0.048    0.0
DD comm. bounds         4       6     157     0.001       0.072    0.0
Neighbor search         4       6     157     0.942      54.112    2.6
Comm. coord.            4       6    3744     0.307      17.632    0.9
Force                   4       6    3901    26.106    1500.166   72.4
Wait + Comm. F          4       6    3901     0.334      19.200    0.9
PME mesh                4       6    3901     5.275     303.154   14.6
NB X/F buffer ops.      4       6   11389     0.215      12.369    0.6
Write traj.             4       6       2     0.007       0.426    0.0
Update                  4       6    7802     0.625      35.930    1.7
Constraints             4       6    7802     1.454      83.564    4.0
Comm. energies          4       6     781     0.010       0.590    0.0
Rest                                          0.217      12.484    0.6
-----------------------------------------------------------------------------
Total                                        36.049    2071.527  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6    7802     0.766      44.033    2.1
PME spread/gather       4       6    7802     2.644     151.952    7.3
PME 3D-FFT              4       6    7802     1.190      68.385    3.3
PME 3D-FFT Comm.        4       6    7802     0.481      27.663    1.3
PME solve Elec          4       6    3901     0.183      10.497    0.5
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             864.421      36.049      2397.9
                 (ns/day)   (hour/ns)
Performance:       18.699       1.283
Finished mdrun on rank 0 Tue May 17 12:03:43 2016
########################################################################
(5) 1 pull-code cylinder restraint with CPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                      124.832000         124.832     0.0
Pair Search distance check       2737.700316       24639.303     0.3
NxN Ewald Elec. + LJ [F]        88646.835696     6914453.184    91.8
NxN Ewald Elec. + LJ [V&F]        918.378464      118470.822     1.6
NxN Ewald Elec. [F]               760.931424       46416.817     0.6
NxN Ewald Elec. [V&F]               7.948128         667.643     0.0
1,4 nonbonded interactions        174.265472       15683.892     0.2
Calc Weights                      419.692986       15108.947     0.2
Spread Q Bspline                 8953.450368       17906.901     0.2
Gather F Bspline                 8953.450368       53720.702     0.7
3D-FFT                          29911.284194      239290.274     3.2
Solve PME                          10.548304         675.091     0.0
Reset In Box                        5.630334          16.891     0.0
CG-CoM                              5.666196          16.999     0.0
Bonds                              24.467072        1443.557     0.0
Propers                           216.209024       49511.866     0.7
Impropers                           0.998656         207.720     0.0
Virial                             14.092422         253.664     0.0
Update                            139.897662        4336.828     0.1
Stop-CM                             1.470342          14.703     0.0
Calc-Ekin                          56.016444        1512.444     0.0
Lincs                              85.716402        5142.984     0.1
Lincs-Mat                         808.400608        3233.602     0.0
Constraint-V                      329.341421        2634.731     0.0
Constraint-Vir                     12.208318         293.000     0.0
Settle                             52.650234       17006.026     0.2
-----------------------------------------------------------------------------
Total                                            7532783.424   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 27887.4
av. #atoms communicated per step for LINCS: 2 x 2139.2
Average load imbalance: 0.7 %
Part of the total run time spent waiting due to load imbalance: 0.5 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6     157     0.566      32.549    1.6
DD comm. load           4       6     156     0.001       0.037    0.0
DD comm. bounds         4       6     157     0.001       0.081    0.0
Neighbor search         4       6     157     0.945      54.284    2.6
Comm. coord.            4       6    3744     0.309      17.739    0.9
Force                   4       6    3901    25.991    1493.580   72.3
Wait + Comm. F          4       6    3901     0.338      19.445    0.9
PME mesh                4       6    3901     4.483     257.627   12.5
NB X/F buffer ops.      4       6   11389     0.217      12.483    0.6
COM pull force          4       6    3901     0.781      44.862    2.2
Write traj.             4       6       2     0.007       0.398    0.0
Update                  4       6    7802     0.625      35.897    1.7
Constraints             4       6    7802     1.463      84.079    4.1
Comm. energies          4       6     781     0.010       0.568    0.0
Rest                                          0.218      12.538    0.6
-----------------------------------------------------------------------------
Total                                        35.955    2066.166  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6    7802     0.770      44.225    2.1
PME spread/gather       4       6    7802     1.998     114.795    5.6
PME 3D-FFT              4       6    7802     1.100      63.227    3.1
PME 3D-FFT Comm.        4       6    7802     0.422      24.265    1.2
PME solve Elec          4       6    3901     0.183      10.499    0.5
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             861.932      35.955      2397.2
                 (ns/day)   (hour/ns)
Performance:       18.748       1.280
Finished mdrun on rank 0 Tue May 17 12:18:35 2016
########################################################################
(6) 128 pull-code cylinder restraints with CPUs
/home/cneale/exec/GROMACS/exec/gromacs-5.1.2/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                       38.432000          38.432     0.0
Pair Search distance check        840.167764        7561.510     0.3
NxN Ewald Elec. + LJ [F]        26708.049824     2083227.886    91.6
NxN Ewald Elec. + LJ [V&F]        292.006328       37668.816     1.7
NxN Ewald Elec. [F]               232.720624       14195.958     0.6
NxN Ewald Elec. [V&F]               2.515480         211.300     0.0
1,4 nonbonded interactions         53.651072        4828.596     0.2
Calc Weights                      129.210786        4651.588     0.2
Spread Q Bspline                 2756.496768        5512.994     0.2
Gather F Bspline                 2756.496768       16538.981     0.7
3D-FFT                           9208.780394       73670.243     3.2
Solve PME                           3.247504         207.840     0.0
Reset In Box                        1.757238           5.272     0.0
CG-CoM                              1.793100           5.379     0.0
Bonds                               7.532672         444.428     0.0
Propers                            66.564224       15243.207     0.7
Impropers                           0.307456          63.951     0.0
Virial                              4.361082          78.499     0.0
Update                             43.070262        1335.178     0.1
Stop-CM                             0.502068           5.021     0.0
Calc-Ekin                          17.285484         466.708     0.0
Lincs                              26.408450        1584.507     0.1
Lincs-Mat                         248.955744         995.823     0.0
Constraint-V                      101.508771         812.070     0.0
Constraint-Vir                      3.782199          90.773     0.0
Settle                             16.244652        5247.023     0.2
-----------------------------------------------------------------------------
Total                                            2274691.984   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 27901.5
av. #atoms communicated per step for LINCS: 2 x 2174.5
Average load imbalance: 0.5 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6      49     0.259      14.878    0.7
DD comm. load           4       6      48     0.001       0.074    0.0
DD comm. bounds         4       6      49     0.001       0.050    0.0
Neighbor search         4       6      49     0.386      22.175    1.1
Comm. coord.            4       6    1152     0.130       7.454    0.4
Force                   4       6    1201    12.970     745.342   35.5
Wait + Comm. F          4       6    1201     0.112       6.455    0.3
PME mesh                4       6    1201     1.628      93.562    4.5
NB X/F buffer ops.      4       6    3505     0.084       4.824    0.2
COM pull force          4       6    1201    19.986    1148.495   54.7
Write traj.             4       6       2     0.008       0.432    0.0
Update                  4       6    2402     0.270      15.533    0.7
Constraints             4       6    2402     0.634      36.446    1.7
Comm. energies          4       6     241     0.012       0.671    0.0
Rest                                          0.084       4.808    0.2
-----------------------------------------------------------------------------
Total                                        36.565    2101.199  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6    2402     0.335      19.241    0.9
PME spread/gather       4       6    2402     0.705      40.526    1.9
PME 3D-FFT              4       6    2402     0.375      21.538    1.0
PME 3D-FFT Comm.        4       6    2402     0.148       8.522    0.4
PME solve Elec          4       6    1201     0.061       3.501    0.2
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             838.962      36.565      2294.4
                 (ns/day)   (hour/ns)
Performance:        5.676       4.229
Finished mdrun on rank 0 Tue May 17 12:04:46 2016
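A similar per-step view of the CPU runs separates the time charged to `COM pull force` from everything else. The Python sketch below uses the `Total` and `COM pull force` wall times from runs (4) and (6) and takes the `Force` call count as the number of MD steps. The names `RUNS` and `per_step_ms` are illustrative only.

```python
from typing import Dict, Tuple

# (Total wall time in s, COM pull force wall time in s, MD steps), from runs (4) and (6).
RUNS: Dict[str, Tuple[float, float, int]] = {
    "(4) CPU, no pull": (36.049, 0.0, 3901),
    "(6) CPU, 128 restraints": (36.565, 19.986, 1201),
}


def per_step_ms(total_s: float, pull_s: float, steps: int) -> Tuple[float, float, float]:
    """Return (total, pull, everything else) in wall-clock ms per MD step."""
    total = 1000.0 * total_s / steps
    pull = 1000.0 * pull_s / steps
    return total, pull, total - pull


for name, run in RUNS.items():
    total, pull, rest = per_step_ms(*run)
    print(f"{name:24s} total {total:6.2f}  pull {pull:6.2f}  other {rest:6.2f} ms/step")
```

Even after the pull time is subtracted, the 128-restraint run spends about 13.8 ms per step on everything else, compared with about 9.2 ms per step without the pull code. Most of that difference appears in the `Force` row, at about 10.8 versus 6.7 ms per step. These logs alone do not show why.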
########################################################################
(7) 128 pull-code cylinder restraints with CPUs -- Gromacs-2016-beta1
/home/cneale/exec/GROMACS/exec/gromacs-2016-beta1/serial/bin/gmx mdrun -notunepme -deffnm MD -dlb yes -npme 0 -cpt 60 -maxh 0.01 -cpi MD.cpt -ntmpi 4 -ntomp 6

M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only

Computing:                          M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                       38.432000          38.432     0.0
Pair Search distance check        840.219076        7561.972     0.3
NxN Ewald Elec. + LJ [F]        26707.387856     2083176.253    91.6
NxN Ewald Elec. + LJ [V&F]        292.001792       37668.231     1.7
NxN Ewald Elec. [F]               227.934000       13903.974     0.6
NxN Ewald Elec. [V&F]               2.463104         206.901     0.0
1,4 nonbonded interactions         53.651072        4828.596     0.2
Calc Weights                      129.210786        4651.588     0.2
Spread Q Bspline                 2756.496768        5512.994     0.2
Gather F Bspline                 2756.496768       16538.981     0.7
3D-FFT                           9208.780394       73670.243     3.2
Solve PME                           3.247504         207.840     0.0
Reset In Box                        1.757238           5.272     0.0
CG-CoM                              1.793100           5.379     0.0
Bonds                               7.532672         444.428     0.0
Propers                            66.564224       15243.207     0.7
Impropers                           0.307456          63.951     0.0
Virial                              4.361082          78.499     0.0
Update                             43.070262        1335.178     0.1
Stop-CM                             0.502068           5.021     0.0
Calc-Ekin                          17.285484         466.708     0.0
Lincs                              26.446614        1586.797     0.1
Lincs-Mat                         249.084384         996.338     0.0
Constraint-V                      101.492027         811.936     0.0
Constraint-Vir                      3.779269          90.702     0.0
Settle                             16.213628        5237.002     0.2
-----------------------------------------------------------------------------
Total                                            2274336.423   100.0
-----------------------------------------------------------------------------

D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 27913.4
av. #atoms communicated per step for LINCS: 2 x 2158.1
Average load imbalance: 0.5 %
Part of the total run time spent waiting due to load imbalance: 0.2 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 4 MPI ranks, each using 6 OpenMP threads

Computing:            Num     Num    Call Wall time Giga-Cycles
                    Ranks Threads   Count       (s)   total sum      %
-----------------------------------------------------------------------------
Domain decomp.          4       6      49     0.258      14.812    0.7
DD comm. load           4       6      48     0.001       0.085    0.0
DD comm. bounds         4       6      49     0.001       0.062    0.0
Neighbor search         4       6      49     0.385      22.119    1.1
Comm. coord.            4       6    1152     0.126       7.236    0.3
Force                   4       6    1201    13.030     748.785   35.6
Wait + Comm. F          4       6    1201     0.109       6.240    0.3
PME mesh                4       6    1201     1.576      90.561    4.3
NB X/F buffer ops.      4       6    3505     0.084       4.818    0.2
COM pull force          4       6    1201    20.193    1160.368   55.2
Write traj.             4       6       2     0.008       0.448    0.0
Update                  4       6    2402     0.249      14.334    0.7
Constraints             4       6    2402     0.470      27.004    1.3
Comm. energies          4       6     241     0.010       0.593    0.0
Rest                                          0.077       4.410    0.2
-----------------------------------------------------------------------------
Total                                        36.577    2101.875  100.0
-----------------------------------------------------------------------------

Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         4       6    2402     0.332      19.072    0.9
PME spread/gather       4       6    2402     0.677      38.903    1.9
PME 3D-FFT              4       6    2402     0.370      21.254    1.0
PME 3D-FFT Comm.        4       6    2402     0.135       7.768    0.4
PME solve Elec          4       6    1201     0.058       3.339    0.2
-----------------------------------------------------------------------------

               Core t (s)  Wall t (s)         (%)
Time:             835.374      36.577      2283.9
                 (ns/day)   (hour/ns)
Performance:        5.674       4.230
Finished mdrun on rank 0 Tue May 17 12:56:30 2016
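Runs (6) and (7) use the same mdrun command line and differ only in the GROMACS build. The short Python check below computes the relative differences in the headline numbers, using values copied from the two logs. The helper name `rel_diff_percent` is illustrative.

```python
def rel_diff_percent(old: float, new: float) -> float:
    """Change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


# (GROMACS 5.1.2 run (6), GROMACS 2016-beta1 run (7))
METRICS = {
    "Performance (ns/day)": (5.676, 5.674),
    "COM pull force (s)": (19.986, 20.193),
    "Force (s)": (12.970, 13.030),
}

for name, (old, new) in METRICS.items():
    print(f"{name:22s} {old:8.3f} -> {new:8.3f}  ({rel_diff_percent(old, new):+.2f} %)")
```

All three differences are about 1 % or less, so in this test the 2016-beta1 build shows essentially the same pull-code cost as 5.1.2.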