[gmx-developers] Gromacs on 48 core magny-cours AMDs
Alexey Shvetsov
alexxy at omrb.pnpi.spb.ru
Thu Sep 1 14:52:42 CEST 2011
Hello!
Well, there are several possible causes:
1. An old kernel that handles large NUMA systems incorrectly
2. Processes not being bound correctly to cores
3. The configuration of gcc and the math libraries
What is your MPI version, and which versions of the FFTW and BLAS
libraries do you use, if they are external ones?
Also, please post your CFLAGS.
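On point 2, one quick way to see whether a running process is actually pinned is to read its CPU affinity mask. The following is only a minimal Python sketch, assuming Linux (os.sched_getaffinity is not available on every platform); the helper name describe_affinity is made up for illustration and is not part of Gromacs or any MPI implementation.

```python
import os

def describe_affinity(pid=0):
    """Return (allowed_cpus, is_pinned) for a process; pid 0 means this process."""
    allowed = sorted(os.sched_getaffinity(pid))
    # A rank bound to a single core reports exactly one CPU here; an unbound
    # rank usually reports every core the scheduler may place it on.
    return allowed, len(allowed) == 1

if __name__ == "__main__":
    cpus, pinned = describe_affinity()
    print(f"allowed CPUs: {cpus}")
    print("pinned to one core" if pinned else "not pinned to a single core")
```

Passing the PID of a running mdrun rank instead of 0 would show the mask for that rank.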
Here we get good performance on such nodes running SLES with a 2.6.32
kernel (with gentoo-prefix on top of it, providing Open MPI and the OFED
stack) and running Gentoo (kernel 3.0.4) with many system optimizations
made by me =)
All results are stable. Gentoo works better here because its kernel
doesn't have the IRQ bug, and it also has some optimizations.
On Wed, 31 Aug 2011 13:10:52 -0700, Igor Leontyev wrote:
> Hi
> I am benchmarking a 100K atom system (protein ~12K and solvent ~90K
> atoms, 1 fs time step, cutoffs 1.2 nm) on a 48-core 2.1 GHz AMD node.
> Software: Gromacs 4.5.4, compiled with gcc 4.4.6; CentOS 5.6, kernel
> 2.6.18-238.19.1.el5. See the g_tune_pme results below. The
> performance is absolutely unstable; the computation time for
> equivalent runs can differ by orders of magnitude.
>
> The issue seems to be similar to what has been discussed earlier
> http://lists.gromacs.org/pipermail/gmx-users/2010-October/055113.html
> Is there any progress in resolving it?
>
> Igor
>
>
> ------------------------------------------------------------
>
> P E R F O R M A N C E   R E S U L T S
>
> ------------------------------------------------------------
> g_tune_pme for Gromacs VERSION 4.5.4
> Number of nodes : 48
> The mpirun command is :
> /home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3/bin/mpirun
> --hostfile node_loading.txt
> Passing # of nodes via : -np
> The mdrun command is :
>
> /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3
> mdrun args benchmarks : -resetstep 100 -o bench.trr -x bench.xtc
> -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log
> Benchmark steps : 1000
> dlb equilibration steps : 100
> Repeats for each test : 10
> Input file : cco_PM_ff03_sorin_scaled_meanpol.tpr
> Coulomb type : PME
> Grid spacing x y z : 0.114376 0.116700 0.116215
> Van der Waals type : Cut-off
>
> Will try these real/reciprocal workload settings:
> No. scaling rcoulomb nkx nky nkz spacing rvdw tpr file
> 0 -input- 1.200000 72 80 112 0.116700 1.200000
> cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr
>
> Individual timings for input file 0
> (cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr):
> PME nodes Gcycles ns/day PME/f Remark
> 24 3185.840 2.734 0.538 OK.
> 24 7237.416 1.203 1.119 OK.
> 24 3225.448 2.700 0.546 OK.
> 24 5844.942 1.489 1.012 OK.
> 24 4013.986 2.169 0.552 OK.
> 24 18578.174 0.469 0.842 OK.
> 24 3234.702 2.692 0.559 OK.
> 24 25818.267 0.337 0.815 OK.
> 24 32470.278 0.268 0.479 OK.
> 24 3234.806 2.692 0.561 OK.
> 23 15097.577 0.577 0.824 OK.
> 23 2948.211 2.954 0.705 OK.
> 23 15640.485 0.557 0.826 OK.
> 23 66961.240 0.130 3.215 OK.
> 23 2964.927 2.938 0.698 OK.
> 23 2965.896 2.937 0.669 OK.
> 23 11205.121 0.774 0.668 OK.
> 23 2964.737 2.938 0.672 OK.
> 23 13384.753 0.649 0.665 OK.
> 23 3738.425 2.329 0.738 OK.
> 22 3130.744 2.782 0.682 OK.
> 22 3981.770 2.187 0.659 OK.
> 22 6397.259 1.350 0.666 OK.
> 22 41374.579 0.211 3.509 OK.
> 22 3193.327 2.728 0.683 OK.
> 22 21405.007 0.407 0.871 OK.
> 22 3543.511 2.457 0.686 OK.
> 22 3539.981 2.460 0.701 OK.
> 22 30946.123 0.281 1.235 OK.
> 22 18031.023 0.483 0.729 OK.
> 21 2978.520 2.924 0.699 OK.
> 21 4487.921 1.940 0.666 OK.
> 21 39796.932 0.219 1.085 OK.
> 21 3027.659 2.877 0.714 OK.
> 21 58613.050 0.149 1.089 OK.
> 21 2973.281 2.929 0.698 OK.
> 21 34991.505 0.249 0.702 OK.
> 21 4479.034 1.944 0.696 OK.
> 21 40401.894 0.216 1.310 OK.
> 21 63325.943 0.138 1.124 OK.
> 20 17100.304 0.510 0.620 OK.
> 20 2859.158 3.047 0.832 OK.
> 20 2660.459 3.274 0.820 OK.
> 20 2871.060 3.034 0.821 OK.
> 20 105947.063 0.082 0.728 OK.
> 20 2851.650 3.055 0.827 OK.
> 20 2766.737 3.149 0.837 OK.
> 20 13887.535 0.627 0.813 OK.
> 20 9450.158 0.919 0.854 OK.
> 20 2983.460 2.920 0.838 OK.
> 19 0.000 0.000 - No DD grid found for these settings.
> 18 62490.241 0.139 1.070 OK.
> 18 75625.947 0.115 0.512 OK.
> 18 3584.509 2.430 1.176 OK.
> 18 4988.745 1.734 1.197 OK.
> 18 92981.804 0.094 0.529 OK.
> 18 3070.496 2.837 1.192 OK.
> 18 3089.339 2.820 1.204 OK.
> 18 5880.675 1.465 1.170 OK.
> 18 3094.133 2.816 1.214 OK.
> 18 3573.552 2.437 1.191 OK.
> 17 0.000 0.000 - No DD grid found for these settings.
> 16 3105.597 2.805 0.998 OK.
> 16 2719.826 3.203 1.045 OK.
> 16 3124.013 2.788 0.992 OK.
> 16 2708.751 3.216 1.030 OK.
> 16 3116.887 2.795 1.023 OK.
> 16 2695.859 3.232 1.038 OK.
> 16 2710.272 3.215 1.033 OK.
> 16 32639.259 0.267 0.514 OK.
> 16 56748.577 0.153 0.959 OK.
> 16 32362.192 0.269 1.816 OK.
> 15 40410.983 0.216 1.241 OK.
> 15 3727.108 2.337 1.262 OK.
> 15 3297.944 2.642 1.242 OK.
> 15 23012.201 0.379 0.994 OK.
> 15 3328.307 2.618 1.248 OK.
> 15 56869.719 0.153 0.568 OK.
> 15 26662.044 0.327 0.854 OK.
> 15 44026.837 0.198 1.198 OK.
> 15 3754.812 2.320 1.238 OK.
> 15 68683.967 0.127 0.844 OK.
> 14 2934.532 2.969 1.466 OK.
> 14 2824.434 3.085 1.430 OK.
> 14 2778.103 3.137 1.391 OK.
> 14 28435.548 0.306 0.957 OK.
> 14 2876.113 3.030 1.396 OK.
> 14 2803.951 3.108 1.438 OK.
> 14 9538.366 0.913 1.400 OK.
> 14 2887.242 3.018 1.424 OK.
> 14 32542.115 0.268 0.529 OK.
> 14 14256.539 0.609 1.432 OK.
> 13 5010.011 1.732 1.768 OK.
> 13 19270.893 0.452 1.481 OK.
> 13 3451.426 2.525 1.860 OK.
> 13 28566.186 0.305 0.620 OK.
> 13 3481.006 2.504 1.833 OK.
> 13 28457.876 0.306 0.933 OK.
> 13 3689.128 2.362 1.795 OK.
> 13 3451.925 2.525 1.831 OK.
> 13 34918.063 0.249 1.838 OK.
> 13 3473.566 2.509 1.854 OK.
> 12 42705.256 0.204 1.039 OK.
> 12 4934.453 1.763 1.292 OK.
> 12 16759.163 0.520 1.288 OK.
> 12 27660.618 0.315 0.855 OK.
> 12 6293.874 1.380 1.263 OK.
> 12 40502.818 0.215 1.284 OK.
> 12 31595.114 0.276 0.615 OK.
> 12 61936.825 0.140 0.612 OK.
> 12 3013.850 2.891 1.345 OK.
> 12 3840.023 2.269 1.310 OK.
> 0 2628.156 3.317 - OK.
> 0 2573.649 3.387 - OK.
> 0 95523.769 0.091 - OK.
> 0 2594.895 3.360 - OK.
> 0 2614.131 3.335 - OK.
> 0 2610.647 3.339 - OK.
> 0 2560.067 3.405 - OK.
> 0 2609.485 3.341 - OK.
> 0 2603.154 3.349 - OK.
> 0 2583.289 3.375 - OK.
> -1( 16) 2672.797 3.260 1.002 OK.
> -1( 16) 57769.149 0.151 1.723 OK.
> -1( 16) 48598.334 0.179 1.138 OK.
> -1( 16) 2699.333 3.228 1.040 OK.
> -1( 16) 54243.321 0.161 1.679 OK.
> -1( 16) 2719.854 3.203 1.051 OK.
> -1( 16) 2716.365 3.207 1.051 OK.
> -1( 16) 24278.608 0.359 0.835 OK.
> -1( 16) 19357.359 0.449 1.006 OK.
> -1( 16) 45500.360 0.191 0.795 OK.
>
> Tuning took 500.5 minutes.
>
> ------------------------------------------------------------
> Summary of successful runs:
> Line  tpr  PME nodes  Gcycles Av.   Std.dev.  ns/day  PME/f  DD grid
>    0    0         24    10684.386  10896.612   1.675  0.702   3  4  2
>    1    0         23    13787.137  19462.982   1.678  0.968   1  5  5
>    2    0         22    13554.332  13814.153   1.535  1.042   2 13  1
>    3    0         21    25507.574  24601.033   1.358  0.878   3  3  3
>    4    0         20    16337.758  31934.533   2.062  0.799   2  2  7
>    5    0         18    25837.944  36067.176   1.689  1.045   3  2  5
>    6    0         16    14193.123  19370.807   2.194  1.045   4  4  2
>    7    0         15    27377.392  24308.700   1.132  1.069   3 11  1
>    8    0         14    10187.694  11414.829   2.044  1.286   1  2 17
>    9    0         13    13377.008  12969.168   1.547  1.581   1  5  7
>   10    0         12    23924.199  20299.796   0.997  1.090   3  4  3
>   11    0          0    11890.124  29385.874   3.030      -   6  4  2
>   12    0    -1( 16)    26055.548  23371.735   1.439  1.132   4  4  2
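
To put a number on the instability in the quoted results: for the ten repeats with 0 PME nodes, a single slow run pulls the mean well below the median. The following is a minimal Python sketch using only the standard library; the ns/day values are copied from the individual timings above, and the variable names are illustrative.

```python
from statistics import mean, median

# ns/day for the ten repeats with 0 PME nodes, copied from the g_tune_pme output
ns_per_day = [3.317, 3.387, 0.091, 3.360, 3.335,
              3.339, 3.405, 3.341, 3.349, 3.375]

avg = mean(ns_per_day)      # dominated by the single 0.091 ns/day run
med = median(ns_per_day)    # robust to that outlier
slowdown = max(ns_per_day) / min(ns_per_day)

print(f"mean   = {avg:.3f} ns/day")
print(f"median = {med:.3f} ns/day")
print(f"fastest/slowest = {slowdown:.1f}x")
```

The mean agrees with the 3.030 ns/day in the summary line for 0 PME nodes, while the median (about 3.345 ns/day) is closer to what an undisturbed run delivers.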
--
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru