[gmx-developers] Gromacs on 48 core magny-cours AMDs
Carsten Kutzner
ckutzne at gwdg.de
Thu Sep 1 10:02:16 CEST 2011
On Sep 1, 2011, at 9:19 AM, Sander Pronk wrote:
>
> On 31 Aug 2011, at 22:10 , Igor Leontyev wrote:
>
>> Hi
>> I am benchmarking a 100K atom system (protein ~12K and solvent ~90K atoms, 1 fs time step, cutoffs 1.2 nm) on a 48-core 2.1 GHz AMD node. Software: Gromacs 4.5.4, compiled with gcc 4.4.6; CentOS 5.6, kernel 2.6.18-238.19.1.el5. See the results of g_tune_pme below. The performance is completely unstable: the computation time for equivalent runs can differ by an order of magnitude.
>>
>> The issue seems to be similar to what has been discussed earlier
>> http://lists.gromacs.org/pipermail/gmx-users/2010-October/055113.html
>> Is there any progress in resolving it?
>
> That's an old kernel. If I remember correctly, that thread discussed issues related to thread&process affinity and NUMA-awareness on older kernels.
>
> Perhaps you could try a newer kernel?
Hi,
we are running a slightly older kernel and get nice performance on our 48-core magny-cours.
Maybe with MPICH the processes are not being pinned to the cores correctly.
Could you try the threaded version of mdrun? This is what gives the best (and reliable)
performance in our case.
Carsten
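[Editor's note: a minimal sketch of the two approaches suggested above. The file names (topol.tpr, bench) are placeholders, and the binding flags assume OpenMPI 1.4.x and a Linux system with numactl installed; check your local man pages before relying on them.]

```shell
# Option 1: threaded mdrun (no MPI launcher) -- GROMACS 4.5's built-in
# thread-MPI starts one thread per core and handles placement itself:
mdrun -nt 48 -s topol.tpr -deffnm bench

# Option 2: keep MPI, but ask the launcher to pin each rank to a core
# (OpenMPI 1.4.x; older launchers may need -mca mpi_paffinity_alone 1):
mpirun -np 48 --bind-to-core mdrun_mpi -s topol.tpr -deffnm bench

# For a smaller run, restricting it to one NUMA node avoids remote
# memory accesses entirely:
numactl --cpunodebind=0 --membind=0 mdrun -nt 6 -s topol.tpr
```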
>
>
>>
>> Igor
>>
>>
>> ------------------------------------------------------------
>>
>> P E R F O R M A N C E R E S U L T S
>>
>> ------------------------------------------------------------
>> g_tune_pme for Gromacs VERSION 4.5.4
>> Number of nodes : 48
>> The mpirun command is : /home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3/bin/mpirun --hostfile node_loading.txt
>> Passing # of nodes via : -np
>> The mdrun command is : /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3
>> mdrun args benchmarks : -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log
>> Benchmark steps : 1000
>> dlb equilibration steps : 100
>> Repeats for each test : 10
>> Input file : cco_PM_ff03_sorin_scaled_meanpol.tpr
>> Coulomb type : PME
>> Grid spacing x y z : 0.114376 0.116700 0.116215
>> Van der Waals type : Cut-off
>>
>> Will try these real/reciprocal workload settings:
>> No. scaling rcoulomb nkx nky nkz spacing rvdw tpr file
>> 0 -input- 1.200000 72 80 112 0.116700 1.200000 cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr
>>
>> Individual timings for input file 0 (cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr):
>> PME nodes Gcycles ns/day PME/f Remark
>> 24 3185.840 2.734 0.538 OK.
>> 24 7237.416 1.203 1.119 OK.
>> 24 3225.448 2.700 0.546 OK.
>> 24 5844.942 1.489 1.012 OK.
>> 24 4013.986 2.169 0.552 OK.
>> 24 18578.174 0.469 0.842 OK.
>> 24 3234.702 2.692 0.559 OK.
>> 24 25818.267 0.337 0.815 OK.
>> 24 32470.278 0.268 0.479 OK.
>> 24 3234.806 2.692 0.561 OK.
>> 23 15097.577 0.577 0.824 OK.
>> 23 2948.211 2.954 0.705 OK.
>> 23 15640.485 0.557 0.826 OK.
>> 23 66961.240 0.130 3.215 OK.
>> 23 2964.927 2.938 0.698 OK.
>> 23 2965.896 2.937 0.669 OK.
>> 23 11205.121 0.774 0.668 OK.
>> 23 2964.737 2.938 0.672 OK.
>> 23 13384.753 0.649 0.665 OK.
>> 23 3738.425 2.329 0.738 OK.
>> 22 3130.744 2.782 0.682 OK.
>> 22 3981.770 2.187 0.659 OK.
>> 22 6397.259 1.350 0.666 OK.
>> 22 41374.579 0.211 3.509 OK.
>> 22 3193.327 2.728 0.683 OK.
>> 22 21405.007 0.407 0.871 OK.
>> 22 3543.511 2.457 0.686 OK.
>> 22 3539.981 2.460 0.701 OK.
>> 22 30946.123 0.281 1.235 OK.
>> 22 18031.023 0.483 0.729 OK.
>> 21 2978.520 2.924 0.699 OK.
>> 21 4487.921 1.940 0.666 OK.
>> 21 39796.932 0.219 1.085 OK.
>> 21 3027.659 2.877 0.714 OK.
>> 21 58613.050 0.149 1.089 OK.
>> 21 2973.281 2.929 0.698 OK.
>> 21 34991.505 0.249 0.702 OK.
>> 21 4479.034 1.944 0.696 OK.
>> 21 40401.894 0.216 1.310 OK.
>> 21 63325.943 0.138 1.124 OK.
>> 20 17100.304 0.510 0.620 OK.
>> 20 2859.158 3.047 0.832 OK.
>> 20 2660.459 3.274 0.820 OK.
>> 20 2871.060 3.034 0.821 OK.
>> 20 105947.063 0.082 0.728 OK.
>> 20 2851.650 3.055 0.827 OK.
>> 20 2766.737 3.149 0.837 OK.
>> 20 13887.535 0.627 0.813 OK.
>> 20 9450.158 0.919 0.854 OK.
>> 20 2983.460 2.920 0.838 OK.
>> 19 0.000 0.000 - No DD grid found for these settings.
>> 18 62490.241 0.139 1.070 OK.
>> 18 75625.947 0.115 0.512 OK.
>> 18 3584.509 2.430 1.176 OK.
>> 18 4988.745 1.734 1.197 OK.
>> 18 92981.804 0.094 0.529 OK.
>> 18 3070.496 2.837 1.192 OK.
>> 18 3089.339 2.820 1.204 OK.
>> 18 5880.675 1.465 1.170 OK.
>> 18 3094.133 2.816 1.214 OK.
>> 18 3573.552 2.437 1.191 OK.
>> 17 0.000 0.000 - No DD grid found for these settings.
>> 16 3105.597 2.805 0.998 OK.
>> 16 2719.826 3.203 1.045 OK.
>> 16 3124.013 2.788 0.992 OK.
>> 16 2708.751 3.216 1.030 OK.
>> 16 3116.887 2.795 1.023 OK.
>> 16 2695.859 3.232 1.038 OK.
>> 16 2710.272 3.215 1.033 OK.
>> 16 32639.259 0.267 0.514 OK.
>> 16 56748.577 0.153 0.959 OK.
>> 16 32362.192 0.269 1.816 OK.
>> 15 40410.983 0.216 1.241 OK.
>> 15 3727.108 2.337 1.262 OK.
>> 15 3297.944 2.642 1.242 OK.
>> 15 23012.201 0.379 0.994 OK.
>> 15 3328.307 2.618 1.248 OK.
>> 15 56869.719 0.153 0.568 OK.
>> 15 26662.044 0.327 0.854 OK.
>> 15 44026.837 0.198 1.198 OK.
>> 15 3754.812 2.320 1.238 OK.
>> 15 68683.967 0.127 0.844 OK.
>> 14 2934.532 2.969 1.466 OK.
>> 14 2824.434 3.085 1.430 OK.
>> 14 2778.103 3.137 1.391 OK.
>> 14 28435.548 0.306 0.957 OK.
>> 14 2876.113 3.030 1.396 OK.
>> 14 2803.951 3.108 1.438 OK.
>> 14 9538.366 0.913 1.400 OK.
>> 14 2887.242 3.018 1.424 OK.
>> 14 32542.115 0.268 0.529 OK.
>> 14 14256.539 0.609 1.432 OK.
>> 13 5010.011 1.732 1.768 OK.
>> 13 19270.893 0.452 1.481 OK.
>> 13 3451.426 2.525 1.860 OK.
>> 13 28566.186 0.305 0.620 OK.
>> 13 3481.006 2.504 1.833 OK.
>> 13 28457.876 0.306 0.933 OK.
>> 13 3689.128 2.362 1.795 OK.
>> 13 3451.925 2.525 1.831 OK.
>> 13 34918.063 0.249 1.838 OK.
>> 13 3473.566 2.509 1.854 OK.
>> 12 42705.256 0.204 1.039 OK.
>> 12 4934.453 1.763 1.292 OK.
>> 12 16759.163 0.520 1.288 OK.
>> 12 27660.618 0.315 0.855 OK.
>> 12 6293.874 1.380 1.263 OK.
>> 12 40502.818 0.215 1.284 OK.
>> 12 31595.114 0.276 0.615 OK.
>> 12 61936.825 0.140 0.612 OK.
>> 12 3013.850 2.891 1.345 OK.
>> 12 3840.023 2.269 1.310 OK.
>> 0 2628.156 3.317 - OK.
>> 0 2573.649 3.387 - OK.
>> 0 95523.769 0.091 - OK.
>> 0 2594.895 3.360 - OK.
>> 0 2614.131 3.335 - OK.
>> 0 2610.647 3.339 - OK.
>> 0 2560.067 3.405 - OK.
>> 0 2609.485 3.341 - OK.
>> 0 2603.154 3.349 - OK.
>> 0 2583.289 3.375 - OK.
>> -1( 16) 2672.797 3.260 1.002 OK.
>> -1( 16) 57769.149 0.151 1.723 OK.
>> -1( 16) 48598.334 0.179 1.138 OK.
>> -1( 16) 2699.333 3.228 1.040 OK.
>> -1( 16) 54243.321 0.161 1.679 OK.
>> -1( 16) 2719.854 3.203 1.051 OK.
>> -1( 16) 2716.365 3.207 1.051 OK.
>> -1( 16) 24278.608 0.359 0.835 OK.
>> -1( 16) 19357.359 0.449 1.006 OK.
>> -1( 16) 45500.360 0.191 0.795 OK.
>>
>> Tuning took 500.5 minutes.
>>
>> ------------------------------------------------------------
>> Summary of successful runs:
>> Line tpr PME nodes Gcycles Av. Std.dev. ns/day PME/f DD grid
>> 0 0 24 10684.386 10896.612 1.675 0.702 3 4 2
>> 1 0 23 13787.137 19462.982 1.678 0.968 1 5 5
>> 2 0 22 13554.332 13814.153 1.535 1.042 2 13 1
>> 3 0 21 25507.574 24601.033 1.358 0.878 3 3 3
>> 4 0 20 16337.758 31934.533 2.062 0.799 2 2 7
>> 5 0 18 25837.944 36067.176 1.689 1.045 3 2 5
>> 6 0 16 14193.123 19370.807 2.194 1.045 4 4 2
>> 7 0 15 27377.392 24308.700 1.132 1.069 3 11 1
>> 8 0 14 10187.694 11414.829 2.044 1.286 1 2 17
>> 9 0 13 13377.008 12969.168 1.547 1.581 1 5 7
>> 10 0 12 23924.199 20299.796 0.997 1.090 3 4 3
>> 11 0 0 11890.124 29385.874 3.030 - 6 4 2
>> 12 0 -1( 16) 26055.548 23371.735 1.439 1.132 4 4 2
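[Editor's note: the scatter Igor describes can be quantified directly from the ten 24-PME-node repeats in the table above. The numbers below are copied from the posted output; the standard deviation is of the same order as the mean, and the slowest run takes about 10x as long as the fastest.]

```python
# Run-to-run timings (Gcycles) for the ten 24-PME-node repeats
# from the g_tune_pme output above.
import statistics

gcycles = [3185.840, 7237.416, 3225.448, 5844.942, 4013.986,
           18578.174, 3234.702, 25818.267, 32470.278, 3234.806]

mean = statistics.mean(gcycles)       # matches the summary line: 10684.386
stdev = statistics.stdev(gcycles)     # matches the summary line: 10896.612
spread = max(gcycles) / min(gcycles)  # slowest run is ~10x the fastest

print(f"mean = {mean:.3f}  stdev = {stdev:.3f}  max/min = {spread:.1f}")
```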
>> --
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>> Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-developers-request at gromacs.org.