[gmx-developers] Gromacs on 48 core magny-cours AMDs

Alexey Shvetsov alexxy at omrb.pnpi.spb.ru
Thu Sep 1 14:52:42 CEST 2011


Hello!

Well, there are several possible problems:
1. An old kernel that handles large NUMA systems incorrectly
2. Processes not being bound correctly to cores
3. The configuration of gcc and the math libs

What is your MPI version, and which versions of the FFTW and BLAS libs 
do you have, if you use external ones?
Also, please post your CFLAGS.
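
For point 2, one rough way to see whether the mdrun ranks are actually 
pinned is to read the allowed-CPU list of each process from /proc. The 
Python sketch below is only an illustration: the helper names 
(parse_cpu_list, allowed_cpus, is_pinned) are made up here, and it 
assumes a Linux kernel recent enough to expose the Cpus_allowed_list 
field in /proc/<pid>/status (older kernels may only show the hex 
Cpus_allowed mask).

```python
def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(pid):
    """Return the CPUs a process may run on (Linux only, reads /proc).

    Assumes /proc/<pid>/status has a 'Cpus_allowed_list:' line; returns
    an empty set if the field is missing.
    """
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return set()


def is_pinned(cpus, max_cores=1):
    """Treat a process as pinned if it is restricted to at most max_cores CPUs."""
    return 0 < len(cpus) <= max_cores
```

Calling allowed_cpus(pid) for the pid of every mdrun rank should give a 
one-core set per rank if binding works. If every rank reports all 48 
cores, the scheduler is free to migrate them between NUMA nodes.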

Here we get good performance on such nodes running SLES with a 2.6.32 
kernel (with gentoo-prefix on top of it, plus openmpi and the ofed stack)
and on Gentoo (kernel 3.0.4) with many system optimizations made by me 
=)

All results are stable. Gentoo works better here because its kernel 
doesn't have the IRQ bug, plus it has some optimizations.

On Wed, 31 Aug 2011 13:10:52 -0700, Igor Leontyev wrote:
> Hi
> I am benchmarking a 100K atom system (protein ~12K and solvent ~90K
> atoms, 1 fs time step, cutoffs 1.2 nm) on a 48-core 2.1 GHz AMD node.
> Software: Gromacs 4.5.4; compiled by gcc4.4.6; CentOS 5.6 kernel
> 2.6.18-238.19.1.el5. See the results of g_tune_pme below. The
> performance is absolutely unstable, the computation time for
> equivalent runs can differ by orders of magnitude.
>
> The issue seems to be similar to what has been discussed earlier
> http://lists.gromacs.org/pipermail/gmx-users/2010-October/055113.html
> Is there any progress in resolving it?
>
> Igor
>
>
> ------------------------------------------------------------
>
>      P E R F O R M A N C E   R E S U L T S
>
> ------------------------------------------------------------
> g_tune_pme for Gromacs VERSION 4.5.4
> Number of nodes         : 48
> The mpirun command is   : /home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3/bin/mpirun --hostfile node_loading.txt
> Passing # of nodes via  : -np
> The mdrun  command is   : /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3
> mdrun args benchmarks   : -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log
> Benchmark steps         : 1000
> dlb equilibration steps : 100
> Repeats for each test   : 10
> Input file              : cco_PM_ff03_sorin_scaled_meanpol.tpr
>   Coulomb type         : PME
>   Grid spacing x y z   : 0.114376 0.116700 0.116215
>   Van der Waals type   : Cut-off
>
> Will try these real/reciprocal workload settings:
> No.   scaling  rcoulomb  nkx  nky  nkz   spacing      rvdw  tpr file
>   0   -input-  1.200000   72   80  112  0.116700   1.200000  cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr
>
> Individual timings for input file 0
> (cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr):
> PME nodes      Gcycles       ns/day        PME/f    Remark
>  24          3185.840        2.734        0.538    OK.
>  24          7237.416        1.203        1.119    OK.
>  24          3225.448        2.700        0.546    OK.
>  24          5844.942        1.489        1.012    OK.
>  24          4013.986        2.169        0.552    OK.
>  24         18578.174        0.469        0.842    OK.
>  24          3234.702        2.692        0.559    OK.
>  24         25818.267        0.337        0.815    OK.
>  24         32470.278        0.268        0.479    OK.
>  24          3234.806        2.692        0.561    OK.
>  23         15097.577        0.577        0.824    OK.
>  23          2948.211        2.954        0.705    OK.
>  23         15640.485        0.557        0.826    OK.
>  23         66961.240        0.130        3.215    OK.
>  23          2964.927        2.938        0.698    OK.
>  23          2965.896        2.937        0.669    OK.
>  23         11205.121        0.774        0.668    OK.
>  23          2964.737        2.938        0.672    OK.
>  23         13384.753        0.649        0.665    OK.
>  23          3738.425        2.329        0.738    OK.
>  22          3130.744        2.782        0.682    OK.
>  22          3981.770        2.187        0.659    OK.
>  22          6397.259        1.350        0.666    OK.
>  22         41374.579        0.211        3.509    OK.
>  22          3193.327        2.728        0.683    OK.
>  22         21405.007        0.407        0.871    OK.
>  22          3543.511        2.457        0.686    OK.
>  22          3539.981        2.460        0.701    OK.
>  22         30946.123        0.281        1.235    OK.
>  22         18031.023        0.483        0.729    OK.
>  21          2978.520        2.924        0.699    OK.
>  21          4487.921        1.940        0.666    OK.
>  21         39796.932        0.219        1.085    OK.
>  21          3027.659        2.877        0.714    OK.
>  21         58613.050        0.149        1.089    OK.
>  21          2973.281        2.929        0.698    OK.
>  21         34991.505        0.249        0.702    OK.
>  21          4479.034        1.944        0.696    OK.
>  21         40401.894        0.216        1.310    OK.
>  21         63325.943        0.138        1.124    OK.
>  20         17100.304        0.510        0.620    OK.
>  20          2859.158        3.047        0.832    OK.
>  20          2660.459        3.274        0.820    OK.
>  20          2871.060        3.034        0.821    OK.
>  20        105947.063        0.082        0.728    OK.
>  20          2851.650        3.055        0.827    OK.
>  20          2766.737        3.149        0.837    OK.
>  20         13887.535        0.627        0.813    OK.
>  20          9450.158        0.919        0.854    OK.
>  20          2983.460        2.920        0.838    OK.
>  19             0.000        0.000          -      No DD grid found for these settings.
>  18         62490.241        0.139        1.070    OK.
>  18         75625.947        0.115        0.512    OK.
>  18          3584.509        2.430        1.176    OK.
>  18          4988.745        1.734        1.197    OK.
>  18         92981.804        0.094        0.529    OK.
>  18          3070.496        2.837        1.192    OK.
>  18          3089.339        2.820        1.204    OK.
>  18          5880.675        1.465        1.170    OK.
>  18          3094.133        2.816        1.214    OK.
>  18          3573.552        2.437        1.191    OK.
>  17             0.000        0.000          -      No DD grid found for these settings.
>  16          3105.597        2.805        0.998    OK.
>  16          2719.826        3.203        1.045    OK.
>  16          3124.013        2.788        0.992    OK.
>  16          2708.751        3.216        1.030    OK.
>  16          3116.887        2.795        1.023    OK.
>  16          2695.859        3.232        1.038    OK.
>  16          2710.272        3.215        1.033    OK.
>  16         32639.259        0.267        0.514    OK.
>  16         56748.577        0.153        0.959    OK.
>  16         32362.192        0.269        1.816    OK.
>  15         40410.983        0.216        1.241    OK.
>  15          3727.108        2.337        1.262    OK.
>  15          3297.944        2.642        1.242    OK.
>  15         23012.201        0.379        0.994    OK.
>  15          3328.307        2.618        1.248    OK.
>  15         56869.719        0.153        0.568    OK.
>  15         26662.044        0.327        0.854    OK.
>  15         44026.837        0.198        1.198    OK.
>  15          3754.812        2.320        1.238    OK.
>  15         68683.967        0.127        0.844    OK.
>  14          2934.532        2.969        1.466    OK.
>  14          2824.434        3.085        1.430    OK.
>  14          2778.103        3.137        1.391    OK.
>  14         28435.548        0.306        0.957    OK.
>  14          2876.113        3.030        1.396    OK.
>  14          2803.951        3.108        1.438    OK.
>  14          9538.366        0.913        1.400    OK.
>  14          2887.242        3.018        1.424    OK.
>  14         32542.115        0.268        0.529    OK.
>  14         14256.539        0.609        1.432    OK.
>  13          5010.011        1.732        1.768    OK.
>  13         19270.893        0.452        1.481    OK.
>  13          3451.426        2.525        1.860    OK.
>  13         28566.186        0.305        0.620    OK.
>  13          3481.006        2.504        1.833    OK.
>  13         28457.876        0.306        0.933    OK.
>  13          3689.128        2.362        1.795    OK.
>  13          3451.925        2.525        1.831    OK.
>  13         34918.063        0.249        1.838    OK.
>  13          3473.566        2.509        1.854    OK.
>  12         42705.256        0.204        1.039    OK.
>  12          4934.453        1.763        1.292    OK.
>  12         16759.163        0.520        1.288    OK.
>  12         27660.618        0.315        0.855    OK.
>  12          6293.874        1.380        1.263    OK.
>  12         40502.818        0.215        1.284    OK.
>  12         31595.114        0.276        0.615    OK.
>  12         61936.825        0.140        0.612    OK.
>  12          3013.850        2.891        1.345    OK.
>  12          3840.023        2.269        1.310    OK.
>   0          2628.156        3.317          -      OK.
>   0          2573.649        3.387          -      OK.
>   0         95523.769        0.091          -      OK.
>   0          2594.895        3.360          -      OK.
>   0          2614.131        3.335          -      OK.
>   0          2610.647        3.339          -      OK.
>   0          2560.067        3.405          -      OK.
>   0          2609.485        3.341          -      OK.
>   0          2603.154        3.349          -      OK.
>   0          2583.289        3.375          -      OK.
>  -1( 16)     2672.797        3.260        1.002    OK.
>  -1( 16)    57769.149        0.151        1.723    OK.
>  -1( 16)    48598.334        0.179        1.138    OK.
>  -1( 16)     2699.333        3.228        1.040    OK.
>  -1( 16)    54243.321        0.161        1.679    OK.
>  -1( 16)     2719.854        3.203        1.051    OK.
>  -1( 16)     2716.365        3.207        1.051    OK.
>  -1( 16)    24278.608        0.359        0.835    OK.
>  -1( 16)    19357.359        0.449        1.006    OK.
>  -1( 16)    45500.360        0.191        0.795    OK.
>
> Tuning took   500.5 minutes.
>
> ------------------------------------------------------------
> Summary of successful runs:
> Line tpr PME nodes  Gcycles Av.     Std.dev.       ns/day        PME/f    DD grid
>   0   0   24         10684.386    10896.612        1.675        0.702     3  4  2
>   1   0   23         13787.137    19462.982        1.678        0.968     1  5  5
>   2   0   22         13554.332    13814.153        1.535        1.042     2 13  1
>   3   0   21         25507.574    24601.033        1.358        0.878     3  3  3
>   4   0   20         16337.758    31934.533        2.062        0.799     2  2  7
>   5   0   18         25837.944    36067.176        1.689        1.045     3  2  5
>   6   0   16         14193.123    19370.807        2.194        1.045     4  4  2
>   7   0   15         27377.392    24308.700        1.132        1.069     3 11  1
>   8   0   14         10187.694    11414.829        2.044        1.286     1  2 17
>   9   0   13         13377.008    12969.168        1.547        1.581     1  5  7
>  10   0   12         23924.199    20299.796        0.997        1.090     3  4  3
>  11   0    0         11890.124    29385.874        3.030          -       6  4  2
>  12   0   -1( 16)    26055.548    23371.735        1.439        1.132     4  4  2

-- 
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru


