[gmx-developers] Gromacs on 48 core magny-cours AMDs

Carsten Kutzner ckutzne at gwdg.de
Thu Sep 1 10:02:16 CEST 2011


On Sep 1, 2011, at 9:19 AM, Sander Pronk wrote:

> 
> On 31 Aug 2011, at 22:10 , Igor Leontyev wrote:
> 
>> Hi
>> I am benchmarking a 100K atom system (protein ~12K and solvent ~90K atoms, 1 fs time step, cutoffs 1.2 nm) on a 48-core 2.1 GHz AMD node. Software: Gromacs 4.5.4, compiled with gcc 4.4.6; CentOS 5.6, kernel 2.6.18-238.19.1.el5. See the results of g_tune_pme below. The performance is absolutely unstable: the computation time for equivalent runs can differ by orders of magnitude.
>> 
>> The issue seems to be similar to what has been discussed earlier
>> http://lists.gromacs.org/pipermail/gmx-users/2010-October/055113.html
>> Is there any progress in resolving it?
> 
> That's an old kernel. If I remember correctly, that thread discussed issues related to thread&process affinity and NUMA-awareness on older kernels. 
> 
> Perhaps you could try a newer kernel?

Hi,

We are running a slightly older kernel and get good performance on our 48-core Magny-Cours.
Maybe with mpich the processes are not being pinned to the cores correctly.

Could you try the threaded version of mdrun? It gives the best (and most reliable)
performance in our case.
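
To put numbers on the instability in the quoted g_tune_pme table, here is a minimal Python sketch. It is not part of GROMACS or g_tune_pme; the helper name `summarize` and the "slow run" threshold of half the median are assumptions made for this illustration. It uses the ns/day values copied from the repeats with 0 and 16 PME nodes and compares the mean, the median, and the fastest/slowest ratio:

```python
from statistics import mean, median

# ns/day values copied from the quoted g_tune_pme output,
# keyed by the number of PME nodes (only two settings shown).
NS_PER_DAY = {
    0: [3.317, 3.387, 0.091, 3.360, 3.335, 3.339, 3.405, 3.341, 3.349, 3.375],
    16: [2.805, 3.203, 2.788, 3.216, 2.795, 3.232, 3.215, 0.267, 0.153, 0.269],
}


def summarize(runs, slow_fraction=0.5):
    """Summarize repeated benchmark runs (hypothetical helper).

    A run counts as "slow" if it is below slow_fraction * median;
    that threshold is an arbitrary choice for illustration.
    """
    med = median(runs)
    return {
        "mean": mean(runs),
        "median": med,
        "spread": max(runs) / min(runs),
        "slow": sum(1 for r in runs if r < slow_fraction * med),
    }


if __name__ == "__main__":
    for npme, runs in NS_PER_DAY.items():
        s = summarize(runs)
        print(f"PME nodes {npme:2d}: mean {s['mean']:.3f}  median {s['median']:.3f}  "
              f"fastest/slowest {s['spread']:.1f}x  slow runs {s['slow']}/{len(runs)}")
```

The means reproduce the ns/day column of the quoted summary (3.030 and 2.194), while the medians (3.345 and 2.800) show what the node delivers when a run is not hit by the slowdown. For data like this, the median is probably the more useful figure for comparing settings.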

Carsten


> 
> 
>> 
>> Igor
>> 
>> 
>> ------------------------------------------------------------
>> 
>>    P E R F O R M A N C E   R E S U L T S
>> 
>> ------------------------------------------------------------
>> g_tune_pme for Gromacs VERSION 4.5.4
>> Number of nodes         : 48
>> The mpirun command is   : /home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3/bin/mpirun --hostfile node_loading.txt
>> Passing # of nodes via  : -np
>> The mdrun  command is   : /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3
>> mdrun args benchmarks   : -resetstep 100 -o bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g bench.log
>> Benchmark steps         : 1000
>> dlb equilibration steps : 100
>> Repeats for each test   : 10
>> Input file              : cco_PM_ff03_sorin_scaled_meanpol.tpr
>> Coulomb type         : PME
>> Grid spacing x y z   : 0.114376 0.116700 0.116215
>> Van der Waals type   : Cut-off
>> 
>> Will try these real/reciprocal workload settings:
>> No.   scaling  rcoulomb  nkx  nky  nkz   spacing      rvdw  tpr file
>> 0   -input-  1.200000   72   80  112  0.116700   1.200000 cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr
>> 
>> Individual timings for input file 0 (cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr):
>> PME nodes      Gcycles       ns/day        PME/f    Remark
>> 24          3185.840        2.734        0.538    OK.
>> 24          7237.416        1.203        1.119    OK.
>> 24          3225.448        2.700        0.546    OK.
>> 24          5844.942        1.489        1.012    OK.
>> 24          4013.986        2.169        0.552    OK.
>> 24         18578.174        0.469        0.842    OK.
>> 24          3234.702        2.692        0.559    OK.
>> 24         25818.267        0.337        0.815    OK.
>> 24         32470.278        0.268        0.479    OK.
>> 24          3234.806        2.692        0.561    OK.
>> 23         15097.577        0.577        0.824    OK.
>> 23          2948.211        2.954        0.705    OK.
>> 23         15640.485        0.557        0.826    OK.
>> 23         66961.240        0.130        3.215    OK.
>> 23          2964.927        2.938        0.698    OK.
>> 23          2965.896        2.937        0.669    OK.
>> 23         11205.121        0.774        0.668    OK.
>> 23          2964.737        2.938        0.672    OK.
>> 23         13384.753        0.649        0.665    OK.
>> 23          3738.425        2.329        0.738    OK.
>> 22          3130.744        2.782        0.682    OK.
>> 22          3981.770        2.187        0.659    OK.
>> 22          6397.259        1.350        0.666    OK.
>> 22         41374.579        0.211        3.509    OK.
>> 22          3193.327        2.728        0.683    OK.
>> 22         21405.007        0.407        0.871    OK.
>> 22          3543.511        2.457        0.686    OK.
>> 22          3539.981        2.460        0.701    OK.
>> 22         30946.123        0.281        1.235    OK.
>> 22         18031.023        0.483        0.729    OK.
>> 21          2978.520        2.924        0.699    OK.
>> 21          4487.921        1.940        0.666    OK.
>> 21         39796.932        0.219        1.085    OK.
>> 21          3027.659        2.877        0.714    OK.
>> 21         58613.050        0.149        1.089    OK.
>> 21          2973.281        2.929        0.698    OK.
>> 21         34991.505        0.249        0.702    OK.
>> 21          4479.034        1.944        0.696    OK.
>> 21         40401.894        0.216        1.310    OK.
>> 21         63325.943        0.138        1.124    OK.
>> 20         17100.304        0.510        0.620    OK.
>> 20          2859.158        3.047        0.832    OK.
>> 20          2660.459        3.274        0.820    OK.
>> 20          2871.060        3.034        0.821    OK.
>> 20        105947.063        0.082        0.728    OK.
>> 20          2851.650        3.055        0.827    OK.
>> 20          2766.737        3.149        0.837    OK.
>> 20         13887.535        0.627        0.813    OK.
>> 20          9450.158        0.919        0.854    OK.
>> 20          2983.460        2.920        0.838    OK.
>> 19             0.000        0.000          -      No DD grid found for these settings.
>> 18         62490.241        0.139        1.070    OK.
>> 18         75625.947        0.115        0.512    OK.
>> 18          3584.509        2.430        1.176    OK.
>> 18          4988.745        1.734        1.197    OK.
>> 18         92981.804        0.094        0.529    OK.
>> 18          3070.496        2.837        1.192    OK.
>> 18          3089.339        2.820        1.204    OK.
>> 18          5880.675        1.465        1.170    OK.
>> 18          3094.133        2.816        1.214    OK.
>> 18          3573.552        2.437        1.191    OK.
>> 17             0.000        0.000          -      No DD grid found for these settings.
>> 16          3105.597        2.805        0.998    OK.
>> 16          2719.826        3.203        1.045    OK.
>> 16          3124.013        2.788        0.992    OK.
>> 16          2708.751        3.216        1.030    OK.
>> 16          3116.887        2.795        1.023    OK.
>> 16          2695.859        3.232        1.038    OK.
>> 16          2710.272        3.215        1.033    OK.
>> 16         32639.259        0.267        0.514    OK.
>> 16         56748.577        0.153        0.959    OK.
>> 16         32362.192        0.269        1.816    OK.
>> 15         40410.983        0.216        1.241    OK.
>> 15          3727.108        2.337        1.262    OK.
>> 15          3297.944        2.642        1.242    OK.
>> 15         23012.201        0.379        0.994    OK.
>> 15          3328.307        2.618        1.248    OK.
>> 15         56869.719        0.153        0.568    OK.
>> 15         26662.044        0.327        0.854    OK.
>> 15         44026.837        0.198        1.198    OK.
>> 15          3754.812        2.320        1.238    OK.
>> 15         68683.967        0.127        0.844    OK.
>> 14          2934.532        2.969        1.466    OK.
>> 14          2824.434        3.085        1.430    OK.
>> 14          2778.103        3.137        1.391    OK.
>> 14         28435.548        0.306        0.957    OK.
>> 14          2876.113        3.030        1.396    OK.
>> 14          2803.951        3.108        1.438    OK.
>> 14          9538.366        0.913        1.400    OK.
>> 14          2887.242        3.018        1.424    OK.
>> 14         32542.115        0.268        0.529    OK.
>> 14         14256.539        0.609        1.432    OK.
>> 13          5010.011        1.732        1.768    OK.
>> 13         19270.893        0.452        1.481    OK.
>> 13          3451.426        2.525        1.860    OK.
>> 13         28566.186        0.305        0.620    OK.
>> 13          3481.006        2.504        1.833    OK.
>> 13         28457.876        0.306        0.933    OK.
>> 13          3689.128        2.362        1.795    OK.
>> 13          3451.925        2.525        1.831    OK.
>> 13         34918.063        0.249        1.838    OK.
>> 13          3473.566        2.509        1.854    OK.
>> 12         42705.256        0.204        1.039    OK.
>> 12          4934.453        1.763        1.292    OK.
>> 12         16759.163        0.520        1.288    OK.
>> 12         27660.618        0.315        0.855    OK.
>> 12          6293.874        1.380        1.263    OK.
>> 12         40502.818        0.215        1.284    OK.
>> 12         31595.114        0.276        0.615    OK.
>> 12         61936.825        0.140        0.612    OK.
>> 12          3013.850        2.891        1.345    OK.
>> 12          3840.023        2.269        1.310    OK.
>> 0          2628.156        3.317          -      OK.
>> 0          2573.649        3.387          -      OK.
>> 0         95523.769        0.091          -      OK.
>> 0          2594.895        3.360          -      OK.
>> 0          2614.131        3.335          -      OK.
>> 0          2610.647        3.339          -      OK.
>> 0          2560.067        3.405          -      OK.
>> 0          2609.485        3.341          -      OK.
>> 0          2603.154        3.349          -      OK.
>> 0          2583.289        3.375          -      OK.
>> -1( 16)     2672.797        3.260        1.002    OK.
>> -1( 16)    57769.149        0.151        1.723    OK.
>> -1( 16)    48598.334        0.179        1.138    OK.
>> -1( 16)     2699.333        3.228        1.040    OK.
>> -1( 16)    54243.321        0.161        1.679    OK.
>> -1( 16)     2719.854        3.203        1.051    OK.
>> -1( 16)     2716.365        3.207        1.051    OK.
>> -1( 16)    24278.608        0.359        0.835    OK.
>> -1( 16)    19357.359        0.449        1.006    OK.
>> -1( 16)    45500.360        0.191        0.795    OK.
>> 
>> Tuning took   500.5 minutes.
>> 
>> ------------------------------------------------------------
>> Summary of successful runs:
>> Line  tpr  PME nodes  Gcycles Av.   Std.dev.  ns/day  PME/f  DD grid
>>    0    0  24           10684.386  10896.612   1.675  0.702   3   4   2
>>    1    0  23           13787.137  19462.982   1.678  0.968   1   5   5
>>    2    0  22           13554.332  13814.153   1.535  1.042   2  13   1
>>    3    0  21           25507.574  24601.033   1.358  0.878   3   3   3
>>    4    0  20           16337.758  31934.533   2.062  0.799   2   2   7
>>    5    0  18           25837.944  36067.176   1.689  1.045   3   2   5
>>    6    0  16           14193.123  19370.807   2.194  1.045   4   4   2
>>    7    0  15           27377.392  24308.700   1.132  1.069   3  11   1
>>    8    0  14           10187.694  11414.829   2.044  1.286   1   2  17
>>    9    0  13           13377.008  12969.168   1.547  1.581   1   5   7
>>   10    0  12           23924.199  20299.796   0.997  1.090   3   4   3
>>   11    0  0            11890.124  29385.874   3.030    -     6   4   2
>>   12    0  -1( 16)      26055.548  23371.735   1.439  1.132   4   4   2
>> -- 
>> gmx-developers mailing list
>> gmx-developers at gromacs.org
>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>> Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-developers-request at gromacs.org.
