[gmx-developers] RE: Gromacs on 48 core magny-cours AMDs

Igor Leontyev ileontyev at ucdavis.edu
Sat Sep 17 21:04:41 CEST 2011


The problem of unstable Gromacs performance still exists, but there is
some progress:
 1) The MPI version is unstable in runs using all 48 cores per node, but
STABLE when fewer than 48 cores/node are used.
 2) The MPI version runs well even on 180 cores, distributed as 45 per
node across 4 nodes.
 3) The threaded version has no problems in 48-core runs.
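
For anyone reproducing this, here is a minimal sketch of the two launch
modes compared above (the tpr name and the binding flags are my
assumptions; OpenMPI 1.4.x / Gromacs 4.5.x syntax, so check
"mpirun --help" for flag availability):

# MPI run capped at 45 ranks per node (the stable case above);
# --bind-to-core pins each rank to a core if this build supports it.
mpirun -np 180 -npernode 45 --bind-to-core --hostfile node_loading.txt \
    mdrun_mpich1.4.3 -s bench.tpr -deffnm bench

# Thread-MPI mdrun on a single 48-core node; -nt sets the thread count
# in Gromacs 4.5.x binaries built without --enable-mpi.
mdrun -nt 48 -s bench.tpr -deffnm bench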

- the cluster configuration is typical (not NUMA);
- software: Gromacs 4.5.4, compiled with gcc 4.4.6; CentOS 5.6, kernel
2.6.18-238.19.1.el5;
- the compilation used the default math libraries and OpenMPI 1.4.3 with
InfiniBand support.
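
To check that the binary really picks up this OpenMPI (and a shared
FFTW, if one was built), ldd can be used. A sketch, with the install
path taken from the build script quoted below:

# libmpi should resolve into the openmpi-1.4.3 prefix; libfftw3f shows
# up only if FFTW was built as a shared library (it is static by default).
ldd /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3 | egrep 'mpi|fftw'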

Any idea why using all 48 cores/node results in unstable performance?

Igor Leontyev

> Igor wrote:
> The issue might be related to the configuration of our brand-new
> cluster, which I am testing now. On this cluster the unstable behavior
> of Gromacs is also observed on Intel Xeon nodes. For the Gromacs
> installation I repeated all the steps that I had previously done many
> times on an 8-core dual-Xeon workstation without any problems. See the
> compilation script below.
>
> # =====================================================================
> #
> # path where to install
> pth_install=/home/leontyev/programs/bin/gromacs/gromacs-4.5.4
> # program name suffix (note: OpenMPI 1.4.3 is used despite the 'mpich' name)
> suff="_mpich1.4.3"
> # path of FFTW library
> # SINGLE PRECISION
> pth_fft=/home/leontyev/programs/bin/fftw/fftw-3.2.2/single
> # path of the OpenMPI library
> pth_lam=/home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3
> export LD_LIBRARY_PATH="$pth_lam/lib"
>
> PATH="$pth_lam/bin:$PATH"
>
> export CPPFLAGS="-I$pth_fft/include -I$pth_lam/include"
> export LDFLAGS="-L$pth_fft/lib -L$pth_lam/lib"
>
> make distclean
> # SINGLE PRECISION
> ./configure --without-x --prefix=$pth_install --program-suffix=$suff --enable-mpi
>
> make -j 12 mdrun >& install.log
> make install-mdrun  >> install.log
> # =====================================================================
>
> Igor
>
>
>> Alexey Shvetsov wrote:
>>
>> Hello!
>>
>> Well, there may be several problems:
>> 1. An old kernel that works incorrectly with large NUMA systems.
>> 2. No correct process binding to cores.
>> 3. The configuration of gcc/math libraries.
>>
>> What are your MPI version, and the versions of the FFTW and BLAS
>> libraries if you use external ones? Also, please post your CFLAGS.
>>
>> Here we get good performance on such nodes running SLES with a 2.6.32
>> kernel (with Gentoo Prefix on top of it, with OpenMPI and the OFED
>> stack) and with Gentoo (kernel 3.0.4) with many system optimizations
>> made by me =)
>>
>> All results are stable. Gentoo works better here because it doesn't
>> have the IRQ bug in the kernel, plus some optimizations.

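A sketch of how the three points above could be checked on a compute
node (these commands are my suggestion, not what was actually run;
availability depends on the distribution):

# 1. Kernel version -- old 2.6.18 kernels handle large NUMA poorly:
uname -r

# 2. Process binding -- sample which core (PSR column) each mdrun rank
#    sits on; the values should stay constant between samples:
for i in 1 2 3; do ps -eo pid,psr,comm | grep mdrun; sleep 5; done

# 3. NUMA topology as the kernel sees it:
numactl --hardware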

> On Sep 1, 2011, at 9:19 AM, Sander Pronk wrote:
>
>>
>> On 31 Aug 2011, at 22:10 , Igor Leontyev wrote:
>>
>>> Hi
>>> I am benchmarking a 100K-atom system (protein ~12K and solvent ~90K
>>> atoms, 1 fs time step, 1.2 nm cutoffs) on a 48-core 2.1 GHz AMD node.
>>> Software: Gromacs 4.5.4, compiled with gcc 4.4.6; CentOS 5.6, kernel
>>> 2.6.18-238.19.1.el5. See the g_tune_pme results below. The performance
>>> is completely unstable: the computation time for equivalent runs can
>>> differ by orders of magnitude.
>>>
>>> The issue seems to be similar to what has been discussed earlier
>>> http://lists.gromacs.org/pipermail/gmx-users/2010-October/055113.html
>>> Is there any progress in resolving it?
>>
>> That's an old kernel. If I remember correctly, that thread discussed
>> issues related to thread/process affinity and NUMA awareness on older
>> kernels.
>>
>> Perhaps you could try a newer kernel?
>
> Hi,
>
> we are running a slightly older kernel and get nice performance on our
> 48-core Magny-Cours. Maybe with MPI the processes are not being pinned
> to the cores correctly.
>
> Could you try the threaded version of mdrun? This is what gives the
> best (and most reliable) performance in our case.
>
> Carsten
>
>
>>
>>
>>>
>>> Igor
>>>
>>>
>>> ------------------------------------------------------------
>>>
>>>    P E R F O R M A N C E   R E S U L T S
>>>
>>> ------------------------------------------------------------
>>> g_tune_pme for Gromacs VERSION 4.5.4
>>> Number of nodes         : 48
>>> The mpirun command is   : 
>>> /home/leontyev/programs/bin/mpi/openmpi/openmpi-1.4.3/bin/mpirun --hostfile 
>>> node_loading.txt
>>> Passing # of nodes via  : -np
>>> The mdrun  command is   : 
>>> /home/leontyev/programs/bin/gromacs/gromacs-4.5.4/bin/mdrun_mpich1.4.3
>>> mdrun args benchmarks   : -resetstep 100 -o bench.trr -x bench.xtc -cpo 
>>> bench.cpt -c bench.gro -e bench.edr -g bench.log
>>> Benchmark steps         : 1000
>>> dlb equilibration steps : 100
>>> Repeats for each test   : 10
>>> Input file              : cco_PM_ff03_sorin_scaled_meanpol.tpr
>>> Coulomb type         : PME
>>> Grid spacing x y z   : 0.114376 0.116700 0.116215
>>> Van der Waals type   : Cut-off
>>>
>>> Will try these real/reciprocal workload settings:
>>> No.   scaling  rcoulomb  nkx  nky  nkz   spacing      rvdw  tpr file
>>>   0   -input-  1.200000   72   80  112  0.116700   1.200000  cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr
>>>
>>> Individual timings for input file 0 
>>> (cco_PM_ff03_sorin_scaled_meanpol_bench00.tpr):
>>> PME nodes      Gcycles       ns/day        PME/f    Remark
>>> 24          3185.840        2.734        0.538    OK.
>>> 24          7237.416        1.203        1.119    OK.
>>> 24          3225.448        2.700        0.546    OK.
>>> 24          5844.942        1.489        1.012    OK.
>>> 24          4013.986        2.169        0.552    OK.
>>> 24         18578.174        0.469        0.842    OK.
>>> 24          3234.702        2.692        0.559    OK.
>>> 24         25818.267        0.337        0.815    OK.
>>> 24         32470.278        0.268        0.479    OK.
>>> 24          3234.806        2.692        0.561    OK.
>>> 23         15097.577        0.577        0.824    OK.
>>> 23          2948.211        2.954        0.705    OK.
>>> 23         15640.485        0.557        0.826    OK.
>>> 23         66961.240        0.130        3.215    OK.
>>> 23          2964.927        2.938        0.698    OK.
>>> 23          2965.896        2.937        0.669    OK.
>>> 23         11205.121        0.774        0.668    OK.
>>> 23          2964.737        2.938        0.672    OK.
>>> 23         13384.753        0.649        0.665    OK.
>>> 23          3738.425        2.329        0.738    OK.
>>> 22          3130.744        2.782        0.682    OK.
>>> 22          3981.770        2.187        0.659    OK.
>>> 22          6397.259        1.350        0.666    OK.
>>> 22         41374.579        0.211        3.509    OK.
>>> 22          3193.327        2.728        0.683    OK.
>>> 22         21405.007        0.407        0.871    OK.
>>> 22          3543.511        2.457        0.686    OK.
>>> 22          3539.981        2.460        0.701    OK.
>>> 22         30946.123        0.281        1.235    OK.
>>> 22         18031.023        0.483        0.729    OK.
>>> 21          2978.520        2.924        0.699    OK.
>>> 21          4487.921        1.940        0.666    OK.
>>> 21         39796.932        0.219        1.085    OK.
>>> 21          3027.659        2.877        0.714    OK.
>>> 21         58613.050        0.149        1.089    OK.
>>> 21          2973.281        2.929        0.698    OK.
>>> 21         34991.505        0.249        0.702    OK.
>>> 21          4479.034        1.944        0.696    OK.
>>> 21         40401.894        0.216        1.310    OK.
>>> 21         63325.943        0.138        1.124    OK.
>>> 20         17100.304        0.510        0.620    OK.
>>> 20          2859.158        3.047        0.832    OK.
>>> 20          2660.459        3.274        0.820    OK.
>>> 20          2871.060        3.034        0.821    OK.
>>> 20        105947.063        0.082        0.728    OK.
>>> 20          2851.650        3.055        0.827    OK.
>>> 20          2766.737        3.149        0.837    OK.
>>> 20         13887.535        0.627        0.813    OK.
>>> 20          9450.158        0.919        0.854    OK.
>>> 20          2983.460        2.920        0.838    OK.
>>> 19             0.000        0.000          -      No DD grid found for these settings.
>>> 18         62490.241        0.139        1.070    OK.
>>> 18         75625.947        0.115        0.512    OK.
>>> 18          3584.509        2.430        1.176    OK.
>>> 18          4988.745        1.734        1.197    OK.
>>> 18         92981.804        0.094        0.529    OK.
>>> 18          3070.496        2.837        1.192    OK.
>>> 18          3089.339        2.820        1.204    OK.
>>> 18          5880.675        1.465        1.170    OK.
>>> 18          3094.133        2.816        1.214    OK.
>>> 18          3573.552        2.437        1.191    OK.
>>> 17             0.000        0.000          -      No DD grid found for these settings.
>>> 16          3105.597        2.805        0.998    OK.
>>> 16          2719.826        3.203        1.045    OK.
>>> 16          3124.013        2.788        0.992    OK.
>>> 16          2708.751        3.216        1.030    OK.
>>> 16          3116.887        2.795        1.023    OK.
>>> 16          2695.859        3.232        1.038    OK.
>>> 16          2710.272        3.215        1.033    OK.
>>> 16         32639.259        0.267        0.514    OK.
>>> 16         56748.577        0.153        0.959    OK.
>>> 16         32362.192        0.269        1.816    OK.
>>> 15         40410.983        0.216        1.241    OK.
>>> 15          3727.108        2.337        1.262    OK.
>>> 15          3297.944        2.642        1.242    OK.
>>> 15         23012.201        0.379        0.994    OK.
>>> 15          3328.307        2.618        1.248    OK.
>>> 15         56869.719        0.153        0.568    OK.
>>> 15         26662.044        0.327        0.854    OK.
>>> 15         44026.837        0.198        1.198    OK.
>>> 15          3754.812        2.320        1.238    OK.
>>> 15         68683.967        0.127        0.844    OK.
>>> 14          2934.532        2.969        1.466    OK.
>>> 14          2824.434        3.085        1.430    OK.
>>> 14          2778.103        3.137        1.391    OK.
>>> 14         28435.548        0.306        0.957    OK.
>>> 14          2876.113        3.030        1.396    OK.
>>> 14          2803.951        3.108        1.438    OK.
>>> 14          9538.366        0.913        1.400    OK.
>>> 14          2887.242        3.018        1.424    OK.
>>> 14         32542.115        0.268        0.529    OK.
>>> 14         14256.539        0.609        1.432    OK.
>>> 13          5010.011        1.732        1.768    OK.
>>> 13         19270.893        0.452        1.481    OK.
>>> 13          3451.426        2.525        1.860    OK.
>>> 13         28566.186        0.305        0.620    OK.
>>> 13          3481.006        2.504        1.833    OK.
>>> 13         28457.876        0.306        0.933    OK.
>>> 13          3689.128        2.362        1.795    OK.
>>> 13          3451.925        2.525        1.831    OK.
>>> 13         34918.063        0.249        1.838    OK.
>>> 13          3473.566        2.509        1.854    OK.
>>> 12         42705.256        0.204        1.039    OK.
>>> 12          4934.453        1.763        1.292    OK.
>>> 12         16759.163        0.520        1.288    OK.
>>> 12         27660.618        0.315        0.855    OK.
>>> 12          6293.874        1.380        1.263    OK.
>>> 12         40502.818        0.215        1.284    OK.
>>> 12         31595.114        0.276        0.615    OK.
>>> 12         61936.825        0.140        0.612    OK.
>>> 12          3013.850        2.891        1.345    OK.
>>> 12          3840.023        2.269        1.310    OK.
>>> 0          2628.156        3.317          -      OK.
>>> 0          2573.649        3.387          -      OK.
>>> 0         95523.769        0.091          -      OK.
>>> 0          2594.895        3.360          -      OK.
>>> 0          2614.131        3.335          -      OK.
>>> 0          2610.647        3.339          -      OK.
>>> 0          2560.067        3.405          -      OK.
>>> 0          2609.485        3.341          -      OK.
>>> 0          2603.154        3.349          -      OK.
>>> 0          2583.289        3.375          -      OK.
>>> -1( 16)     2672.797        3.260        1.002    OK.
>>> -1( 16)    57769.149        0.151        1.723    OK.
>>> -1( 16)    48598.334        0.179        1.138    OK.
>>> -1( 16)     2699.333        3.228        1.040    OK.
>>> -1( 16)    54243.321        0.161        1.679    OK.
>>> -1( 16)     2719.854        3.203        1.051    OK.
>>> -1( 16)     2716.365        3.207        1.051    OK.
>>> -1( 16)    24278.608        0.359        0.835    OK.
>>> -1( 16)    19357.359        0.449        1.006    OK.
>>> -1( 16)    45500.360        0.191        0.795    OK.
>>>
>>> Tuning took   500.5 minutes.
>>>
>>> ------------------------------------------------------------
>>> Summary of successful runs:
>>> Line tpr PME nodes  Gcycles Av.   Std.dev.   ns/day    PME/f   DD grid
>>>    0   0        24    10684.386  10896.612    1.675    0.702   3  4  2
>>>    1   0        23    13787.137  19462.982    1.678    0.968   1  5  5
>>>    2   0        22    13554.332  13814.153    1.535    1.042   2 13  1
>>>    3   0        21    25507.574  24601.033    1.358    0.878   3  3  3
>>>    4   0        20    16337.758  31934.533    2.062    0.799   2  2  7
>>>    5   0        18    25837.944  36067.176    1.689    1.045   3  2  5
>>>    6   0        16    14193.123  19370.807    2.194    1.045   4  4  2
>>>    7   0        15    27377.392  24308.700    1.132    1.069   3 11  1
>>>    8   0        14    10187.694  11414.829    2.044    1.286   1  2 17
>>>    9   0        13    13377.008  12969.168    1.547    1.581   1  5  7
>>>   10   0        12    23924.199  20299.796    0.997    1.090   3  4  3
>>>   11   0         0    11890.124  29385.874    3.030        -   6  4  2
>>>   12   0   -1( 16)    26055.548  23371.735    1.439    1.132   4  4  2



