[gmx-users] Various questions related to Gromacs performance tuning
Benson Muite
benson_muite at emailplus.org
Sat Mar 28 19:42:38 CET 2020
On Sat, Mar 28, 2020, at 9:32 PM, Kutzner, Carsten wrote:
>
>
> > Am 26.03.2020 um 17:00 schrieb Tobias Klöffel <tobias.kloeffel at fau.de>:
> >
> > Hi Carsten,
> >
> >
> > On 3/24/20 9:02 PM, Kutzner, Carsten wrote:
> >> Hi,
> >>
> >>> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel <tobias.kloeffel at fau.de>:
> >>>
> >>> Dear all,
> >>> I am very new to Gromacs so maybe some of my problems are very easy to fix:)
> >>> Currently I am trying to compile and benchmark GROMACS on AMD Rome CPUs; the benchmarks are taken from:
> >>> https://www.mpibpc.mpg.de/grubmueller/bench
> >>>
> >>> 1) OpenMP parallelization: Is it done via OpenMP tasks?
> >> Yes, loops throughout the code are parallelized with OpenMP via #pragma omp parallel for
> >> and similar directives.
> > Ok but that's not OpenMP tasking:)
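For illustration, here is a minimal C sketch of such loop-level (worksharing) parallelization, as opposed to OpenMP tasking. This is not GROMACS source code; the function name and data layout are made up (compile with e.g. cc -fopenmp):

void scale_forces(int n, float f[][3], float factor)
{
    /* Worksharing: the loop iterations are split across the existing
     * thread team; no OpenMP tasks are created. */
#pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        f[i][0] *= factor;
        f[i][1] *= factor;
        f[i][2] *= factor;
    }
}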
> >>
> >>> If the Intel toolchain is detected and -DGMX_FFT_LIBRARY=mkl is set, -mkl=serial is used, even though -DGMX_OPENMP=on is set.
> >> GROMACS uses only the serial transforms - allowing MKL to open up its own OpenMP threads
> >> would lead to oversubscription of cores and performance degradation.
> > Ah I see. But then it should be noted somewhere in the docs that all FFTW/MKL calls are made inside a parallel region. Is there a specific reason for this? Normally you can achieve much better performance by calling a threaded library outside of a parallel region and letting the library use its own threads.
Creating and destroying threads can be slow, and that is what a threaded library typically does on entry and exit. So if a program is already running its own threads, it can be faster to have those threads call the thread-safe serial routines of the library instead - likely the case for FFTW.
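As a minimal sketch of that pattern (assuming FFTW3; this is not GROMACS code, and the transform count and length are made up): planning is done serially, since only fftw_execute() is documented as thread-safe, and the program's already-running OpenMP threads then execute the serial transforms on their own data, so the library spawns no threads of its own. Compile with e.g. cc -fopenmp example.c -lfftw3.

#include <fftw3.h>

enum { NFFT = 64, N = 128 };           /* hypothetical sizes */

int main(void)
{
    fftw_complex *data[NFFT];
    fftw_plan     plan[NFFT];

    /* FFTW planning is not thread-safe, so create all plans serially. */
    for (int i = 0; i < NFFT; i++)
    {
        data[i] = fftw_alloc_complex(N);
        for (int j = 0; j < N; j++)
        {
            data[i][j][0] = 1.0;       /* arbitrary input */
            data[i][j][1] = 0.0;
        }
        plan[i] = fftw_plan_dft_1d(N, data[i], data[i],
                                   FFTW_FORWARD, FFTW_ESTIMATE);
    }

    /* The program's own OpenMP threads call the thread-safe serial
     * execute function; the FFT library creates no threads here. */
#pragma omp parallel for
    for (int i = 0; i < NFFT; i++)
    {
        fftw_execute(plan[i]);
    }

    for (int i = 0; i < NFFT; i++)
    {
        fftw_destroy_plan(plan[i]);
        fftw_free(data[i]);
    }
    return 0;
}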
> >>> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do not really understand what I have to specify for -mdrun. I
> >> Normally you need a serial (read: non-MPI-enabled) 'gmx' so that you can call
> >> gmx tune_pme. Most queueing systems don't like it if one parallel program calls
> >> another parallel program.
> >>
> >>> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings", but it just complains that mdrun is not working.
> >> There should be an output somewhere with the exact command line that
> >> tune_pme invoked to test whether mdrun works. That should shed some light
> >> on the issue.
> >>
> >> Side note: Tuning is normally only useful on CPU nodes. If your nodes also
> >> have GPUs, you will probably not want to do this kind of PME tuning.
> > Yes, it's CPU only... I will tune PP:PME procs manually. However,
> > most of the time it fails with 'too large prime number'. What is
> > considered to be 'too large'?
> I think 2, 3, 5, 7, 11, and 13, and numbers whose prime factors are
> among these, are OK, but not larger prime factors.
> So for a fixed number of procs only some of the PP:PME combinations
> will actually work.
> The ones that don't work would not be wise choices from a performance
> point of view anyway.
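For illustration, here is a tiny C helper in the spirit of that rule (the exact limit of 13 is my assumption, and I am not sure whether the check applies to the PME rank count or to the resulting FFT grid dimensions; the idea is the same): a count passes if it has no prime factor larger than 13.

#include <stdio.h>

/* Returns 1 if n has no prime factor larger than max_prime. */
static int only_small_prime_factors(int n, int max_prime)
{
    for (int p = 2; p <= max_prime; p++)
    {
        while (n % p == 0)
        {
            n /= p;
        }
    }
    return n == 1;
}

int main(void)
{
    /* Hypothetical example: list PME rank counts below 128 that would fail. */
    for (int npme = 1; npme < 128; npme++)
    {
        if (!only_small_prime_factors(npme, 13))
        {
            printf("%3d: has a prime factor larger than 13\n", npme);
        }
    }
    return 0;
}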
>
> Best,
> Carsten
>