[gmx-users] Various questions related to Gromacs performance tuning

Kutzner, Carsten ckutzne at gwdg.de
Tue Mar 24 21:03:35 CET 2020


Hi,

> On 24.03.2020 at 16:28, Tobias Klöffel <tobias.kloeffel at fau.de> wrote:
> 
> Dear all,
> I am very new to GROMACS, so maybe some of my problems are very easy to fix :)
> Currently I am trying to compile and benchmark GROMACS on AMD Rome CPUs; the benchmarks are taken from:
> https://www.mpibpc.mpg.de/grubmueller/bench
> 
> 1) OpenMP parallelization: Is it done via OpenMP tasks?
Yes - throughout the code, loops are parallelized with OpenMP via
#pragma omp parallel for and similar directives.

> If the Intel toolchain is detected and -DGMX_FFT_LIBRARY=mkl is set, -mkl=serial is used, even though -DGMX_OPENMP=on is set.
GROMACS only uses the serial (single-threaded) MKL routines - allowing MKL to
open up its own OpenMP threads would lead to oversubscription of the cores and
performance degradation.
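
For reference, a configure line in that spirit could look roughly like this
(just a sketch - the compiler names and the GMX_MPI setting are my
assumptions, adjust them to your toolchain):

  cmake .. -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc \
           -DGMX_MPI=on -DGMX_OPENMP=on -DGMX_FFT_LIBRARY=mkl

GROMACS then links the single-threaded MKL interface and does all the
threading itself through its own OpenMP loops.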

> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do not really understand what I have to specify for -mdrun. I 
Normally you need a serial (read: non-MPI-enabled) 'gmx' so that you can call
gmx tune_pme. Most queueing systems don't like it if one parallel program calls
another parallel program.
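
With a serial gmx driving it, the call could look roughly like this (a sketch
only - the rank count, the .tpr name, and the plain mpirun setting are
placeholders, and option details may differ between GROMACS versions):

  export MPIRUN="mpirun"
  gmx tune_pme -np 128 -s topol.tpr -mdrun 'gmx_mpi mdrun'

The point is that the outer gmx is the non-MPI binary, while the mdrun it
launches via $MPIRUN is the MPI-enabled one.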

> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings" But it just complains that mdrun is not working.
There should be output somewhere containing the exact command line that
tune_pme invoked to test whether mdrun works. That should shed some light
on the issue.

Side note: Tuning is normally only useful on CPU-only nodes. If your nodes also
have GPUs, you will probably not want to do this kind of PME tuning.

> Normal execution via $MPIRUN gmx_mpi mdrun -s ... works
> 
> 
> 3) As far as I understand, most of the PME time is spent in a 3D FFT, and hence probably most of it is spent in an MPI all-to-all communication.
Yes, but that also depends a lot on the number of nodes you are running on.
Check for yourself: do a 'normal' mdrun (without tune_pme) on the number of
nodes that you are interested in and check the detailed timings at the end of
the log file. There you will find how much time is spent in the various PME
routines.
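
A quick way to pull those numbers out, assuming your log file is named md.log
(adjust the name to whatever -g or -deffnm gave you):

  grep "PME" md.log

In the cycle accounting table near the end of the log you should see rows
such as "PME spread", "PME 3D-FFT" and "PME 3D-FFT Comm." - the last one is
the all-to-all communication you are asking about.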

Best,
  Carsten

> For that reason I would like to place all PME tasks on a separate node via -ddorder pp_pme. If I do so, the calculation just hangs. Specifying -ddorder interleave or cartesian works without problems. Is this a known issue?
> 
> Kind regards,
> Tobias Klöffel
> 
> -- 
> M.Sc. Tobias Klöffel
> =======================================================
> HPC (High Performance Computing) group
> Erlangen Regional Computing Center (RRZE)
> Friedrich-Alexander-Universität Erlangen-Nürnberg
> Martensstr. 1
> 91058 Erlangen
> 
> Room: 1.133
> Phone: +49 (0) 9131 / 85 - 20101
> 
> =======================================================
> 
> E-mail: tobias.kloeffel at fau.de
> 

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa


