[gmx-users] Intel vs gcc compilers

Wed Jun 26 09:30:40 CEST 2013

>You're using a real-MPI process per core, and you have six cores per

I used the current setup, which is indeed not fully optimized, only to see how large the speed-up is between the Intel-compiled and gcc-compiled builds.

>processor. The recommended procedure is to map cores to OpenMP
>threads, and choose the number of MPI processes per processor (and
>thus the number of OpenMP threads per MPI process) to maximize
>performance. See
>http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Multi-level_parallelization.3a_MPI.2fthread-MPI_.2b_OpenMP

I have optimized this before. In my experience one only gets a speedup from using OpenMP at high parallelization (roughly 200 particles per PP core), and only if I set the number of MPI processes equal to the total number of cores AND use 2 OpenMP threads per MPI process. The total number of threads is then double the number of cores, so you are effectively overloading/hyperthreading the cores (and thus the number of particles per PP thread is roughly 100). I have had a similar experience on a newer, Intel-based system, although there the advantage already starts at lower parallelization. I was wondering whether OpenMP is always used in combination with hyperthreading?
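To make the arithmetic above concrete, here is a minimal sketch in Python. The function name and the core count of 72 are only illustrative assumptions (72 is borrowed from the run mentioned further down), and it assumes every core is a PP core, with no separate PME processes.

```python
def oversubscription(cores, mpi_ranks, omp_threads_per_rank, particles_per_pp_core):
    """Return (threads per core, particles per thread) for a given layout.

    Hypothetical helper for illustration; it is not part of GROMACS.
    """
    total_threads = mpi_ranks * omp_threads_per_rank
    threads_per_core = total_threads / cores
    # Assumption: all cores do PP work, so the system size is simply
    # particles_per_pp_core multiplied by the number of cores.
    total_particles = particles_per_pp_core * cores
    particles_per_thread = total_particles / total_threads
    return threads_per_core, particles_per_thread


# One MPI process per core, 2 OpenMP threads each, about 200 particles per PP core
threads_per_core, particles_per_thread = oversubscription(72, 72, 2, 200)
print(threads_per_core, particles_per_thread)
```

A threads-per-core value above 1 means the cores are oversubscribed, which is the situation described above.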

On the machine from my previous email, using openMP gives the warning:

"Can not set thread affinities on the current platform. On NUMA systems this
can cause performance degradation. If you think your platform should support
setting affinities, contact the GROMACS developers."

With the gcc-compiled version, using 72 cores (about 700 particles per PP core), this indeed leads to slightly lower performance. With the Intel-compiled version, however, the simulations become orders of magnitude slower.

Regards,
Djurre