[gmx-users] 3 GPUs much faster than 2 GPUs with GROMACS-4.6.2 ???

yunshi11 . yunshi09 at gmail.com
Mon Dec 9 18:28:34 CET 2013

Hi all,

I have a physical compute node with two 6-core Intel E5649 processors and
three NVIDIA Tesla M2070 GPUs.

First I tried using all 12 CPU cores + 3 GPUs for an equilibration run (of
protein in TIP3 waters), which gave me 8.964 ns/day performance.

But I noticed that the PME mesh calculation, which I assume is done on the
CPU cores/OpenMP threads, took up 62% of the total wall time/G-cycles. It
seems that the CPU cores/OpenMP threads have too much work to do and the
GPUs have to

PME mesh               3    4      50001     597.925    18177.760    62.0

Thus, I tried running with all 12 CPU cores + 2 GPUs, which seems more
natural to me since the 6 cores of each Intel E5649 processor are tied to
1 GPU, giving 6 CPU cores/OpenMP threads per MPI process. However, this
resulted in a performance of only 5.694 ns/day, less than 2/3 of the
previous run. Yet the PME mesh calculation took 55.6% of the total wall
time/G-cycles, NOT very different from the previous run.

 PME mesh               2    6      50001     843.186    25633.615    55.6
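
To make the comparison between the two log rows concrete, here is a rough
Python sketch of the arithmetic. It assumes the columns follow the usual
GROMACS 4.6 md.log layout (Nodes, Th., Count, Wall t (s), G-Cycles, %) and
that the % column is the share of the total wall time; both are assumptions
about the log, not something shown in the rows themselves. The dictionary
keys and the summarize() helper are made-up names for illustration.

```python
# Numbers copied from the two "PME mesh" rows and the ns/day figures above.
# Assumed column order: Nodes, Th., Count, Wall t (s), G-Cycles, %.
runs = {
    "3 GPUs": {"ns_per_day": 8.964, "ranks": 3, "threads": 4,
               "pme_wall_s": 597.925, "pme_gcycles": 18177.760, "pme_pct": 62.0},
    "2 GPUs": {"ns_per_day": 5.694, "ranks": 2, "threads": 6,
               "pme_wall_s": 843.186, "pme_gcycles": 25633.615, "pme_pct": 55.6},
}


def summarize(run):
    """Derive the total wall time and the implied per-core clock for one run."""
    # If PME mesh is pme_pct percent of the run, the whole run took this long.
    total_wall_s = run["pme_wall_s"] / (run["pme_pct"] / 100.0)
    # G-cycles per second of wall time, summed over all cores (assumption).
    gcycles_per_s = run["pme_gcycles"] / run["pme_wall_s"]
    cores = run["ranks"] * run["threads"]
    return {"total_wall_s": total_wall_s,
            "cores": cores,
            "ghz_per_core": gcycles_per_s / cores}


summary = {name: summarize(run) for name, run in runs.items()}

# Ratio of the reported performances (2 GPUs relative to 3 GPUs).
speed_ratio = runs["2 GPUs"]["ns_per_day"] / runs["3 GPUs"]["ns_per_day"]
# Ratio of the estimated total wall times (3 GPUs relative to 2 GPUs).
wall_ratio = summary["3 GPUs"]["total_wall_s"] / summary["2 GPUs"]["total_wall_s"]
# How much longer the PME mesh part itself took with 2 GPUs.
pme_ratio = runs["2 GPUs"]["pme_wall_s"] / runs["3 GPUs"]["pme_wall_s"]

for name, s in summary.items():
    print(f"{name}: total wall ~{s['total_wall_s']:.0f} s, "
          f"{s['cores']} cores, ~{s['ghz_per_core']:.2f} GHz per core")
print(f"performance ratio (2 GPUs / 3 GPUs): {speed_ratio:.3f}")
print(f"total wall-time ratio (3 GPUs / 2 GPUs): {wall_ratio:.3f}")
print(f"PME mesh wall-time ratio (2 GPUs / 3 GPUs): {pme_ratio:.2f}")
```

Under these assumptions the performance ratio comes out at about 0.635
(below 2/3, as stated), the estimated total wall times are about 964 s and
1517 s, and the PME mesh part alone takes about 1.41 times longer with 2
GPUs even though the same 12 cores are in use in both runs. The G-Cycles
column divided by wall time and core count gives roughly 2.53 GHz in both
cases, which suggests the G-cycles are summed over all 12 threads.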

Does anyone know why this is the case? Why would the number of GPUs affect
the PME mesh calculation?


