[gmx-users] 3 GPUs much faster than 2 GPUs with GROMACS-4.6.2 ???

yunshi11 . yunshi09 at gmail.com
Mon Dec 9 18:28:34 CET 2013


Hi all,

I have a physical compute node with two 6-core Intel E5649 processors and
three NVIDIA Tesla M2070 GPUs.

First I tried using all 12 CPU cores + 3 GPUs for an equilibration run (a
protein in TIP3 water), which gave a performance of 8.964 ns/day.

But I noticed that the PME mesh calculation, which I assume runs on the CPU
cores/OpenMP threads, accounted for 62% of the wall time (the Wall t/G-Cycles
accounting below). Does that mean the CPU cores/OpenMP threads have too much
work to do, so the GPUs have to wait?

 Computing:        Nodes   Th.     Count  Wall t (s)    G-Cycles       %
 PME mesh              3    4      50001     597.925   18177.760    62.0
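For reference, this run was launched with something like the following
(thread-MPI build of mdrun 4.6.2; "equil" is just a placeholder for my
-deffnm/.tpr name), i.e. 3 thread-MPI ranks with 4 OpenMP threads each and
one GPU per rank:

  mdrun -deffnm equil -ntmpi 3 -ntomp 4 -gpu_id 012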




Thus, I tried running with all 12 CPU cores + 2 GPUs, which seemed more
natural to me since the 6 cores of each Intel E5649 processor are then tied
to one GPU, giving 6 CPU cores/OpenMP threads per MPI rank. However, this
gave a performance of only 5.694 ns/day, less than 2/3 of the previous run.
Yet the PME mesh calculation took 55.6% of the wall time, not very different
from the previous run in relative terms (although in absolute terms its wall
time grew from about 598 s to 843 s):

 PME mesh               2    6      50001     843.186    25633.615    55.6
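(That run was launched the same way, just with something like

  mdrun -deffnm equil -ntmpi 2 -ntomp 6 -gpu_id 01

i.e. 2 thread-MPI ranks with 6 OpenMP threads each, mapped to GPUs 0 and 1;
again, "equil" is just a placeholder for my -deffnm name.)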



Does anyone know why this is the case? Why would the number of GPUs affect
the PME mesh calculation?

Regards,
Yun

