[gmx-users] 3 GPUs much faster than 2 GPUs with GROMACS-4.6.2 ???
yunshi09 at gmail.com
Mon Dec 9 18:28:34 CET 2013
I have a physical compute node with 2x 6-core Intel E5649 processors +
three NVIDIA Tesla M2070s GPUs.
First I tried using all 12 CPU cores + 3 GPUs for an equilibration run (of
a protein in TIP3P water), which gave me 8.964 ns/day.
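For context, the two setups correspond roughly to launch lines like the following (a sketch of the thread-MPI mapping, not the exact commands from my job script; the -deffnm name is a placeholder):

```shell
# 3 GPUs: 3 thread-MPI ranks x 4 OpenMP threads, one rank per GPU
mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -deffnm equil

# 2 GPUs: 2 thread-MPI ranks x 6 OpenMP threads, matching one
# 6-core E5649 socket per GPU
mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -deffnm equil
```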
But I noticed that the PME mesh calculation, which I assume is done on the
CPU cores/OpenMP threads, took up 62% of the total wall time. It seems that
the CPU cores/OpenMP threads have too much work to do and the GPUs have to
wait for them:
PME mesh 3 4 50001 597.925 18177.760 62.0
Thus, I tried running with all 12 CPU cores + 2 GPUs, which seemed more natural
to me since the 6 cores of each Intel E5649 processor are tied to one GPU,
giving 6 CPU cores/OpenMP threads per MPI process. However, this resulted
in a performance of only 5.694 ns/day, less than 2/3 of the previous run.
Yet the PME mesh calculation took 55.6% of the total wall time, not very
different from the previous run:
PME mesh 2 6 50001 843.186 25633.615 55.6
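A quick check on the numbers quoted above (values copied straight from the two runs) shows that, although PME mesh is a similar *fraction* of the total in both runs, its absolute wall time grew substantially with 2 GPUs:

```python
# Values taken from the two runs quoted above.
perf_3gpu = 8.964        # ns/day, 12 cores + 3 GPUs
perf_2gpu = 5.694        # ns/day, 12 cores + 2 GPUs
pme_wall_3gpu = 597.925  # s, PME mesh wall time (3-GPU run)
pme_wall_2gpu = 843.186  # s, PME mesh wall time (2-GPU run)

ratio = perf_2gpu / perf_3gpu
print(f"2-GPU performance is {ratio:.1%} of the 3-GPU run")  # 63.5%, below 2/3

# Similar percentage of total, but ~41% more absolute PME mesh time:
growth = pme_wall_2gpu / pme_wall_3gpu - 1
print(f"PME mesh wall time grew by {growth:.1%}")  # 41.0%
```

In other words, the whole run slowed down roughly in proportion, including the CPU-side PME mesh part, which is why its percentage barely moved.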
Does anyone know why this is the case? Why would a different number of GPUs
affect the PME mesh calculation?