[gmx-users] 3 GPUs much faster than 2 GPUs with GROMACS-4.6.2 ???

David van der Spoel spoel at xray.bmc.uu.se
Mon Dec 9 18:42:03 CET 2013


On 2013-12-09 18:27, yunshi11 . wrote:
> Hi all,
>
> I have a physical compute node with 2x 6-core Intel E5649 processors +
> three NVIDIA Tesla M2070s GPUs.
>
> First I tried using all 12 CPU cores + 3 GPUs for an equilibration run (of
> protein in TIP3 waters), which gave me 8.964 ns/day performance.
>
> But I noticed the PME mesh calculation, which I assume is done on CPU
> cores/OpenMP threads, has taken up 62% of the Wall t/G cycle. It seems that
> the CPU cores/OpenMP threads have too much work to do and the GPUs have to
> wait?
>
> PME mesh               3    4      50001     597.925    18177.760    62.0
>
> Thus, I tried running with all 12 CPU cores + 2 GPUs, which is more natural
> to me since the 6 cores of each Intel E5649 processor are tied to 1 GPU,
> making 6 CPU cores/OpenMP threads per MPI process. However, this resulted
> in a performance of only 5.694 ns/day, less than 2/3 of the previous run.
> Yet, the PME mesh calculation took 55.6% of the Wall t/G cycle, NOT very
> different from the previous run.
>
> PME mesh               2    6      50001     843.186    25633.615    55.6
>
> Does anyone know why this is the case? Why would different numbers of GPUs
> affect the calculation of the PME mesh?
>
And did you try 6 cores + 1 GPU as well?
And 12 cores without GPU?
Your system may be too small to scale to so much processing power.
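
As a rough sanity check on the numbers quoted above, here is a minimal
Python sketch. It assumes that the last column of each "PME mesh" row is the
percentage of total wall time and that the fourth number is the wall time in
seconds; the function and variable names are made up for illustration.

```python
# Figures copied from the two runs quoted in this thread.
runs = {
    "3 GPUs": {"ns_per_day": 8.964, "pme_wall_s": 597.925, "pme_pct": 62.0},
    "2 GPUs": {"ns_per_day": 5.694, "pme_wall_s": 843.186, "pme_pct": 55.6},
}


def total_wall_s(pme_wall_s, pme_pct):
    """Estimate total wall time from one PME mesh row.

    Assumption: pme_pct is the PME share of total wall time, in percent.
    """
    return pme_wall_s / (pme_pct / 100.0)


# Estimated total wall time per run, in seconds.
totals = {name: total_wall_s(r["pme_wall_s"], r["pme_pct"]) for name, r in runs.items()}

# Ratio of throughput (2 GPUs relative to 3 GPUs).
speed_ratio = runs["2 GPUs"]["ns_per_day"] / runs["3 GPUs"]["ns_per_day"]

# Ratio of estimated total wall times (3 GPUs relative to 2 GPUs).
# For the same number of steps this should roughly match speed_ratio.
wall_ratio = totals["3 GPUs"] / totals["2 GPUs"]

# How much longer the PME mesh part took in absolute terms with 2 GPUs.
pme_slowdown = runs["2 GPUs"]["pme_wall_s"] / runs["3 GPUs"]["pme_wall_s"]

for name, r in runs.items():
    print(f"{name}: total ~{totals[name]:.0f} s, PME mesh {r['pme_wall_s']:.0f} s")
print(f"ns/day ratio (2 vs 3 GPUs):    {speed_ratio:.3f}")
print(f"wall time ratio (3 vs 2 GPUs): {wall_ratio:.3f}")
print(f"PME mesh slowdown with 2 GPUs: {pme_slowdown:.2f}x")
```

If those assumptions hold, the absolute PME mesh wall time rises from about
598 s to about 843 s when going from 3 to 2 GPUs, even though its share of
the run changes little, and the estimated total wall times are in roughly
the same ratio as the ns/day figures.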

> Regards,
> Yun
>


-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
