[gmx-users] Minimal PCI Bandwidth for Gromacs and Infiniband?

Szilárd Páll pall.szilard at gmail.com
Mon Mar 12 16:33:08 CET 2018


Two things I forgot:

- Ryzen chipsets are limited in the number of PCIe lanes, so if you
plug in a second card (e.g. an IB adapter), both slots will run at x8,
which means GPU transfers will be slower too. This may not be a big
issue if you run multiple ranks per GPU, which gives you some
transfer/kernel overlap, but sooner or later performance will suffer
(see the host-to-device copy sketch at the end of this message).

- Whether you need IB, and which generation is worth it versus GbE +
RoCE, is made even trickier by the fact that GROMACS runs can fall
into either the bandwidth-bound or the latency-bound regime. As Mark
said, for GROMACS performance it is _mostly_ latency and injection
rate that matter, but that is only true at high parallelization, where
there is little data per rank and hence small messages per rank. Also,
newer IB adapters get near peak bandwidth already at around 4 KB
message size, so if you don't have too many nodes/MPI ranks, most of
your point-to-point traffic can still benefit from the higher
bandwidth of a faster network -- provided that bandwidth can actually
be reached at the relevant message sizes.
As an example, on slide 15 of the previously mentioned HPC-AC
presentation [1] you can see that running the ADH benchmark (134k
atoms) across 16 nodes gives an MPI_Sendrecv message-size distribution
of roughly 20% below 1 KB, 15% at 1-4 KB, 5% at 4-16 KB, and 60% above
16 KB. If you then look at the MVAPICH2 IB point-to-point benchmarks
[2], you can see that e.g. ConnectX-3 gets close to peak bandwidth for
messages above 4-8 KB, while Connect-IB does so only above 32-64 KB.
(A minimal sweep you can run on your own nodes is sketched right below
the links.)

[1] http://www.hpcadvisorycouncil.com/pdf/GROMACS_Analysis_Intel_E5_2697v3_K40_K80_GPUs.pdf
[2] http://mvapich.cse.ohio-state.edu/performance/pt_to_pt
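
If you want to see what your own fabric delivers at exactly these
message sizes, a sweep along the lines of the sketch below is enough.
This is only a minimal sketch (the file name, repetition count and the
Open MPI-style mapping flag in the comment are just examples); the OSU
micro-benchmarks give you the same numbers more carefully.

/*
 * sendrecv_bw.c -- minimal point-to-point bandwidth sweep over message
 * sizes, loosely mimicking the MPI_Sendrecv pattern of a halo exchange.
 * Build and run with exactly two ranks, one per node, e.g.:
 *   mpicc -O2 sendrecv_bw.c -o sendrecv_bw
 *   mpirun -np 2 --map-by node ./sendrecv_bw
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    if (nranks != 2)
    {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int    peer    = 1 - rank;
    const int    reps    = 1000;
    const size_t maxsize = 1 << 20;          /* sweep up to 1 MB */
    char *sendbuf = malloc(maxsize);
    char *recvbuf = malloc(maxsize);
    memset(sendbuf, 0, maxsize);

    if (rank == 0) printf("#  bytes    MB/s per direction\n");

    for (size_t bytes = 1024; bytes <= maxsize; bytes *= 2)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++)
        {
            MPI_Sendrecv(sendbuf, (int)bytes, MPI_CHAR, peer, 0,
                         recvbuf, (int)bytes, MPI_CHAR, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
        {
            /* each iteration moves "bytes" out and "bytes" in;
               report the per-direction rate */
            printf("%8zu  %10.1f\n", bytes, (double)bytes * reps / dt / 1e6);
        }
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Comparing what you get at 1-16 KB against the large-message plateau
tells you how much of the ADH-like traffic mix above would actually
see the adapter's peak bandwidth.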
--
Szilárd
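
PS: regarding the first point (x8 vs x16 for the GPU), the simplest
sanity check is to time pinned host-to-device copies with the CUDA
runtime API, along the lines of the sketch below (file name and build
line are just examples, not a definitive tool). On a Gen3 x16 slot
pinned copies typically land somewhere around 12 GB/s, on x8 roughly
half of that; your mileage may vary.

/*
 * pcie_bw.c -- rough pinned host-to-device copy bandwidth check
 * using the CUDA runtime API. Build e.g. with:
 *   nvcc -O2 pcie_bw.c -o pcie_bw
 */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = (size_t)64 << 20;   /* 64 MiB per copy */
    const int    reps  = 50;

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);           /* pinned host memory */
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* one warm-up copy so lazy initialization is not timed */
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; i++)
    {
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host->device: %.1f GB/s\n",
           (double)bytes * reps / (ms * 1e-3) / 1e9);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}

Together with the Sendrecv sweep above, that gives you both halves of
the picture needed to judge whether dual x8 is actually the limiter.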


On Mon, Mar 12, 2018 at 4:06 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> Hi,
>
> Note that it matters a lot how far you want to parallelize and what
> kind of runs you would do. 10 GbE with RoCE may well be enough to
> scale across a couple of such nodes, especially if you can squeeze PME
> onto a single node and avoid the MPI collectives across the network.
> You may not even see much difference between, say, 10 GbE + RoCE and
> some older IB like CX-3 FDR. However, if you want to strong-scale
> further, with a short time per step, even the point-to-point
> communication of the halo exchange will become a bottleneck on slower
> networks as the communication gets purely latency-bound.
>
> I have no data myself nor first-hand experience, but there are some
> results out there, e.g.
> http://www.hpcadvisorycouncil.com/pdf/GROMACS_Analysis_Intel_E5_2697v3_K40_K80_GPUs.pdf
> http://www.hpcadvisorycouncil.com/pdf/GROMACS_Analysis_Intel_E5_2697v3.pdf
>
> Take these with a grain of salt, however, as they seem to show
> different data in places: e.g. slide 10 of the former suggests that
> EDR IB is >3x faster already from 2 nodes with RF, while the latter
> suggests that on 2-4 nodes 10 GbE / 40 GbE is not too awful (though
> still slower compared to EDR IB).
>
> Cheers,
> --
> Szilárd
>
>
> On Mon, Mar 12, 2018 at 9:38 AM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>> Hi,
>>
>> GROMACS doesn't care much about bandwidth; what matters is message latency and
>> message injection rate (which in some cases depend on what else is sharing
>> the network). For those, even high-quality gigabit ethernet *can* be good
>> enough, so likely any InfiniBand product will be just fine. Unfortunately
>> we don't have access to any resource that would permit us to gather
>> comparative data.
>>
>> Mark
>>
>> On Mon, Mar 12, 2018 at 9:09 AM Simon Kit Sang Chu <simoncks1994 at gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Our group is also interested in purchasing a cloud GPU cluster. Amazon only
>>> offers GPU clusters connected at 10 Gb/s. I noticed this post, but there has
>>> been no reply so far. It would be nice if someone could offer any clues.
>>>
>>> Regards,
>>> Simon
>>>
>>> 2018-03-06 1:31 GMT+08:00 Daniel Bauer <bauer at cbs.tu-darmstadt.de>:
>>>
>>> > Hello,
>>> >
>>> > In our group, we have multiple identical Ryzen 1700X / Nvidia GeForce
>>> > GTX 1080 computing nodes and are thinking about interconnecting them
>>> > via InfiniBand.
>>> >
>>> > Does anyone have information on what bandwidth is required by GROMACS
>>> > for communication via InfiniBand (MPI + trajectory writing) and how it
>>> > scales with the number of nodes?
>>> >
>>> > The mainboards we are currently using can only run one PCIe slot with 16
>>> > lanes. When using both PCIe slots (GPU + InfiniBand), they will run in
>>> > dual x8 mode (thus bandwidth for both GPU and InfiniBand will be reduced
>>> > to 8 GB/s instead of 16 GB/s). Now we wonder if the reduced bandwidth
>>> > will hurt GROMACS performance due to bottlenecks in GPU/CPU
>>> > communication and/or communication via InfiniBand. If this is the case,
>>> > we might have to upgrade to new mainboards with dual x16 support.
>>> >
>>> >
>>> > Best regards,
>>> >
>>> > Daniel
>>> >
>>> > --
>>> > Daniel Bauer, M.Sc.
>>> >
>>> > TU Darmstadt
>>> > Computational Biology & Simulation
>>> > Schnittspahnstr. 2
>>> > 64287 Darmstadt
>>> > bauer at cbs.tu-darmstadt.de
>>> >
>>> > Don't trust atoms, they make up everything.
>>> >
>>> >