[gmx-users] older server CPUs with recent GPUs for GROMACS

Szilárd Páll pall.szilard at gmail.com
Thu Jul 25 10:56:48 CEST 2019


Hi Mike,

Forking the discussion so that it has a consistent, more discoverable topic.

On Thu, Jul 18, 2019 at 4:21 PM Michael Williams
<michael.r.c.williams at gmail.com> wrote:
>
> Hi Szilárd,
>
> Thanks for the interesting observations on recent hardware. I was wondering if you could comment on the use of somewhat older server CPUs and motherboards (versus more cutting-edge consumer parts). I recently noticed that Haswell-era Xeon CPUs (E5 v3) are quite affordable now (~$400 for 12-core models with 40 PCIe lanes), and so are the corresponding dual-socket server motherboards. Of course the RAM is slower than what can be used with the latest Ryzen or i7/i9 CPUs.


When it comes to GPU-accelerated runs, given that most of the
arithmetically intensive computation is offloaded, the headline
features of more modern processors and CPU instruction sets (such as
AVX-512) don't help much. As most bio-MD systems (unless very large)
fit in the CPU cache, RAM performance and extra memory channels also
have little to no impact (one exception being the 1st-gen AMD Zen
architecture, but that's another topic). What dominates the CPU's
contribution to performance is cache size (and speed/efficiency) and
the number and speed of the CPU cores. This is somewhat non-trivial to
assess, as the advertised clock speeds don't always reflect the
sustained clocks these CPUs actually run at, but roughly you can use
(#cores x frequency) as a metric to gauge the performance of a CPU
*in such a scenario*; see the rough example below.
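
As a rough illustration of that metric (nothing GROMACS-specific, and
the all-core clocks below are assumed ballpark figures -- check the
sustained clocks of the actual SKUs you are considering):

  # crude throughput proxy for the CPU side of GPU-offloaded runs:
  #   score ~= number_of_cores x sustained_all-core_clock_in_GHz
  awk 'BEGIN {
      printf "Xeon E5-2690 v3 (12 cores x ~2.6 GHz): %4.1f\n", 12 * 2.6
      printf "Ryzen 9 3900X   (12 cores x ~4.0 GHz): %4.1f\n", 12 * 4.0
  }'

By that crude measure a 12-core Haswell-era Xeon lands well below a
modern 12-core desktop part, which is in line with the rest of this
thread.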

You can find more on this in our recent paper, where we compare the
performance of the best bang-for-buck modern servers (spoiler alert:
AMD EPYC was already the champion and will especially be so with the
Rome architecture) against upgraded older Xeon v2 nodes; see:
https://doi.org/10.1002/jcc.26011

>
> Are there any other bottlenecks with this somewhat older server hardware that I might not be aware of?

There can be: PCIe topology can be an issue. You want a symmetric
layout, e.g. two x16 slots each connected directly to a socket (for
dual-socket systems), rather than many lanes hanging off a PCIe switch
that is attached to a single socket. You can also run into significant
GPU-to-GPU communication limitations on older-generation hardware
(like v2/v3 Xeons); GROMACS does not make use of direct GPU-to-GPU
communication yet (partly for that very reason), but with near-future
releases that may become a slight concern if you want to scale across
many GPUs. A few commands for checking what a given box actually
provides are below.
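
If you want to inspect a particular machine, the standard Linux /
NVIDIA driver tools are enough (generic commands, nothing
GROMACS-specific):

  # GPU<->GPU and GPU<->CPU connection matrix; shows whether a PCIe
  # switch or an inter-socket hop sits between devices
  nvidia-smi topo -m

  # the full PCIe device tree
  lspci -tv

  # the PCIe generation and link width each GPU has actually trained at
  nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current --format=csv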


I hope that helps, let me know if you have any other questions!

Cheers,
--
Szilárd

> Thanks again for the interesting information and practical advice on this topic.
>
> Mike
>
>
> > On Jul 18, 2019, at 2:21 AM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> >
> > PS: You will get more PCIe lanes without motherboard trickery -- and note
> > that consumer motherboards with PCIe switches can sometimes cause
> > instabilities under heavy compute load -- if you buy the aging and
> > quite overpriced i9 X-series, like the 12-core i9-7920X, or the
> > Threadripper 2950X with 16 cores and 60 PCIe lanes.
> >
> > Also note that more cores always win when CPU performance matters, and
> > while 8 cores are generally sufficient, in some use-cases they may not
> > be (like runs with free energy).
> >
> > --
> > Szilárd
> >
> >
> > On Thu, Jul 18, 2019 at 10:08 AM Szilárd Páll <pall.szilard at gmail.com>
> > wrote:
> >
> >> On Wed, Jul 17, 2019 at 7:00 PM Moir, Michael (MMoir) <MMoir at chevron.com>
> >> wrote:
> >>
> >>> This is not quite true.  I certainly observed the degradation in
> >>> performance that Szilárd describes, using a 9900K with two GPUs on a
> >>> motherboard with one PCIe controller, but the limitation is from the
> >>> motherboard, not from the CPU.
> >>
> >>
> >> Sorry, but that's not the case. PCIe controllers have been integrated into
> >> CPUs for many years; see
> >>
> >> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-introduction-basics-paper.pdf
> >>
> >> https://www.microway.com/hpc-tech-tips/common-pci-express-myths-gpu-computing/
> >>
> >> So no, the limitation is the CPU itself. Consumer CPUs these days have 24
> >> lanes total, some of which are used to connect the CPU to the chipset, and
> >> effectively you get 16-20 lanes (BTW here too the new AMD CPUs win as they
> >> provide 16 lanes for GPUs and similar devices and 4 lanes for NVMe, all on
> >> PCIe 4.0).
> >>
> >>
> >>>  It is possible to obtain a motherboard that contains two PCIe
> >>> controllers which overcomes this obstacle for not a whole lot more money.
> >>>
> >>
> >> It is possible to buy motherboards with PCIe switches. These don't
> >> increase the number of lanes; they just do what a switch does: as long
> >> as not all connected devices try to use the full capacity of the CPU (!)
> >> at the same time, you can get full speed on all connected devices.
> >> e.g.:
> >> https://techreport.com/r.x/2015_11_19_Gigabytes_Z170XGaming_G1_motherboard_reviewed/05-diagram_pcie_routing.gif
> >>
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >> Mike
> >>>
> >>> -----Original Message-----
> >>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> >>> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> On Behalf Of Szilárd
> >>> Páll
> >>> Sent: Wednesday, July 17, 2019 8:14 AM
> >>> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> >>> Subject: [**EXTERNAL**] Re: [gmx-users] Xeon Gold + RTX 5000
> >>>
> >>> Hi Alex,
> >>>
> >>> I've not had a chance to test the new 3rd gen Ryzen CPUs, but all
> >>> public benchmarks out there point to the fact that they are a major
> >>> improvement over the previous generation Ryzen -- which were already
> >>> quite competitive for GPU-accelerated GROMACS runs compared to Intel,
> >>> especially in perf/price.
> >>>
> >>> One caveat for dual-GPU setups on the i9 9900 or the Ryzen 3900X is
> >>> that they don't have enough PCIe lanes for peak CPU-GPU transfers (x8
> >>> for each of the two GPUs), which will lead to slightly lower
> >>> performance (I'd estimate 5-10% at most), in particular compared to
> >>> i) having a single GPU plugged into the machine, or ii) CPUs like
> >>> Threadripper or the i9 79xx series, which have more PCIe lanes.
> >>>
> >>> However, if throughput is the goal, the ideal use-case, especially for
> >>> small simulation systems (<=50k atoms), is to run e.g. 2 runs per GPU,
> >>> hence 4 runs on a 2-GPU system, in which case the impact of the
> >>> aforementioned limitation is further reduced.
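
(For reference, such a throughput setup can be launched roughly as
below; this is only a sketch -- the run names are placeholders and the
core counts assume a 16-core CPU, so adjust -ntomp / -pinoffset to
your hardware.)

  # 4 independent runs on a 2-GPU, 16-core node: 2 runs per GPU, 4 cores each
  gmx mdrun -deffnm run1 -nb gpu -gpu_id 0 -ntmpi 1 -ntomp 4 -pin on -pinoffset 0  &
  gmx mdrun -deffnm run2 -nb gpu -gpu_id 0 -ntmpi 1 -ntomp 4 -pin on -pinoffset 4  &
  gmx mdrun -deffnm run3 -nb gpu -gpu_id 1 -ntmpi 1 -ntomp 4 -pin on -pinoffset 8  &
  gmx mdrun -deffnm run4 -nb gpu -gpu_id 1 -ntmpi 1 -ntomp 4 -pin on -pinoffset 12 &
  wait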
> >>>
> >>> Cheers,
> >>> --
> >>> Szilárd
> >>>
> >>>
> >>>> On Tue, Jul 16, 2019 at 7:18 PM Alex <nedomacho at gmail.com> wrote:
> >>>>
> >>>> That is excellent information, thank you. None of us have dealt with AMD
> >>>> CPUs in a while, so would the combination of a Ryzen 3900X and two
> >>>> RTX 2080 Ti cards be a good choice?
> >>>>
> >>>> Again, thanks!
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> On 7/16/2019 8:41 AM, Szilárd Páll wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>>> On Mon, Jul 15, 2019 at 8:53 PM Alex <nedomacho at gmail.com> wrote:
> >>>>>> Hi all and especially Szilard!
> >>>>>>
> >>>>>> My glorious management asked me to post this here. One of our group
> >>>>>> members, an ex-NAMD guy, wants to use Gromacs for biophysics and the
> >>>>>> following basics have been spec'ed for him:
> >>>>>>
> >>>>>> CPU: Xeon Gold 6244
> >>>>>> GPU: RTX 5000 or 6000
> >>>>>>
> >>>>>> I'll be surprised if he runs systems with more than 50K particles.
> >>>>>> Could
> >>>>>> you please comment on whether this is a cost-efficient and reasonably
> >>>>>> powerful setup? Your past suggestions have been invaluable for us.
> >>>>> That will be reasonably fast, but cost efficiency will be awful, to
> >>>>> be honest:
> >>>>> - that CPU is a ~$3000 part and won't perform much better than a
> >>>>> $400-500 desktop CPU like an i9 9900, let alone a Ryzen 3900X, which
> >>>>> would be significantly faster.
> >>>>> - Quadro cards are also pretty low in bang for buck: a 2080 Ti will be
> >>>>> close to an RTX 6000 for ~5x less, and the 2080 or 2070 Super a bit
> >>>>> slower for at least another 1.5x less.
> >>>>>
> >>>>> Single run at a time or possibly multiple? The proposed (or any 8+
> >>>>> core) workstation CPU is fast enough in the majority of the
> >>>>> simulations to pair well with two of those GPUs if used for two
> >>>>> concurrent simulations. If that's a relevant use-case, I'd recommend
> >>>>> two 2070 Super or 2080 cards.
> >>>>>
> >>>>> Cheers,
> >>>>> --
> >>>>> Szilárd
> >>>>>
> >>>>>
> >>>>>> Thank you,
> >>>>>>
> >>>>>> Alex
> >>
> >>

