[gmx-users] Cluster recommendations

Szilárd Páll pall.szilard at gmail.com
Tue Feb 10 02:03:24 CET 2015


Note that perf/W or perf/buck figures for a certain simulation on a certain
piece of hardware can be quite misleading. Currently we balance the CPU-GPU
load by shifting long-range electrostatics work to the short-range kernels.
This involves a trade-off that does not always give the benefit one might
expect, because with the cut-off scaling the GPU load increases much more
than the total performance does (and this will likely show in the power
consumption too).

E.g. for a 15% cut-off scaling you get nearly 1.5x (= 1.15^3) more GPU
work, but probably only a ~5-7% performance improvement. More importantly,
with a GPU that has only 2/3 of the performance of the first one you lose
less than 10% in total performance. To put that in perspective, that is
roughly the performance difference between a GTX 770 and a 780 Ti.
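
To make the cut-off arithmetic explicit, here is a minimal sketch (plain
Python, using only the numbers already mentioned above; the 6% total gain is
just the midpoint of the 5-7% range):

  # Short-range pair work on the GPU scales roughly with the volume of the
  # cut-off sphere, i.e. with rcut^3.
  cutoff_scaling = 1.15             # 15% longer cut-off
  gpu_work = cutoff_scaling ** 3    # ~1.52x more GPU pair work
  total_gain = 0.06                 # ~5-7% total speedup, midpoint assumed
  print(f"{gpu_work:.2f}x GPU work for a ~{total_gain:.0%} total gain")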

So if a GTX 770 gives a balanced run with the CPU of your choice, then
using a 780 Ti may not be the most advantageous thing to do, as the
latter costs twice as much as the former. It is also rather difficult
(and borderline unfair) to compare two CPUs with very different
performance (and cost) by pairing both with the same GPU!

Cheers,
--
Szilárd


On Mon, Feb 2, 2015 at 2:11 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
> Hi David,
>
> On 22 Jan 2015, at 18:01, David McGiven <davidmcgivenn at gmail.com> wrote:
>
>> Hey Carsten,
>>
>> Just another question. What do you think the performance difference will be
>> between Gromacs runs with a ~100k-atom system like the one I mentioned
>> in my first email, on:
>>
>> - 1 server with 4 AMD processors, 16 cores each (64 cores), with no GPU
>> - 1 server with 4 AMD processors, 16 cores each (64 cores), with one GTX 980
>> GPU
>> - 1 server with 2 Intel processors, 10 cores each (20 cores), like the ones
>> you mentioned, with one or two GTX 980 GPUs.
>>
>> I'm not interested in exact performance numbers; I just need to understand
>> the logistics behind the CPU/GPU combinations in order to make an
>> intelligent cluster purchase.
> I never benchmarked 64-core AMD nodes with GPUs. With an 80,000-atom test
> system using a 2 fs time step I get:
> 24 ns/d on 64 AMD   6272   cores
> 16 ns/d on 32 AMD   6380   cores
> 36 ns/d on 32 AMD   6380   cores with 1x GTX 980
> 40 ns/d on 32 AMD   6380   cores with 2x GTX 980
> 27 ns/d on 20 Intel 2680v2 cores
> 52 ns/d on 20 Intel 2680v2 cores with 1x GTX 980
> 62 ns/d on 20 Intel 2680v2 cores with 2x GTX 980
>
> So unless you can get the AMD nodes very cheaply, the 20-core Intel nodes
> with 1 or 2 GPUs will probably give you both the best performance and the
> best performance-to-price ratio.
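>
> As a rough way to compare these numbers on a performance-per-price basis,
> here is a minimal Python sketch (the node prices are made-up placeholders
> for illustration, not actual quotes; only the ns/day values come from the
> table above):
>
>   # ns/day from the benchmark table above; prices are hypothetical.
>   nodes = {
>       "64 AMD 6272 cores, no GPU":          (24, 6000),
>       "20 Intel 2680v2 cores, no GPU":      (27, 5000),
>       "20 Intel 2680v2 cores + 1x GTX 980": (52, 5600),
>       "20 Intel 2680v2 cores + 2x GTX 980": (62, 6200),
>   }
>   for name, (ns_per_day, price) in nodes.items():
>       print(f"{name:36s} {1000.0 * ns_per_day / price:5.1f} ns/day per 1000 EUR")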
>
> Best,
>   Carsten
>
>>
>> Thanks again.
>>
>> Best,
>> D
>>
>>
>> 2015-01-16 14:46 GMT+01:00 Carsten Kutzner <ckutzne at gwdg.de>:
>>
>>> Hi David,
>>>
>>> On 16 Jan 2015, at 12:28, David McGiven <davidmcgivenn at gmail.com> wrote:
>>>
>>>> Hi Carsten,
>>>>
>>>> Thanks for your answer.
>>>>
>>>> 2015-01-16 11:11 GMT+01:00 Carsten Kutzner <ckutzne at gwdg.de>:
>>>>
>>>>> Hi David,
>>>>>
>>>>> we are just finishing an evaluation to find out which is the optimal
>>>>> hardware for Gromacs setups. One of the input systems is an 80,000-atom
>>>>> membrane channel system and thus nearly exactly what you want
>>>>> to compute.
>>>>>
>>>>> The biggest benefit comes from adding one or two consumer-class GPUs
>>>>> to your nodes (e.g. NVIDIA GTX 980); that will typically double your
>>>>> performance-to-price ratio. This is true for Intel as well as for AMD
>>>>> nodes; however, the best ratio in our tests was observed with 10-core
>>>>> Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
>>>>> ideally two of those CPUs with two GPUs on a node.
>>>>>
>>>>>
>>>> Was there a difference between the 2670v2 (2.5 GHz) and the 2680v2 (2.8 GHz)?
>>>> I'm wondering if those 0.3 GHz are significant, or the 0.5 GHz compared to
>>>> the 2690v2, for that matter. There is indeed a significant difference in
>>>> price.
>>> Usually the percentage improvement in Gromacs performance is smaller than
>>> the percentage improvement in clock speed, so the cheaper CPUs will
>>> give you a higher performance-to-price ratio.
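>>>
>>> As a small illustration of that point (a sketch only; the performance gain
>>> and price premium below are assumed numbers, not measurements from this
>>> thread):
>>>
>>>   clock_gain = 2.8 / 2.5 - 1   # +12% clock, 2680v2 vs 2670v2
>>>   perf_gain  = 0.06            # assumed: Gromacs gains less than the clock
>>>   price_gain = 0.20            # assumed price premium for the faster CPU
>>>   rel_perf_per_price = (1 + perf_gain) / (1 + price_gain)
>>>   print(f"faster CPU gives {rel_perf_per_price:.2f}x the perf/price")  # < 1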
>>>
>>>>
>>>> I'm also wondering if the performance would be better with 16-core Intels
>>>> instead of 10-core ones, i.e. the E5-2698 v3.
>>> Didn’t test those.
>>>
>>>>
>>>> I would like to know which other tests you have done. What about AMD?
>>> We tested the AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same
>>> performance-to-price ratio as a 10-core Intel 2680v2 node with one GTX 980.
>>> The Intel node gives you a higher per-node performance, though.
>>>
>>>>
>>>>> Unless you want to buy expensive FDR14 Infiniband, scaling across two
>>>>> or more of those nodes won't be good (~0.65 parallel efficiency across 2,
>>>>> ~0.45 across 4 nodes using QDR Infiniband), so I would advise against
>>>>> it and go for more sampling on single nodes.
>>>>>
>>>>>
>>>> Well, that puzzles me. Why is it that you get poor performance? Are you
>>>> talking about CPU-only jobs over Infiniband, or about CPU+GPU jobs over
>>>> Infiniband?
>>> For a given network (e.g. QDR Infiniband), the scaling is better the lower
>>> the performance of the individual nodes. So for CPU-only nodes you
>>> will get better scaling than for CPU+GPU nodes, which have a much higher
>>> per-node performance.
>>>
>>>> How come you won’t get good performance if a great percentage of
>>> The performance is good; it is just that the parallel efficiency is
>>> not optimal for an MD system of <100,000 atoms, meaning you do not get twice
>>> the performance when running one job on two nodes, compared to the
>>> aggregated performance of two individual single-node runs.
>>> Bigger systems will have better parallel efficiency.
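>>>
>>> To spell out what that means for throughput, here is a minimal sketch using
>>> the parallel efficiencies quoted above (~0.65 across 2 nodes, ~0.45 across 4
>>> over QDR Infiniband) and, for concreteness, the 52 ns/day single-node figure
>>> from the table earlier in this thread:
>>>
>>>   single_node = 52.0  # ns/day on one 20-core Intel node with 1x GTX 980
>>>   for n, eff in {1: 1.0, 2: 0.65, 4: 0.45}.items():
>>>       coupled  = n * eff * single_node  # one job spread over n nodes
>>>       separate = n * single_node        # n independent single-node runs
>>>       print(f"{n} node(s): {coupled:5.1f} ns/day coupled vs "
>>>             f"{separate:5.1f} ns/day aggregate from separate runs")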
>>>
>>>> supercomputer centers in the world use Infiniband? And I'm sure lots of
>>>> users here on the list use Gromacs over Infiniband.
>>> I do, too :)
>>> But you get more trajectory for your money if you can wait and run on
>>> a single node.
>>>
>>> Carsten
>>>
>>>>
>>>> Thanks again.
>>>>
>>>> Best Regards,
>>>> D
>>>>
>>>>
>>>>> Best,
>>>>> Carsten
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 15 Jan 2015, at 17:35, David McGiven <davidmcgivenn at gmail.com>
>>> wrote:
>>>>>
>>>>>> Dear Gromacs Users,
>>>>>>
>>>>>> We've got some funding to build a new cluster. It's going to be used mainly
>>>>>> for Gromacs simulations (80% of the time). We run molecular dynamics
>>>>>> simulations of transmembrane proteins inside a POPC lipid bilayer. In a
>>>>>> typical system we have ~100,000 atoms, of which almost 1/3 are
>>>>>> water molecules. We employ the usual conditions, with PME for electrostatics
>>>>>> and cut-offs for LJ interactions.
>>>>>>
>>>>>> I would like to hear your advice on which kind of machines give the best
>>>>>> bang for the buck for that kind of simulation. For instance:
>>>>>>
>>>>>> - Intel or AMD? My understanding is that Intel is faster but more expensive,
>>>>>> and AMD is slower but cheaper, so in the end you get almost the same
>>>>>> performance per buck. Right?
>>>>>>
>>>>>> - Many CPUs/cores per machine or fewer? My understanding is that the more
>>>>>> cores per machine, the lower the cost: one machine is always cheaper to buy
>>>>>> and maintain than several. Plus maybe you can save the cost of Infiniband
>>>>>> if you use high core densities?
>>>>>>
>>>>>> - Should we invest in an Infiniband network to run jobs across multiple
>>>>>> nodes? Will the kind of simulations we run benefit from multiple nodes?
>>>>>>
>>>>>> - Would we benefit from adding GPUs to the cluster? If so, which ones?
>>>>>>
>>>>>> We now have a cluster with 48 or 64 AMD Opteron cores per machine (4
>>>>>> processors per machine) and we run our Gromacs simulations there. We don't
>>>>>> use MPI because our jobs mostly run on a single node, as I said,
>>>>>> with 48 or 64 cores per simulation on a single machine. So far we're quite
>>>>>> satisfied with the performance we get.
>>>>>>
>>>>>> Any advice will be greatly appreciated.
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> D.
>
>
> --
> Dr. Carsten Kutzner
> Max Planck Institute for Biophysical Chemistry
> Theoretical and Computational Biophysics
> Am Fassberg 11, 37077 Goettingen, Germany
> Tel. +49-551-2012313, Fax: +49-551-2012302
> http://www.mpibpc.mpg.de/grubmueller/kutzner
> http://www.mpibpc.mpg.de/grubmueller/sppexa
>

