[gmx-users] Performance of beowulf cluster

Szilárd Páll pall.szilard at gmail.com
Tue Aug 12 02:30:10 CEST 2014


On Tue, Aug 5, 2014 at 5:21 PM, Abhi Acharya <abhi117acharya at gmail.com> wrote:
> Thank you Mirco and Szilard,
> With regard to the GPU system, I have decided on a Xeon E5-1650 v2 system
> with a GeForce GTX 780 Ti GPU for equilibration and production runs with
> small systems. But for large systems or REMD simulations, I am a bit
> skeptical about banking on GPU systems.

How would you define "large"? A 100k-atom protein system (PME, rc = 0.9 nm,
virtual sites with a 5 fs time step) will run at >50 ns/day on a box like
the above, but roughly 5x (!) slower on an FX 8350 without a GPU. Some
numbers I had around, plus the CPU-only ones from quick-and-dirty benchmark
runs:

i7 3930K, with / without a K20:     52   / 17.5 ns/day
FX 8350,  with / without a GTX 580: 31.5 / 10.1 ns/day
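
To make the "~5x" explicit, the arithmetic is just the ratios of the
numbers above (a throwaway Python snippet, nothing more):

# Benchmark numbers quoted above, in ns/day (with GPU / without GPU).
i7_3930k_gpu, i7_3930k_cpu = 52.0, 17.5   # i7 3930K with / without a K20
fx_8350_gpu, fx_8350_cpu = 31.5, 10.1     # FX 8350 with / without a GTX 580

# Speedup from adding a GPU to each box:
print(i7_3930k_gpu / i7_3930k_cpu)   # ~3.0x on the i7 3930K
print(fx_8350_gpu / fx_8350_cpu)     # ~3.1x on the FX 8350

# The "roughly 5x slower" claim: GPU-equipped Intel box vs. a bare FX 8350:
print(i7_3930k_gpu / fx_8350_cpu)    # ~5.1x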

I think the above Xeon may not be the best deal: it is based on the
now-outdated Sandy Bridge architecture, and an i7 4930K will be around
10% faster. Depending on your timeline, the Haswell-E 5930K (released
this fall) will be *far* better than either.

Additionally, unless the AMD CPUs are very cheap, my guess is that
you'll get better performance per buck (and per W, too) with mid-range
Haswells like the i5 4670/4690.
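
If you want to compare candidates more systematically, a throwaway
script like the one below does the job; note that every figure in it is
a placeholder to be replaced with your own benchmark results, local
prices and measured power draw - I am not quoting real numbers here.

# Rough ns/day-per-dollar and ns/day-per-watt comparison.
# ALL values below are placeholders - fill in your own benchmarks,
# local prices and measured (or TDP) power figures.
candidates = {
    #  name        (ns/day, price in $, watts)
    "FX 8350":     (10.0,   200.0,      125.0),
    "i5 4690":     (15.0,   240.0,       85.0),
    "i7 4930K":    (20.0,   580.0,      130.0),
}

for name, (ns_day, price, watts) in candidates.items():
    print("%-10s %.3f ns/day per $   %.2f ns/day per W"
          % (name, ns_day / price, ns_day / watts))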

> Any pointers as to what would be the minimum configuration required for
> REMD simulations on, say, a 50k-atom protein sampled at 100 different
> temperatures? I am open to all possible options in this regard (obviously
> a little cost-effectiveness does not harm).

For a 100-way multi-run you'll need at least 100 cores, and even with
fast ones you won't get very good per-replica performance - especially
without GPUs. In fact, if you are planning to do REMD runs, you can
make great use of GPUs! The aggregate performance of independent runs
sharing a GPU (but not CPU cores) can be much greater than what a
single run achieves on the same GPU-CPU pair; for an example, see the
second plot on this poster: http://goo.gl/2xH52y
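
To see why sharing a GPU between independent runs can pay off, here is
a toy model; the per-run efficiency fractions in it are invented purely
for illustration (measure your own before committing to a node layout):

# Toy model: n independent runs split the CPU cores of a node but share
# its single GPU. per_run_eff[n] = assumed fraction of the single-run
# throughput each of the n runs still achieves; these fractions are
# made up for illustration only.
per_run_eff = {1: 1.00, 2: 0.60, 3: 0.42, 4: 0.32}

for n, eff in sorted(per_run_eff.items()):
    print("%d run(s) per GPU: aggregate = %.2fx a single run" % (n, n * eff))

# With these made-up fractions the aggregate throughput climbs from
# 1.00x to ~1.2-1.3x of a single run, i.e. the node does more total
# sampling - the kind of effect the poster above shows.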

Hardware-wise, with mid-range desktop Haswell CPUs I'd guess you can
get about 25 ns/day, and ~75 ns/day if you add a (fast enough) GPU;
you can bump this by another ~20% (aggregate) if you run 2-4
independent runs per node. NOTE: I can't vouch for any of these
numbers; they're guesstimates.
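
Plugging those guesstimates into your 100-replica scenario, purely as a
back-of-the-envelope exercise (same caveat: none of these inputs are
measurements):

# Sizing sketch for a 100-way REMD ensemble, using the guesstimates
# above: ~75 ns/day per GPU node for a single run and a ~20% aggregate
# bump when two replicas share a node.
replicas          = 100
single_run        = 75.0   # ns/day, one replica per GPU node (guesstimate)
aggregate_bump    = 1.2    # ~20% more aggregate with 2-4 replicas per node
replicas_per_node = 2

nodes_needed = replicas // replicas_per_node                    # 50 nodes
per_replica  = single_run * aggregate_bump / replicas_per_node  # 45 ns/day
aggregate    = per_replica * replicas                           # 4500 ns/day

print("%d nodes, %.0f ns/day per replica, %.0f ns/day aggregate"
      % (nodes_needed, per_replica, aggregate))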

> Also, would investing in a *good* 40 Gigabit Ethernet network ensure good
> performance if we later plan to add more nodes to the cluster?

As I wrote before, I personally don't have experience with MD over
Ethernet. Traditionally, Ethernet has always been considered
borderline useless for this, but with the RDMA protocol iWARP over 10
and 40 Gb Ethernet, I've seen people report decent results.

> Regards,
> Abhishek
>
>
> On Tue, Aug 5, 2014 at 5:46 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
>> Hi,
>>
>> You need a fast network to parallelize across multiple nodes. 1 Gb
>> Ethernet won't work well, and even 10/40 Gb Ethernet needs to be of
>> good quality; you'd likely need to buy separate adapters, as the
>> on-board ones won't perform well. I posted some links related to this
>> to the list a few days ago.
>>
>> The AMD FX desktop hardware you mention is OK, but I'm not sure that
>> it gives the best performance/price. A (very) discounted Sandy
>> Bridge-E (i7 3930K) or one of the cheaper Haswells like the i5 4670
>> may actually provide better performance for the money. Ivy Bridge-E
>> or Haswell-E, as Mirco suggests, are the best single-socket
>> workstation options, but those are/will be pretty expensive.
>>
>> Finally, unless you have a good reason not to, you should not just
>> consider GPUs, but consider what CPU/platform works best with GPUs.
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Tue, Aug 5, 2014 at 7:01 AM, Abhishek Acharya
>> <abhi117acharya at gmail.com> wrote:
>> > Hello gromacs users,
>> > I am planning on investing in a Beowulf cluster with 6 nodes (48 cores
>> > in total), each with an AMD FX 8350 processor and 8 GB of memory,
>> > connected by a 1 Gigabit Ethernet switch. Although I plan to add more
>> > cores to this cluster later on, what is the maximum performance expected
>> > from the current specs for a 100,000-atom simulation box? Also, is it
>> > better to invest in a single 48-core server? The cluster system can be
>> > set up at almost half the price of a 48-core server, but do we lose out
>> > on performance in the process?
>> >
>> > Regards,
>> >
>> > Abhishek Acharya
>
>
>
> --
> Abhishek Acharya
> Senior Research Fellow
> Gene Regulation Laboratory
> National Institute of Immunology