[gmx-users] Configuration of new HPC Cluster for GROMACS and NAMD

Mon Jul 31 13:30:45 CEST 2017

Hello everybody,

I'm currently heavily involved in planning a new HPC cluster for MD simulations. The main applications are GROMACS (sometimes in conjunction with PLUMED) and NAMD. Typical simulations range from about 100,000 atoms up to 300,000 atoms at most. We have received a quote from a manufacturer, and I have a few questions about the details that can probably only be answered from experience, which is why I'm asking here:

1. Each compute node contains 2x Intel Xeon Broadwell-EP E5-2680v4 (14 cores each, 2.4 GHz base clock) and 8x GTX 1080 Ti. That is very GPU-focused, and it is difficult to find GPU benchmarks for recent GROMACS versions. I found this one from Nvidia, which seems to suggest that GROMACS only scales well up to two GPUs: https://www.nvidia.com/object/gromacs-benchmarks.html It is quite likely that we will end up using AMD EPYC CPUs once they are available, in which case we would probably have more cores, more PCIe lanes, and higher memory bandwidth. But even then, 8 GPUs seem like too many to me. What do you think?

2. I know this is a GROMACS mailing list, but if anyone here knows NAMD well: the same question as above, but for NAMD. From what I read in the NAMD 2.13 release notes, using many GPUs might make sense, because almost everything now seems to be offloaded to the GPU and the CPU load has been reduced dramatically: "GPU-accelerated simulations less limited by CPU performance. Contributed by Antti-Pekka Hynninen. Bonded forces and exclusions are now offloaded to the GPU by default. (May be disabled with "bondedCUDA 0". Note that this is a bit flag so default is "bondedCUDA 255"; setting "bondedCUDA 1" will offload only bonds, not angles, dihedrals, etc.) Removes all load-balanced work from the CPU, leaving only integration and optional features. The only benefit is for machines that were limited by CPU performance, so a high-end dual-socket workstation with an older GPU may see no benefit, while a single-socket desktop with the latest 1080 Ti will see the most."

3. The quote currently includes InfiniBand. However, given the computational power of a single node (even if only 2 GPUs are used), I suspect that our simulations (which will not be excessive in size; we only plan to simulate systems of up to 300,000 atoms) would not scale well to two or more nodes. If that is the case, we could drop InfiniBand and spend the money on more nodes. What do you think?

4. The quote also includes 64 GB of RAM per compute node. That seems very high to me, since in my experience MD simulations take up a few GB at most for "reasonable" system sizes (again, we only plan to simulate systems of up to 300,000 atoms). Using 32 GB instead would also save some money. But I don't know about PLUMED; maybe it needs more RAM? What do you think?
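The hardware arithmetic behind questions 1 and 4 can be sketched as a quick back-of-envelope check. It assumes 40 PCIe 3.0 lanes per E5-2680v4 socket and x16 lanes per GPU for full bandwidth (dense 8-GPU nodes usually route GPUs through PCIe switches, so "lanes needed" is what full x16 to every card would require, not what is actually wired). The memory figure counts only positions, velocities and forces in single precision, so it is a lower bound; neighbor lists, communication buffers, runtime libraries and PLUMED are not included. All function names are illustrative.

```python
# Back-of-envelope per-node numbers for the quoted configuration.
# Assumptions (not from the post): 40 PCIe lanes per socket, x16 per GPU,
# and only x, v, f (3 floats each, 4 bytes per float) counted per atom.


def cores_per_gpu(sockets: int, cores_per_socket: int, gpus: int) -> float:
    """CPU cores available to feed each GPU."""
    return sockets * cores_per_socket / gpus


def pcie_lanes(sockets: int, lanes_per_socket: int, gpus: int,
               lanes_per_gpu: int = 16) -> tuple:
    """Return (lanes provided by the CPUs, lanes needed for x16 to every GPU)."""
    return sockets * lanes_per_socket, gpus * lanes_per_gpu


def core_state_mib(atoms: int, arrays: int = 3, bytes_per_float: int = 4) -> float:
    """Memory for per-atom xyz arrays (e.g. x, v, f) in MiB; a lower bound only."""
    return atoms * arrays * 3 * bytes_per_float / 2**20


if __name__ == "__main__":
    print(cores_per_gpu(2, 14, 8))                 # 3.5
    print(pcie_lanes(2, 40, 8))                    # (80, 128)
    print(f"{core_state_mib(300_000):.1f} MiB")    # 10.3 MiB
```

Even if the real footprint is many times this lower bound, it suggests the per-atom state of a 300,000-atom system is small compared with either 32 GB or 64 GB; the open question remains what PLUMED and any analysis running on the node add.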

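For question 3, one way to decide is to benchmark the same system on one node and on two nodes and compare the ns/day figures (GROMACS reports performance in ns/day at the end of md.log). A minimal sketch of the parallel-efficiency calculation, with a hypothetical helper name and no benchmark numbers implied:

```python
def parallel_efficiency(ns_per_day_1: float, ns_per_day_n: float, n_nodes: int) -> float:
    """Measured speedup on n_nodes divided by the ideal (linear) speedup.

    1.0 means perfect scaling; values well below 1.0 suggest that extra
    nodes (and the InfiniBand needed to connect them) may be better spent
    on running independent simulations side by side.
    """
    if ns_per_day_1 <= 0 or n_nodes < 1:
        raise ValueError("need positive single-node performance and n_nodes >= 1")
    return ns_per_day_n / (n_nodes * ns_per_day_1)
```

Plug in measured values for a representative 100,000 to 300,000 atom system; where to draw the line between "scales well enough" and "not worth the interconnect" is a judgment call for the group.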
Any help would be highly appreciated,

Alexander