[gmx-users] Some Scaling of 5.0 Results

Tue Sep 23 02:39:25 CEST 2014

Szilard, thanks for the comments and thoughts.

> What was your benchmarking procedure on core counts that represent
> less than a full socket?

As a starting point, I simply used the same settings for everything.

> Besides the thread affinity issue mentioned by Mark, clock frequency
> scaling (boost) can also distort performance plots. You will observe
> artificially high performance on small core counts making the scaling
> inherently worse - unless this is explicitly turned off in the
> BIOS/firmware. This can be further enhanced by the low cache traffic
> when only partially using a multicore CPU. These are both artificial
> effects that you won't see in real-world runs - unless leaving a bunch
> of cores empty.
> There is no single "right way" to avoid these issues, there certainly
> are ways to present data in a less than useful manner - especially
> when it comes to scaling plots. A simple way of avoiding such issues
> and eliminating the potential for incorrect strong scaling plots is to
> start from at least a socket (or node). Otherwise, IMO the <8 threads
> data points on your plot make sense only if you show strong scaling to
> multiple sockets/nodes by using the same amount of threads per socket
> as you started with, leaving the rest of the cores free.

Fair enough.
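
For reference, a minimal sketch of how the efficiency could be recomputed with a full socket as the baseline, as suggested above. The function name, the 8-core socket size and the performance numbers are all assumptions for illustration only; they are not measured results from these runs.

```python
def strong_scaling_efficiency(perf_by_cores, baseline_cores):
    """Return {cores: efficiency} relative to the baseline core count.

    perf_by_cores maps core count -> performance (e.g. ns/day).
    Efficiency = (perf / baseline perf) / (cores / baseline cores).
    Points below the baseline (partial-socket runs) are dropped.
    """
    base_perf = perf_by_cores[baseline_cores]
    return {
        cores: (perf / base_perf) / (cores / baseline_cores)
        for cores, perf in sorted(perf_by_cores.items())
        if cores >= baseline_cores
    }


# Made-up numbers purely for illustration, not benchmark data.
example = {4: 6.0, 8: 10.0, 16: 19.0, 32: 34.0}

# Assumption: one socket has 8 cores, so 8 is the baseline.
for cores, eff in strong_scaling_efficiency(example, baseline_cores=8).items():
    print(f"{cores:4d} cores: efficiency {eff:.2f}")
```

Starting from the socket rather than from one core keeps the boost and cache effects of partially filled sockets out of the baseline.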

> What run configuration did you use for Verlet on single node? With the
> Verlet scheme no domain decomposition, that is multithreading-only
> (OpenMP) runs are typically more efficient than using
> domain-decomposition. This is typically true up to a full socket and
> quite often even across two Intel sockets.

The same as for all the other runs, so it had domain decomposition (DD) on.  I have added a multithreading-only (OpenMP) run to the list.

> Did you tune the PME performance, i.e. the number of separate PME
> ranks?

Not as of yet.  With 4.6, on this same system and this same cluster, I found that -npme 0 gave the best scaling and throughput, so that is what I used here as a starting point.

> Did you use nstlist=40 for all Verlet data points? That may not be
> optimal across all node counts, especially on less than two nodes, but
> of course that's hard to tell without trying!

Yes, I used the same setting across all of them.  I will add that to the options to explore.
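
A rough sketch of how the -npme and nstlist settings discussed above could be scanned at a fixed rank count. It only builds the command lines and does not run anything. The `mpirun` launcher, the `gmx_mpi` binary name, the `topol.tpr` file name and the output naming scheme are assumptions and will differ between installations; `-npme`, `-nstlist`, `-s` and `-deffnm` are mdrun options.

```python
import itertools
import shlex


def mdrun_sweep(nranks, npme_values, nstlist_values, tpr="topol.tpr"):
    """Build one mdrun command line per (npme, nstlist) combination.

    Assumptions: MPI launcher is 'mpirun', the binary is 'gmx_mpi',
    and the run input file is 'topol.tpr'. Adjust for the local setup.
    """
    commands = []
    for npme, nstlist in itertools.product(npme_values, nstlist_values):
        # Hypothetical naming scheme so each run writes its own files.
        name = f"bench_np{nranks}_npme{npme}_nstlist{nstlist}"
        args = [
            "mpirun", "-np", str(nranks),
            "gmx_mpi", "mdrun",
            "-s", tpr,
            "-npme", str(npme),        # 0 = no separate PME ranks
            "-nstlist", str(nstlist),  # overrides nstlist from the .tpr
            "-deffnm", name,
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


for cmd in mdrun_sweep(16, npme_values=[0, 4], nstlist_values=[10, 20, 40]):
    print(cmd)
```

The same sweep would need repeating at each node count, since the best combination may well change with the number of ranks.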

> Finally, looking at octanol Verlet plot, especially in comparison with
> the water plot, what's strange is that the scaling efficiency is much
> worse than with water and varies quite greatly between neighboring
> data points. This indicates that something was not entirely right with
> those runs.

Yes, I suspected as much too, but due to time constraints I have not yet gone back to look at them again.  I suspect a load issue is skewing those results down, for example the jobs being split across nodes or something like that.  The supercomputer is currently rather heavily loaded.

Thanks for the comments, and I will get back with some more thorough results next month.

If there is anything else that you would be interested in knowing, such as other options to look at and how they impact things, let me know and I will look into those too.

Catch ya,

Dr. Dallas Warren
Drug Delivery, Disposition and Dynamics
Monash Institute of Pharmaceutical Sciences, Monash University
381 Royal Parade, Parkville VIC 3052
dallas.warren at monash.edu
+61 3 9903 9304
---------------------------------
When the only tool you own is a hammer, every problem begins to resemble a nail.