[gmx-users] Some Scaling of 5.0 Results

Mark Abraham mark.j.abraham at gmail.com
Tue Sep 23 07:39:37 CEST 2014

On Tue, Sep 23, 2014 at 1:32 AM, Dallas Warren <Dallas.Warren at monash.edu>
wrote:

> Mark,
>
> > Thanks for sharing. Since the best way to write code that scales well is
> > to write code that runs slowly, we generally prefer to look at raw ns/day.
> > Choosing between perfect scaling of implementation A at 10 ns/day and
> > imperfect scaling of implementation B starting at 50 ns/day is a
> > no-brainer, but only if you know the throughput.
>
> Though from an end user point of view, CPU hours are things not to be
> wasted, so efficiency tends to be more important.

If the objective is maximizing science per CPU cycle, that's typically
proportional to ns/day. On the same amount of hardware, running at 100%
efficiency for 40 ns/day is worse than 80% efficiency for 50 ns/day. If you
can afford to wait for the result, you could maybe run the latter on less
hardware at 100% efficiency for 62.5 ns/day. You have to know the
throughput to choose well between any of the above - and once you know
where it is maximized, you don't really need to know the efficiency.
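
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration: the function name is made up here, and the reading of the
62.5 ns/day figure (the 50 ns/day result rescaled from 80% to 100%
efficiency, i.e. the per-hardware rate you would approach by running on fewer
cores) is an assumption about what was meant, not a measured value.

```python
def ideal_ns_per_day(measured_ns_per_day, efficiency):
    """Throughput the same hardware would give at 100% parallel efficiency."""
    return measured_ns_per_day / efficiency

# (label, ns/day, parallel efficiency), all on the same amount of hardware
runs = [
    ("A", 40.0, 1.00),
    ("B", 50.0, 0.80),
]

# On fixed hardware, science per CPU cycle is ranked by ns/day alone;
# the efficiency column plays no part in the choice.
best = max(runs, key=lambda r: r[1])

# Assumed interpretation: B's rate per unit of hardware if it scaled perfectly.
b_rescaled = ideal_ns_per_day(50.0, 0.80)

print(best[0], b_rescaled)
```

The ranking never looks at the efficiency value, which is the point: once the
throughput is known, efficiency adds nothing to the decision.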

> > I'd also be very suspicious of your single-core result, based on your
> > super-linear scaling. When using a number of cores smaller than a node,
> > you need to take care to pin that thread (mdrun -pin on), and not have
> > other processes also running on that core/node. If that result is noisy
> > because it ran into different other stuff over time, then every
> > "scaling" data point is affected.
>
> Had been wondering about that for a while and had not looked into it.  Did
> think that might be the reason for the super scaling, will get back to
> testing that (and all the other options) once this quarter is over.  As I
> am sure you are aware, end of quarter equals very heavy loads on clusters
> as people try to use up their quota.

Sure. However, there's only your own conscience stopping you asking for a
whole node and using only one core, for example.
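
To see why a clean single-core number is worth a whole node, here is a small
sketch of how a disturbed baseline inflates every scaling point. All numbers
are made up for illustration and do not come from any real run; the function
name is hypothetical too.

```python
def parallel_efficiency(ns_per_day, cores, baseline_ns_per_day):
    """Efficiency relative to a single-core baseline (1.0 == perfect scaling)."""
    return ns_per_day / (cores * baseline_ns_per_day)

# Hypothetical numbers, for illustration only (not from any real benchmark).
clean_baseline = 1.0   # ns/day on one pinned core, node otherwise idle
noisy_baseline = 0.8   # same run slowed down by other processes on the node
ns_day_16 = 14.4       # ns/day measured on 16 cores

true_eff = parallel_efficiency(ns_day_16, 16, clean_baseline)
apparent_eff = parallel_efficiency(ns_day_16, 16, noisy_baseline)

print(f"true: {true_eff:.3f}  apparent: {apparent_eff:.3f}")
```

With these assumed inputs the same 16-core run looks sub-linear against the
clean baseline and super-linear against the noisy one, and every other core
count divided by that baseline shifts by the same factor.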


> > Also, to observe the scaling benefits of the Verlet scheme, you have to
> > get involved with using OpenMP as the core count gets higher, since the
> > whole point is that it permits more than one core to share the work of a
> > domain, and the (short-ranged part of the) group scheme hasn't been
> > implemented to do that. Since you don't mention OpenMP, you're probably
> > not using it ;-)
>
> Yep, that would be right.  Another option that I need to look into.
> This was meant to be a starting point, basic system, simple settings, then
> work through the other various options that I knew someone would pipe up
> with :) and that I had ignored.  Also need to get a script written to
> automate this scaling testing, making it easier and faster to test all the
> various options that can be used.
>
> > Similarly, the group scheme is unbuffered by default, so it's an
> > apples-and-oranges comparison unless you state what buffer you used
> > there.
>
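
For the automation script mentioned above, one possible starting point is to
enumerate the ways of splitting a core count into ranks times OpenMP threads
and build one pinned mdrun command line for each. This is a sketch under
assumptions: it presumes a thread-MPI build of GROMACS 5.0 where `gmx mdrun`
accepts `-ntmpi`, `-ntomp`, `-pin`, `-s` and `-deffnm` (check `gmx mdrun -h`
on your install); the file name `topol.tpr`, the `bench_` output prefix and
both function names are hypothetical. It only builds the commands and runs
nothing.

```python
def rank_thread_splits(cores):
    """All (ntmpi, ntomp) pairs with ntmpi * ntomp == cores."""
    return [(n, cores // n) for n in range(1, cores + 1) if cores % n == 0]

def mdrun_command(tpr, ntmpi, ntomp):
    """Build one benchmark command line as a list; nothing is executed here."""
    return [
        "gmx", "mdrun",
        "-s", tpr,                             # hypothetical input file name
        "-deffnm", f"bench_{ntmpi}x{ntomp}",   # hypothetical output prefix
        "-ntmpi", str(ntmpi),                  # thread-MPI ranks (domains)
        "-ntomp", str(ntomp),                  # OpenMP threads per rank
        "-pin", "on",                          # pin threads, as advised above
    ]

commands = [mdrun_command("topol.tpr", n, t) for n, t in rank_thread_splits(16)]
for cmd in commands:
    print(" ".join(cmd))
```

The cutoff scheme and any buffer are set when the run input file is
generated, not on the mdrun command line, so a Verlet versus group comparison
would need one .tpr per setting fed through the same sweep.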
> Catch ya,
> Dr. Dallas Warren
> Drug Delivery, Disposition and Dynamics
> Monash Institute of Pharmaceutical Sciences, Monash University
> 381 Royal Parade, Parkville VIC 3052
> dallas.warren at monash.edu
> +61 3 9903 9304
> ---------------------------------
> When the only tool you own is a hammer, every problem begins to resemble a
> nail.