[gmx-developers] Reproducible runs with DLB

Fri Jul 22 17:51:07 CEST 2011

[ I'll try to put all answers in one message ]

On Fri, Jul 22, 2011 at 13:56, Berk Hess <hess at cbr.su.se> wrote:
> On 07/21/2011 11:47 PM, XAvier Periole wrote:
>> On Jul 21, 2011, at 3:43 PM, Shirts, Michael (mrs5pt) wrote:
>>>>> And an even more useful option would be to be able to write out
>>>>> conformations more often than in the original run. That would allow one
>>>>> to run long simulations and go back and zoom in on a particular time
>>>>> period of the simulation where some interesting event occurred.
>>> I'll add the plug that having this sort of functionality would be great,
>>> if possible. Could only really be done on the same machine, and may be
>>> impossible since on restart, the order of operations might be different,
>>> and chaos would get you very quickly, but it would be great!

Going back and getting more detailed data is also what I am trying to do.

> Any dynamic load balancing based on actual timings is never reproducible,
> unless you would store all the timings, which is very impractical.

Indeed, I was under the mistaken impression that FLOP counts, not
timings, were the basis for the default DLB calculations.

> One could load balance based on flops, as the GMX_DLB_FLOP env var
> does, which is only intended for debugging purposes. But that will not give
> good load balancing. Therefore it's not worth storing the complete dlb
> state.

I have made a short test with GMX_DLB_FLOP=1 and the balancing was
indeed worse than the default, but not by much; it's much closer to
'-dlb yes' than to '-dlb no'. I'm willing to trade a bit of speed for
reproducibility.

Please correct me if I'm wrong. With GMX_DLB_FLOP=1 (no randomness),
DLB takes the load from dd_force_load(), which reads comm->load. That
value is set in dd_force_flop_start/stop() from the result of
force_flop_count(), which computes it from nrnb, i.e. from the
iteration counts returned by the nonbonded kernels. This would explain
why the load balance is not precise: operations done in other parts of
the code (e.g. bonded interactions) are not accounted for.

It also means that the FLOP-based load varies deterministically. So if
the DD state is saved periodically during the run, one can restart
from any such state and exactly reproduce the DD evolution from that
point on. This would be reproducible even on a machine different from
the one used for the original run, provided the number of ranks is the
same.
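
To illustrate the argument, here is a toy sketch in Python. It is not
the algorithm in domdec.c: the 1D rebalancing rule, the function names
and the 5% noise level are all assumptions made up for the example. It
only shows that a load computed from counts gives the same cell-size
history after a restart from a saved state, while a load that includes
timing noise does not.

```python
import random

def flop_load(sizes, density):
    # Deterministic, count-like load: work inside each cell.
    return [s * d for s, d in zip(sizes, density)]

def timed_load(sizes, density, rng):
    # Same work, plus run-dependent measurement noise (assumed 5%).
    return [s * d * (1.0 + 0.05 * rng.random()) for s, d in zip(sizes, density)]

def rebalance(sizes, load, relax=0.5):
    # Hypothetical rule: shrink overloaded cells, grow underloaded ones,
    # then renormalize so the relative sizes sum to 1.
    mean = sum(load) / len(load)
    new = [s * (1.0 + relax * (mean - l) / mean) for s, l in zip(sizes, load)]
    total = sum(new)
    return [s / total for s in new]

def run(sizes, density, steps, load_fn):
    # Returns the cell sizes after each rebalancing step.
    history = []
    for _ in range(steps):
        sizes = rebalance(sizes, load_fn(sizes, density))
        history.append(sizes)
    return history

density = [1.0, 2.0, 4.0, 1.0]   # made-up work density per cell
start = [0.25, 0.25, 0.25, 0.25]

# Count-based load: restarting from the state saved after step 5
# follows the original run exactly.
full = run(start, density, 10, flop_load)
restart = run(full[4], density, 5, flop_load)
print(restart == full[5:])

# Timing-based load: the noise differs between the two runs.
rng_a, rng_b = random.Random(1), random.Random(2)
noisy_full = run(start, density, 10, lambda s, d: timed_load(s, d, rng_a))
noisy_restart = run(noisy_full[4], density, 5, lambda s, d: timed_load(s, d, rng_b))
print(noisy_restart == noisy_full[5:])
```

In the real code the saved state would of course have to contain
whatever else the DD code carries between steps, not just cell sizes.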

> You could use the -dd option and the hidden options -ddcsx, -ddcsy and
> -ddcsz (see mdrun -h -hidden) to do static load balancing.

After I realized that it's -hidden and not --hidden (too much GNU
naming convention in my brain ;-)), I saw them too. Apologies to
Mark for needing to be pointed to that twice...

> A string is required with the relative sizes of the domains along each
> dimension, for example -ddcsx "1.2 0.9 0.9 1.2" for 4 domains along x.
> But the load balancing efficiency will depend very much on your system.

From what I see in the code, these values are only read with '-dlb
no', which means that they would work for a system which is mostly
static. But if there are large structural changes, e.g. during protein
(un)folding, the distribution becomes sub-optimal again once atoms
have moved significantly.
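
A small Python sketch of that concern (not GROMACS code; the atom
positions, the 1D box and the imbalance measure are all illustrative
assumptions). It keeps the relative cell sizes along x fixed, as one
would with -ddcsx, and reports the imbalance as the largest atom count
in any cell divided by the mean count.

```python
from bisect import bisect_right
from itertools import accumulate

def cell_counts(xs, rel_sizes, box=1.0):
    # Count atoms per cell for fixed relative cell sizes along one axis.
    total = sum(rel_sizes)
    # Upper boundaries of all cells except the last one.
    bounds = [box * c / total for c in accumulate(rel_sizes)][:-1]
    counts = [0] * len(rel_sizes)
    for x in xs:
        counts[bisect_right(bounds, x % box)] += 1
    return counts

def imbalance(counts):
    # 1.0 means perfectly balanced; larger means worse.
    mean = sum(counts) / len(counts)
    return max(counts) / mean

rel_sizes = [2, 1, 1, 2]  # static sizes, assumed tuned for a compact state

# Compact state (made up): 25 atoms in every cell, so the sizes fit.
edges = [0.0, 1 / 3, 1 / 2, 2 / 3, 1.0]
compact = [lo + (hi - lo) * (j + 0.5) / 25
           for lo, hi in zip(edges, edges[1:]) for j in range(25)]

# Extended state (made up): the same 100 atoms spread uniformly.
extended = [(i + 0.5) / 100 for i in range(100)]

print(imbalance(cell_counts(compact, rel_sizes)))
print(imbalance(cell_counts(extended, rel_sizes)))
```

The same cell sizes that balance the compact state leave the outer
cells overloaded once the atoms have spread out.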

Why are these -ddcs* options hidden?

> As only a few steps are required for accurate timings, you can quickly
> try a few -dd and size settings to see if you can get reasonable
> performance.

Well, I can also try printing out cell sizes from a run with DLB enabled, no?
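
Something like the following Python sketch could turn such printed
cell sizes into a -ddcsx string. The input format is an assumption: I
don't know in what form the cell sizes would come out, so the sketch
takes a plain list of cell boundaries along x. The helper names are
made up, the normalization to a mean of 1 is just a choice (the values
are relative anyway), and the argument list is only assembled, not
executed.

```python
def relative_sizes(boundaries):
    # boundaries: increasing cell boundaries along one dimension
    # (hypothetical input, e.g. [0.0, 3.0, 5.0, 7.0, 10.0] for 4 cells).
    widths = [b - a for a, b in zip(boundaries, boundaries[1:])]
    if not widths or any(w <= 0 for w in widths):
        raise ValueError("need at least two strictly increasing boundaries")
    mean = sum(widths) / len(widths)
    return [w / mean for w in widths]

def ddcs_argument(boundaries, precision=3):
    # Space-separated relative sizes, in the style of "1.2 0.9 0.9 1.2".
    return " ".join(f"{r:.{precision}f}" for r in relative_sizes(boundaries))

def mdrun_args(boundaries_x):
    # Static decomposition along x only: -dd nx 1 1 with DLB switched off.
    nx = len(boundaries_x) - 1
    return ["mdrun", "-dlb", "no", "-dd", str(nx), "1", "1",
            "-ddcsx", ddcs_argument(boundaries_x)]

print(mdrun_args([0.0, 3.0, 5.0, 7.0, 10.0]))
```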

Roland Schulz wrote:
> take a look at GMX_DLB_FLOP and GMX_DD_LOAD environment variables defined
> in domdec.c. They might help with what you are trying to do.

I don't quite understand how GMX_DD_LOAD would help; it only
participates in setting comm->bRecordLoad, which defaults to 1
anyway. Did you mean GMX_DD_DUMP or GMX_DD_DUMP_GRID by any chance?
Anyway, thanks for pointing me in that direction.

Cheers,
Bogdan