[gmx-users] Setting Up "Equivalent" Runs With -notunepme
Tim Connolly
tconnolly at ucmerced.edu
Sat Mar 18 03:08:42 CET 2017
Hi everyone,
From my understanding of the release notes for the gromacs-2016
versions, pretty much all of the settings that are tuned by tunepme can
now be modified manually (modifying some of these settings still caused
errors all the way up to gromacs-5.1.7).
I'm not trying to make runs that are truly reproducible nor am I even
going for something that is perfectly equivalent. However, for the sake
of both performance and accuracy, I'm trying to avoid situations where,
for example, gromacs automatically runs two replicates of the same
system on the same compute node with automatically chosen coulomb
cutoffs of 1.3 and 2.0+ nm (which has in fact happened to me).
If I'm doing this manual tuning, I'd like to make sure that I'm doing it
properly, so I wanted to see if anyone might catch any obvious mistakes
that I'm making. I *believe* that the options that need to be set
manually are (a rough .mdp/mdrun sketch follows this list):
1. rlist (though I've never had an issue with gromacs picking "strange"
values for this)
2. nstlist (gromacs will almost always pick the same values for this
unless the same system is run on drastically different hardware)
3. rcoulomb (this is the main thing that I'm trying to make "equivalent")
4. fourierspacing (manually scaled by the same factor as rcoulomb, so
the PME accuracy stays roughly the same)
5. -notunepme as an option to mdrun
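To make that concrete, here is roughly what such a setup looks like for
one of my systems. The actual numbers and file names are just
placeholders copied from a tuned benchmark, not recommendations:

  ; .mdp fragment with the tuned values written in by hand
  cutoff-scheme           = Verlet
  verlet-buffer-tolerance = -1     ; use rlist as given instead of recomputing it
  nstlist                 = 40     ; value mdrun picked during benchmarking
  rlist                   = 1.32
  rvdw                    = 1.0
  rcoulomb                = 1.3    ; tuned up from 1.0 by a factor of 1.3
  fourierspacing          = 0.156  ; 0.12 scaled by the same factor of 1.3

  # production run with the tuning switched off
  gmx mdrun -deffnm prod -notunepme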
Are there any other significant settings that I'm missing? In general,
before running my production runs I perform several benchmarks where I
allow gromacs to do its own thing with multiple different settings. It
will usually choose several different optimizations. Then I'll compare
the benchmarks to see which set of values actually performs the best.
What I'd like to do is then use those automatically chosen optimizations
for all of my future runs for that system. I think that I remember at
one point finding at least one other setting that was automatically
changed during the tuning. I might be wrong, but now I'm a bit paranoid
that I'm missing something important.
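In case it matters, those benchmarks are just short mdrun runs along
these lines (file names and step counts are arbitrary and depend on the
system size); I then copy whatever the tuning settled on into the .mdp:

  # short run with tuning left on; -resethway resets the performance
  # counters halfway through, so the reported ns/day is not skewed by
  # the initial load-balancing/tuning phase
  gmx mdrun -deffnm bench_tuned -nsteps 20000 -resethway

  # repeat with the tuned values written into the .mdp and tuning off,
  # to confirm the fixed settings give the same performance
  gmx mdrun -deffnm bench_fixed -nsteps 20000 -resethway -notunepme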
Since I think the general consensus is that even with the tuning on, the
runs are "equivalent enough", I expect that at least some answers will
just be to leave things as they are without doing any manual tuning.
However, here are a couple of things that happen to me on a somewhat
regular basis:
1. I do several benchmark runs before I start any major production runs,
so I usually have a general idea of what the optimal values should be
for maximum performance. Gromacs sometimes chooses values that are off
by quite a bit, leading to performance losses on the order of 10-20%.
2. Gromacs may optimize two replicates with different values on
equivalent hardware, leading one run to finish days before another. This
can be inconvenient for my work schedule.
3. Hot and cold machines do NOT have the same optimal settings. Heat is
rarely a limiting factor for the CPUs in our nodes, but it is always a
limiting factor for our GPUs. This leads to the tuning oftentimes being
slightly imbalanced as the GPUs eventually reach a temperature that
reduces their clock speeds. The opposite may be true for other users.
4. Sometimes the most efficient setup is to run multiple jobs on a
single compute node if that node has specs that are overkill for a
single job. These multiple jobs may not even be the same system. It is
VERY difficult to get gromacs to properly optimize multiple jobs on the
same node. This sometimes leads to massive performance losses on the
order of 50%. I used to sometimes "trick" gromacs into picking more
optimal settings by starting one job, starting a second job after the
optimization period, restarting the first job after the optimization
period, etc. However, this obviously becomes very complicated for more
than two runs per node, and it makes it impossible to start one run
after another has already been running for a while. (A sketch of the
kind of manual job splitting I have in mind follows this list.)
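For reference, by splitting a node by hand I mean something like the
following for two jobs on a node with 16 cores (hyperthreading off) and
2 GPUs; the core counts, GPU ids, and pin offsets are just examples and
obviously depend on the node:

  # job 1: one rank, 8 OpenMP threads pinned to the first 8 cores, GPU 0
  gmx mdrun -deffnm sysA -ntmpi 1 -ntomp 8 -gpu_id 0 \
      -pin on -pinoffset 0 -pinstride 1 -notunepme &

  # job 2: one rank, 8 OpenMP threads pinned to the next 8 cores, GPU 1
  gmx mdrun -deffnm sysB -ntmpi 1 -ntomp 8 -gpu_id 1 \
      -pin on -pinoffset 8 -pinstride 1 -notunepme &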
Thanks for any comments and corrections,
Tim