[gmx-developers] grompp .mdp processing

Mon Feb 23 08:53:09 CET 2009

On Feb 23, 2009, at 00:58 , Mark Abraham wrote:

> David van der Spoel wrote:
>> Mark Abraham wrote:
>>> Sander Pronk wrote:
>>>> Hi everybody,
>>>>
>>>> I've made some changes to grompp (not committed yet) that:
>>>>
>>>> - will allow the use of cpp-style #include and #define in .mdp  
>>>> files (useful for setting up multiple similar simulations, but  
>>>> also for tutorials).
>> I assume you have used the existing cpp library?
>>>
>>> That looks good to me. I also generate my .mdp files with scripts,  
>>> but this feature would enable me to avoid that.
>> I'm not sure, nowadays one needs to do series of simulations  
>> anyway, so scripting is a necessary evil.
>
> If #define is useful in the .mdp file, then the value of  
> preprocessor variables must be being used - so probably that needs  
> #ifdef and #if and such also. Anyway, if variable interpretation is  
> being done, an .mdp file line like
>
> gen_seed = SEED_FROM_COMMAND_LINE
>
> enables a master script to call grompp with (say)
>
> grompp -DSEED_FROM_COMMAND_LINE=23441
>
> That adds a lot of complexity to the interpretation, however. Have I  
> misunderstood Sander's intent with the use of #define?

That's one of the things that are possible - it's useful for free  
energy simulations where the same simulation has to be run at  
different lambdas.

The other nice thing is being able to #include, which enables the use  
of system-specific 'standard includes' with good settings that remain  
consistent from structure energy minimization to production run.
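To make the two uses above concrete, here is a minimal Python sketch of the kind of expansion I mean. It is not the grompp code: the function name `preprocess_mdp`, the in-memory `files` dictionary (a stand-in for the file system), the macro name `LAMBDA`, and the rule that macros given up front win over in-file `#define`s are all assumptions made for illustration.

```python
import re

def preprocess_mdp(name, files, defines=None):
    """Expand #include and #define directives in .mdp-like text.

    `files` maps file names to contents (hypothetical stand-in for disk).
    `defines` holds macros given up front, like -DNAME=value on a command line.
    """
    defines = dict(defines or {})
    out = []

    def expand(fname):
        for raw in files[fname].splitlines():
            line = raw.strip()
            if line.startswith("#include"):
                # '#include "other.mdp"': process the named file in place.
                expand(line.split(None, 1)[1].strip('"'))
            elif line.startswith("#define"):
                parts = line.split(None, 2)
                # Assumption: macros given up front take precedence, so the
                # in-file #define only acts as a default.
                defines.setdefault(parts[1], parts[2] if len(parts) > 2 else "")
            elif line:
                # Substitute whole-word occurrences of known macro names.
                out.append(re.sub(r"\b[A-Za-z_]\w*\b",
                                  lambda m: defines.get(m.group(0), m.group(0)),
                                  line))

    expand(name)
    return out

files = {
    "defaults.mdp": "integrator = md\npbc = xyz\n",
    "run.mdp": '#include "defaults.mdp"\n'
               "#define LAMBDA 0.0\n"
               "free_energy = yes\n"
               "init_lambda = LAMBDA\n",
}

default_run = preprocess_mdp("run.mdp", files)
lambda_run = preprocess_mdp("run.mdp", files, defines={"LAMBDA": "0.5"})
```

A master script could then call this once per lambda value with the same input files. The sketch does not guard against circular includes and has no #ifdef handling.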

>
>
>>>
>>>> - allows multiple assignments of .mdp parameters, through  
>>>> overrides so that the last assignment is the one that counts.
>>>
>>> Doing so always and silently would be asking for trouble, however  
>>> if they're only enabled with -m, and come with a note to the user  
>>> when they've occurred, that should be useful in a few corner cases.
>> And it would reverse the current policy, that is first option goes,  
>> implying that one may get different results with the same input.
>> Further, I feel a bit uncomfortable with extending the mdp files  
>> further, because we should rather move away from the endless list  
>> of options. I haven't thought this through, but I would prefer to  
>> move to a slightly more complex format that, however, is more user  
>> friendly. Thinking of a folding editor file (xml springs to mind).  
>> It should still be possible to generate using a script, but there  
>> are xml bindings for Perl and Python as well.
>
> Merely wrapping the "endless list of options" into an endless list  
> of XML tags (say), gets you something like the following XML
>
> <mdp>
>  ...
>  <temperaturecoupling type="berendsen">
>    <groups>
>      <group tau_t="0.1" ref_t="298">Protein</group>
>      <group tau_t="0.1" ref_t="298">Non-Protein</group>
>    </groups>
>  </temperaturecoupling>
>  ...
> </mdp>
>
> This is arguably
>
> * more or less complex (links between different options like  
> tc_groups and tau_t and ref_t are now explicit and can be tested for  
> by validating against the DTD before we try to parse it; but there's  
> a bunch of formatting constraints) and
> * more or less user friendly (the data is now structured and the  
> format adds meaning to content; but there's all this visual cruft  
> and users might feel compelled to learn an XML editor;  
> increasingly the latter will become a requisite IT skill).
>
> Trying to wrap #ifdef-style XML conditional structures into the  
> above would be a bit ugly... say
>
> ...
> <variable name="USING_RF"/>
> ...
> <if test="//variable[@name='USING_RF']">
>  <group tau_t="0.01" ref_t="298">Protein</group>
> </if>
> ...
>
> where that test construct is an XPath expression.
>
> Such XML is readily generatable with scripts. For example, in Perl  
> using XML::Writer you get lines like
>
> if ( $do_rf ) {
>  $do_rf = 'USING_RF';
>  $tau_t = 0.01;
>  $ref_t = 298;
> ...
>  $xml->startTag("if", "test" => "//variable[\@name='${do_rf}']");
>  foreach my $group ("Protein", "Non-Protein") {
>    $xml->dataElement("group", $group, "tau_t" => $tau_t, "ref_t" =>  
> $ref_t);
>  }
>  $xml->endTag("if");
> }
>
> if you wanted to leave the XML-level conditionals in place, or more  
> likely
>
> if ( $do_rf ) {
>  $tau_t = 0.01;
>  $ref_t = 298;
> ...
>  foreach my $group ("Protein", "Non-Protein") {
>    $xml->dataElement("group", $group, "tau_t" => $tau_t, "ref_t" =>  
> $ref_t);
>  }
> }
>
> It is also straightforward to provide scripts for the .mdp <-> <mdp>  
> conversions in the change-over period. One would use Perl for both,  
> though technically XSLT is probably the best tool for the  
> <mdp> -> .mdp conversion.
>
> I'm far from convinced that the increase in usability outweighs the  
> need for people to learn how to manage all this XML stuff, however.
>

I think I agree; the most annoying aspect of the current .mdp files is  
the lack of information hierarchy: it's hard to make out what's  
important and what is not, and xml wouldn't be the ideal format for  
what is essentially an options list. In many cases there is a sensible  
default (like 'xyz' for pbc, or 'no' for free_energy), and deviations  
from the default are needed only in specific cases (hence  
the need for multiple assignment - together with #include it would  
allow for better management of default settings).
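As a rough illustration of the override behaviour (last assignment wins, with a note whenever a value is replaced, as suggested earlier in the thread), here is a small Python sketch. The function name `parse_assignments` and the wording of the notes are made up for this example; it is not what grompp does today.

```python
def parse_assignments(lines):
    """Collect 'key = value' lines; a later assignment overrides an earlier one.

    Returns the final parameters plus a note for every override, so that the
    user can be told about it instead of it happening silently.
    """
    params = {}
    notes = []
    for line in lines:
        line = line.split(";", 1)[0].strip()  # ';' starts a comment in .mdp files
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in params and params[key] != value:
            notes.append(f"{key}: '{params[key]}' overridden by '{value}'")
        params[key] = value
    return params, notes

# Defaults first (e.g. pulled in via #include), then the run-specific change.
params, notes = parse_assignments([
    "pbc = xyz",
    "free_energy = no",
    "free_energy = yes ; override the default",
])
```

Here `params["free_energy"]` ends up as `yes`, and `notes` records that the default `no` was replaced.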

Perhaps just adding some syntax to enforce related settings would make  
the structure clearer:

free_energy
{
	on = true
	init_lambda = 0.1
	delta_lambda = 0
	soft-core
	{
		power = 1
		alpha = 0.5
		sigma = 0.3
	}
}

or, if no free energy calculation is required:

free_energy
{
	on = false
}

or no free energy section at all.
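Reading such a format would not take much code. Below is a minimal Python sketch, assuming the layout used in the example above: a section name on its own line, then '{' and '}' each on their own line, and 'key = value' lines in between. The function name `parse_blocks` is hypothetical, and all values are kept as strings.

```python
def parse_blocks(text):
    """Parse brace-delimited sections into nested dictionaries."""
    root = {}
    stack = [root]   # stack[-1] is the section currently being filled
    pending = None   # section name seen, waiting for its '{'
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            if pending is None:
                raise ValueError("'{' without a section name")
            section = {}
            stack[-1][pending] = section
            stack.append(section)
            pending = None
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
        elif "=" in line:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
        else:
            pending = line  # a bare word names the next section
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return root

example = """
free_energy
{
    on = true
    init_lambda = 0.1
    delta_lambda = 0
    soft-core
    {
        power = 1
        alpha = 0.5
        sigma = 0.3
    }
}
"""
settings = parse_blocks(example)
```

With this, `settings["free_energy"]["soft-core"]["alpha"]` is the string `0.5`, and a missing free_energy section simply means the key is absent, so defaults can apply.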

BTW, personally I don't see a reason why the parameter file shouldn't  
be Turing-complete.  :-)