[gmx-developers] grompp .mdp processing
Sander Pronk
pronk at cbr.su.se
Mon Feb 23 08:53:09 CET 2009
On Feb 23, 2009, at 00:58 , Mark Abraham wrote:
> David van der Spoel wrote:
>> Mark Abraham wrote:
>>> Sander Pronk wrote:
>>>> Hi everybody,
>>>>
>>>> I've made some changes to grompp (not committed yet) that:
>>>>
>>>> - will allow the use of cpp-style #include and #define in .mdp
>>>> files (useful for setting up multiple similar simulations, but
>>>> also for tutorials).
>> I assume you have used the existing cpp library?
>>>
>>> That looks good to me. I also generate my .mdp files with scripts,
>>> but this feature would enable me to avoid that.
>> I'm not sure, nowadays one needs to do series of simulations
>> anyway, so scripting is a necessary evil.
>
> If #define is useful in the .mdp file, then the value of
> preprocessor variables must be being used - so probably that needs
> #ifdef and #if and such also. Anyway, if variable interpretation is
> being done, an .mdp file line like
>
> gen_seed = SEED_FROM_COMMAND_LINE
>
> enables a master script to call grompp with (say)
>
> grompp -DSEED_FROM_COMMAND_LINE=23441
>
> That adds a lot of complexity to the interpretation, however. Have I
> misunderstood Sander's intent with the use of #define?
That's one of the things that are possible - it's useful for free
energy simulations where the same simulation has to be run at
different lambdas.
The other nice thing is being able to #include, which enables the use
of system-specific 'standard includes' with good settings that remain
consistent from structure energy minimization to production run.
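
To make the #define idea concrete, here is a minimal Python sketch of the substitution step. It is not the grompp change itself: the helper name expand_defines, the macro names NSTEPS and LAMBDA, and the template are all made up for illustration, and the dictionary argument only stands in for command-line style -DNAME=VALUE definitions.

```python
import re

def expand_defines(mdp_text, defines=None):
    """Expand cpp-style macros in .mdp text (hypothetical helper, not grompp code)."""
    macros = dict(defines or {})  # stands in for -DNAME=VALUE definitions
    out = []
    for line in mdp_text.splitlines():
        words = line.split(None, 2)
        if words and words[0] == "#define":
            if len(words) < 2:
                raise ValueError("#define needs a macro name")
            # Record the macro and drop the directive from the output.
            macros[words[1]] = words[2].strip() if len(words) > 2 else ""
            continue
        for name, value in macros.items():
            # Whole-word, case-sensitive replacement, so LAMBDA does not
            # touch the parameter name init_lambda.
            line = re.sub(r"\b%s\b" % re.escape(name), lambda m, v=value: v, line)
        out.append(line)
    return "\n".join(out)

template = """#define NSTEPS 50000
nsteps = NSTEPS
init_lambda = LAMBDA
gen_seed = SEED_FROM_COMMAND_LINE"""

# One expanded parameter set per lambda value.
for lam in ("0.0", "0.5", "1.0"):
    settings = {"LAMBDA": lam, "SEED_FROM_COMMAND_LINE": "23441"}
    print(expand_defines(template, settings))
    print()
```

The same template then serves every lambda window; only the definitions passed in change.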
>
>
>>>
>>>> - allows multiple assignments of .mdp parameters, through
>>>> overrides so that the last assignment is the one that counts.
>>>
>>> Doing so always and silently would be asking for trouble, however
>>> if they're only enabled with -m, and come with a note to the user
>>> when they've occurred, that should be useful in a few corner cases.
>> And it would reverse the current policy, in which the first
>> assignment wins, so one may get different results with the same input.
>> Further, I feel a bit uncomfortable with extending the mdp files
>> further, because we should rather move away from the endless list
>> of options. I haven't thought this through, but I would prefer to
>> move to a slightly more complex format that, however, is more user
>> friendly. Thinking of a folding editor file (xml springs to mind).
>> It should still be possible to generate using a script, but there
>> are xml bindings for Perl and Python as well.
>
> Merely wrapping the "endless list of options" into an endless list
> of XML tags (say), gets you something like the following XML
>
> <mdp>
>   ...
>   <temperaturecoupling type="berendsen">
>     <groups>
>       <group tau_t="0.1" ref_t="298">Protein</group>
>       <group tau_t="0.1" ref_t="298">Non-Protein</group>
>     </groups>
>   </temperaturecoupling>
>   ...
> </mdp>
>
> This is arguably
>
> * more or less complex (links between different options like
> tc_groups and tau_t and ref_t are now explicit and can be tested for
> by validating against the DTD before we try to parse it; but there's
> a bunch of formatting constraints) and
> * more or less user friendly (the data is now structured and the
> format adds meaning to content; but there's all this visual cruft
> and users might feel constrained to need to learn an XML editor;
> increasingly the latter will become a requisite IT skill).
>
> Trying to wrap #ifdef-style XML conditional structures into the
> above would be a bit ugly... say
>
> ...
> <variable name="USING_RF"/>
> ...
> <if test="//variable[@name='USING_RF']">
>   <group tau_t="0.01" ref_t="298">Protein</group>
> </if>
> ...
>
> where that test construct is an XPath expression.
>
> Such XML is readily generatable with scripts. For example, in Perl
> using XML::Writer you get lines like
>
> if ( $do_rf ) {
>     $do_rf = 'USING_RF';
>     $tau_t = 0.01;
>     $ref_t = 298;
>     ...
>     $xml->startTag("if", "test" => "//variable[\@name='${do_rf}']");
>     foreach my $group ("Protein", "Non-Protein") {
>         $xml->dataElement("group", $group,
>                           "tau_t" => $tau_t, "ref_t" => $ref_t);
>     }
>     $xml->endTag("if");
> }
>
> if you wanted to leave the XML-level conditionals in place, or more
> likely
>
> if ( $do_rf ) {
>     $tau_t = 0.01;
>     $ref_t = 298;
>     ...
>     foreach my $group ("Protein", "Non-Protein") {
>         $xml->dataElement("group", $group,
>                           "tau_t" => $tau_t, "ref_t" => $ref_t);
>     }
> }
>
> It is also straightforward to provide scripts for the .mdp <-> <mdp>
> conversions in the change-over period. One would use Perl for both,
> though technically XSLT is probably the best tool for the
> <mdp> -> .mdp conversion.
>
> I'm far from convinced that the increase in usability outweighs the
> need for people to learn how to manage all this XML stuff, however.
>
I think I agree; the most annoying aspect of the current .mdp files is
the lack of information hierarchy: it's hard to make out what's
important and what is not, and xml wouldn't be the ideal format for
what is essentially an options list. In many cases there is a sensible
default (like 'xyz' for pbc, or 'no' for free_energy), so deviations
from the default should only have to be spelled out where they're
actually wanted (hence the need for multiple assignment - together
with #include it would allow for better management of default
settings).
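
A rough Python sketch of how #include and last-assignment-wins could combine to manage defaults is given here. It makes several assumptions: the function load_mdp and the file names defaults.mdp and run.mdp are hypothetical, files are held in a dictionary instead of on disk to keep the example self-contained, and ';' is treated as the comment character. It says nothing about how grompp actually does this.

```python
def load_mdp(name, files, params=None):
    """Read 'key = value' lines; a later assignment overrides an earlier one.

    `files` maps file names to their text (an in-memory stand-in for disk).
    """
    if params is None:
        params = {}
    for raw in files[name].splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        if line.startswith("#include"):
            included = line[len("#include"):].strip().strip('"')
            # Included settings land first, so the including file can override them.
            load_mdp(included, files, params)
            continue
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params

files = {
    "defaults.mdp": "pbc = xyz          ; system-wide defaults\nfree_energy = no\n",
    "run.mdp": '#include "defaults.mdp"\nfree_energy = yes  ; override for this run\n',
}

print(load_mdp("run.mdp", files))
```

A real implementation would probably also report each override to the user, as suggested earlier in the thread.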
Perhaps just adding some syntax to enforce related settings would make
the structure clearer:
free_energy
{
    on = true
    init_lambda = 0.1
    delta_lambda = 0
    soft-core
    {
        power = 1
        alpha = 0.5
        sigma = 0.3
    }
}
or, if no free energy calculation is required:
free_energy
{
    on = false
}
or no free energy section at all.
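
For what it's worth, a block syntax like this is cheap to parse. The Python sketch below assumes exactly the layout shown above (a section name on its own line, '{' and '}' each on their own line, and 'key = value' lines for settings); the function name parse_blocks is hypothetical and all values are kept as strings.

```python
def parse_blocks(text):
    """Parse nested 'name { key = value ... }' blocks into nested dicts."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    pos = 0

    def parse_body(inside_block):
        nonlocal pos
        body = {}
        while pos < len(lines):
            line = lines[pos]
            pos += 1
            if line == "}":
                if not inside_block:
                    raise ValueError("unexpected '}'")
                return body
            if "=" in line:
                key, _, value = line.partition("=")
                body[key.strip()] = value.strip()
            else:
                # A bare name opens a section; '{' must follow on the next line.
                if pos >= len(lines) or lines[pos] != "{":
                    raise ValueError("expected '{' after section %r" % line)
                pos += 1
                body[line] = parse_body(True)
        if inside_block:
            raise ValueError("missing '}'")
        return body

    return parse_body(False)

example = """
free_energy
{
    on = true
    init_lambda = 0.1
    delta_lambda = 0
    soft-core
    {
        power = 1
        alpha = 0.5
        sigma = 0.3
    }
}
"""

print(parse_blocks(example))
```

An absent section simply yields no key, which matches the "no free energy section at all" case.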
BTW, personally I don't see a reason why the parameter file shouldn't
be Turing-complete. :-)