[gmx-developers] g_decorr and Multi-I/O

Thu Nov 15 03:49:28 CET 2007

Hi,

I'm planning to implement Lyman & Zuckerman's method for determining a 
trajectory "decorrelation time" (recently published in J. Phys. Chem. B 
2007, 111, 12876-12882). I thought I'd make an initial stab at the 
Multi-I/O developer project 
(http://wiki.gromacs.org/index.php/Category:Development) while I was at 
it. The project aims to bring to other tools the kind of functionality 
already present in trjcat and eneconv, i.e. accepting multiple input 
files.

As a model, so far I've been looking at trjcat. The first complication 
here is that the demuxing functionality built into trjcat has nothing to 
do with the "normal" function. The clue here is that after the normal 
initial file processing, there's an "if(bDeMux)" in trjcat.c that 
controls the rest of the execution. So, I think this means there should 
be a trjdemux utility. :-)

Anyway, back on point, there are two ways you could do this kind of 
Multi-I/O thing: either

* read all of the relevant frames of the trajectory into memory 
(subject to -b, -e and -dt) and then process, or

* loop over files and frames in them, accounting for -b, -e and -dt on 
the fly, and using a callback function on each frame to actually do the 
required work.

(You could implement the former as a special case of the latter, of course.)
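
To make the second option concrete, here is a rough sketch of such a 
looping routine. It is only an illustration, not GROMACS code: 
toy_frame, toy_file, toy_multi_read and toy_count_frames are 
hypothetical names, the "files" are in-memory arrays rather than real 
trajectory files, and -dt is simplified to "keep at most one frame per 
dt interval, counted from the begin time". It also assumes the files 
are supplied in time order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trajectory frame; only the time matters here. */
typedef struct {
    double time; /* ps */
} toy_frame;

/* Hypothetical stand-in for one trajectory file: frames held in memory. */
typedef struct {
    const toy_frame *frames;
    int              nframes;
} toy_file;

/* Per-frame callback: return nonzero to continue, zero to stop early. */
typedef int (*toy_frame_callback)(const toy_frame *fr);

#define TOY_TIME_TOL 1e-6

/* Loop over files and the frames in them, applying -b, -e and -dt on the
   fly. Returns the number of frames handed to the callback.
   Assumes frames are in increasing time order across all files. */
static int toy_multi_read(const toy_file *files, int nfiles,
                          double tbegin, double tend, double dt,
                          toy_frame_callback callback)
{
    int    i, j, nused = 0;
    double next = tbegin; /* earliest time at which the next frame is kept */

    assert(files != NULL && callback != NULL);
    for (i = 0; i < nfiles; i++) {
        for (j = 0; j < files[i].nframes; j++) {
            const toy_frame *fr = &files[i].frames[j];

            if (fr->time < tbegin) continue;    /* -b: not there yet */
            if (fr->time > tend) return nused;  /* -e: past the end  */
            if (dt > 0) {                       /* -dt: simplified   */
                if (fr->time < next - TOY_TIME_TOL) continue;
                while (next <= fr->time + TOY_TIME_TOL) next += dt;
            }
            nused++;
            if (!callback(fr)) return nused;
        }
    }
    return nused;
}

/* Example callback. It has no way to receive data from its caller, so it
   records what it sees in file-scope variables. */
static int    toy_nseen     = 0;
static double toy_last_time = 0.0;

static int toy_count_frames(const toy_frame *fr)
{
    toy_nseen++;
    toy_last_time = fr->time;
    return 1;
}
```

Calling toy_multi_read(files, nfiles, 10.0, 50.0, 20.0, 
toy_count_frames) on frames every 10 ps would hand only the frames at 
10, 30 and 50 ps to the callback. Note that the callback has to fall 
back on file-scope variables to keep its results, because the loop 
gives it nothing but the frame.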

I favour the second approach, but the routine that does the looping will 
necessarily be ignorant of what the callback will be doing with the 
frames, so you may need to pass data through the looping routine to the 
callback. This kind of thing can be done in two ways.

1) The callback function takes a (void *) argument which it casts back 
to a (struct *) to get the data it needs. This argument has to be 
passed to the looping routine, so that it can pass it on. This probably 
means creating a new struct for most callbacks, and ugly debugging 
problems if you ever cast the pointer back to the wrong type, since the 
compiler cannot check that for you. It does mean you can have libraries 
of callback functions, though. This sort of thing gets much more 
elegant if one were to move to C++ at some hypothetical time in the 
future.
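
As a minimal sketch of approach 1 (all names here, demo_frame, 
demo_loop, demo_stats and demo_accumulate, are made up for 
illustration and are not existing GROMACS routines), the looping 
routine just forwards an opaque pointer, and the callback casts it 
back to the struct it expects:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trajectory frame. */
typedef struct {
    double time; /* ps */
} demo_frame;

/* Callback type: the second argument is opaque to the looping routine. */
typedef int (*demo_callback)(const demo_frame *fr, void *data);

/* The looping routine knows nothing about what data points to; it only
   passes the pointer on. Returns the number of frames processed. */
static int demo_loop(const demo_frame *frames, int nframes,
                     demo_callback cb, void *data)
{
    int i, nused = 0;

    assert(cb != NULL);
    for (i = 0; i < nframes; i++) {
        nused++;
        if (!cb(&frames[i], data)) break; /* callback asked to stop */
    }
    return nused;
}

/* The struct created for this particular callback. */
typedef struct {
    int    nframes;
    double time_sum;
} demo_stats;

/* Casts the opaque pointer back to the type it was created as. Passing
   anything other than a demo_stats here compiles fine and fails at run
   time, which is the debugging hazard of this approach. */
static int demo_accumulate(const demo_frame *fr, void *data)
{
    demo_stats *stats = (demo_stats *)data;

    stats->nframes++;
    stats->time_sum += fr->time;
    return 1;
}
```

The caller would set up a demo_stats, call demo_loop(frames, nframes, 
demo_accumulate, &stats), and read the totals out of stats afterwards. 
Because demo_accumulate depends only on its arguments, it could live in 
a library and be shared between tools.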

2) The callback function is declared in the scope that already has the 
data it needs, viz

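#include <stdio.h>

/* Callback type: no arguments, so any data must come from its scope. */
typedef int t_trxread_callback(void);
int do_multi_trxread(t_trxread_callback callback);

int my_analysis_function(void) {
   int local_var = 0;

   /* Nested function: a GCC extension, not standard C. It can see
      local_var because it is defined inside my_analysis_function. */
   int my_analysis_callback(void) {
     fprintf(stderr, "local_var is %d\n", local_var);
     return 1;
   }

   return do_multi_trxread(my_analysis_callback);
}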

I'm not sure right now whether this means that the call to 
do_multi_trxread also has to come from that scope, but it would usually 
be convenient to do so. It does make it impossible to re-use callback 
functions without recreating a new scope, however.

You could use both approaches - have a (void *) argument in the callback 
function type, which you can pass as NULL to do_multi_trxread if your 
callback either needs no data or is getting that data through its scope. 
My preference is to encourage use of the first approach, but for some 
one-off dirty jobs you might prefer the second in practice.
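
A small sketch of the combined scheme, again with hypothetical names 
(combo_frame, combo_loop, combo_count, combo_remember_time). Since 
nested functions are not standard C, the "data through its scope" case 
is shown here with a file-scope variable instead, which is my 
substitution rather than the nested form above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trajectory frame. */
typedef struct {
    double time; /* ps */
} combo_frame;

/* One callback type serves both styles; data may be NULL. */
typedef int (*combo_callback)(const combo_frame *fr, void *data);

/* Returns the number of frames handed to the callback. */
static int combo_loop(const combo_frame *frames, int nframes,
                      combo_callback cb, void *data)
{
    int i;

    assert(cb != NULL);
    for (i = 0; i < nframes; i++) {
        if (!cb(&frames[i], data)) return i + 1;
    }
    return nframes;
}

/* Reusable style: all state arrives through the data pointer. */
static int combo_count(const combo_frame *fr, void *data)
{
    (void)fr; /* frame contents not needed for counting */
    ++*(int *)data;
    return 1;
}

/* One-off style: ignores data (the caller passes NULL) and keeps its
   state in a variable visible from its own scope. */
static double combo_last_time = -1.0;

static int combo_remember_time(const combo_frame *fr, void *data)
{
    (void)data;
    combo_last_time = fr->time;
    return 1;
}
```

Here combo_loop(frames, n, combo_count, &count) and 
combo_loop(frames, n, combo_remember_time, NULL) go through the same 
looping routine, so supporting the second style costs nothing beyond 
allowing NULL.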

Does anybody have any feedback for me on the above design choices?

Mark