[gmx-users] How much data to choose eg for an rmsd matrix

Marc Baaden baaden at smplinux.de
Mon Dec 16 10:14:45 CET 2002


Short version: when calculating an RMSD matrix for a protein
trajectory, how many frames should one take into account (e.g.
use only every 5th or 10th ps of the saved frames)?

Longer version:
This is more of an application or rule-of-thumb question.
I was about to calculate an RMSD matrix for a 10 ns trajectory
with 1 frame per ps, i.e. 10000 frames.

For one, building the 10k x 10k matrix would take quite a
while, and for another, I think that it would bring in more noise
than information.
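
To put a number on the cost: the matrix is symmetric with a zero
diagonal, so N frames need N(N-1)/2 pairwise RMSD evaluations. A
small Python sketch of that arithmetic (the function name pair_count
is made up for illustration):

```python
def pair_count(n_frames, stride=1):
    """Number of unique frame pairs when keeping every `stride`-th frame."""
    kept = len(range(0, n_frames, stride))  # frames 0, stride, 2*stride, ...
    return kept * (kept - 1) // 2

# 10000 saved frames, keeping every frame, every 5th, every 10th
for stride in (1, 5, 10):
    print(stride, pair_count(10000, stride))
```

Since the count grows with the square of the number of frames,
keeping every 10th frame cuts the work roughly 100-fold.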

That is, as motion in nearby frames is most probably correlated,
the structures will be quite similar, and doing the matrix for
those frames would just show the similarity of those nearby
frames, which is not what I want.

I am rather interested in how close/similar uncorrelated
frames are.

The characteristic correlation time obviously depends on the
type of system/molecule one looks at. In my case it concerns
proteins of, let's say, 200 to 700 amino acids.
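
One way I could imagine estimating such a time from the data itself
is the autocorrelation of some scalar series extracted from the
trajectory (for instance the RMSD to the starting structure; that
choice is my assumption, and other observables may decorrelate at
quite different rates). A rough NumPy sketch, where the function
names and the 1/e threshold are illustrative choices, not an
established recipe:

```python
import numpy as np

def autocorrelation(series):
    """Normalised autocorrelation function of a 1-D series (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to 2n so the FFT product gives a linear, not circular, correlation.
    f = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(f * np.conj(f), 2 * n)[:n]
    return acf / acf[0]

def decorrelation_lag(series, threshold=np.exp(-1)):
    """First lag (in frames) at which the ACF drops below `threshold`, or None."""
    below = np.nonzero(autocorrelation(series) < threshold)[0]
    return int(below[0]) if below.size else None
```

With 1 frame per ps the returned lag would be directly in ps. If
the function returns None, the series never decorrelated to that
level within the trajectory.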

Is there a rule of thumb for how many frames I should skip in the
calculation of the matrix? Or another suggestion on how to do
this? My (intuitive) bet would be that correlation lasts for at
least a couple of picoseconds, so maybe taking steps of 5 ps or 10 ps
would be good?
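
To make the idea of skipping frames concrete, here is a minimal
NumPy sketch of a strided RMSD matrix. It assumes the coordinates
are already loaded as an array of shape (n_frames, n_atoms, 3) and
already superimposed on a common reference; no fitting is done, and
the names rmsd_matrix, stride and offset are only illustrative:

```python
import numpy as np

def rmsd_matrix(coords, stride=1, offset=0):
    """Pairwise RMSD between every `stride`-th frame, starting at `offset`.

    coords: array of shape (n_frames, n_atoms, 3), assumed pre-fitted.
    """
    sub = np.asarray(coords, dtype=float)[offset::stride]
    m = sub.shape[0]
    out = np.zeros((m, m))
    for i in range(m - 1):
        d = sub[i + 1:] - sub[i]  # displacements from frame i to all later frames
        out[i, i + 1:] = np.sqrt((d ** 2).sum(axis=2).mean(axis=1))
    return out + out.T  # mirror the upper triangle into the lower one
```

With 1 frame per ps, stride=5 or stride=10 would correspond to the
5 ps and 10 ps steps mentioned above.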

Does it make sense to try to calculate an error for this kind
of analysis result? (E.g. starting at 1 ps, stepping 5 ps; then
starting at 2 ps, stepping 5 ps; ... and then averaging?)
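
The offset scheme I have in mind could be sketched as follows, again
assuming pre-fitted coordinates of shape (n_frames, n_atoms, 3). The
summary statistic (mean pairwise RMSD) and all function names are
placeholders for illustration. Note that the interleaved subsamples
come from the same trajectory and are not independent, so the spread
is probably more a measure of sensitivity to the choice of frames
than a true error bar:

```python
import numpy as np

def mean_pairwise_rmsd(frames):
    """Mean RMSD over all unique frame pairs (frames assumed pre-fitted)."""
    m = len(frames)
    total = 0.0
    for i in range(m - 1):
        d = frames[i + 1:] - frames[i]
        total += np.sqrt((d ** 2).sum(axis=2).mean(axis=1)).sum()
    return total / (m * (m - 1) / 2)

def offset_spread(coords, stride, statistic=mean_pairwise_rmsd):
    """Evaluate `statistic` on each of the `stride` interleaved subsamples.

    Returns (mean, sample standard deviation) over the start offsets.
    Needs stride >= 2 and at least two frames per subsample.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.array([statistic(coords[k::stride]) for k in range(stride)])
    return values.mean(), values.std(ddof=1)
```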

Thanks in advance,

