[gmx-users] How does g_msd calculates MSD?

Julian jgelcon at yahoo.com.ar
Fri Oct 15 23:58:18 CEST 2010

Hi everybody,

We're trying to understand how gromacs g_msd calculates MSD (without reading 
through the C code, I really don't know much about it, and there are hundreds of 
(What we want to do is to calculate MSD properly in our supercooled water 
simulations, i.e., to choose correctly the values for -b, -e  and -trestart in 
order to get the longest, reliable MSD data)

We found a mail from Gaurav Goel (quoted below) which gives a reasonable 
explanation on the topic.

If we understand this correctly, then the output of

g_msd -b 50 -e 100 -trestart 50

should be the same as the second half (with the proper shifting) of  
the output of

g_msd -b 0 -e 100 -trestart 50

But it is not, as anyone can verify with any simulation.
...... what are we missing?

Thanks in advance,

Julian Gelman Constantin 
Department of Inorganic, Analytic and Chemical Physics (DQIAQF) 
School of Exact and Natural Sciences 
University of Buenos Aires, Argentina

Gaurav Goel gauravgoeluta at gmail.com
Tue Jul 13 00:56:02 CEST 2010

On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado
<rcuyag at hotmail.com> wrote:
> Dear gromacs users
> I make a MD of 20 ns of a solute in water
> With the g_msd program the msd vs the time was obtained
> In the plot, I observed a linear behaviour of the MSD from 0 to 15 ns and a
> plateau with no linear tendence at the last 5 ns arpoximately.
> In order to know if the observed plateau was due to the data or is due to
> the way as the algorithm process the data, I divided the MD in two
> trajectories and obtained the msd for each one.
> From 0-10ns, the plot observed shows a linear tendence en the begining
and a
> plateau with no linear tendence from 9 to 10 ns.
> From 10-20 ns the plot observed was linear from 10 to 18 ns and not linear
> at the last, the same plateau was observed.
> Comparing the plots there are not equivalent,.
> Why g_msd produces a non linear plot at the last of the calculation and the
> plateau is ever produces.
> Somebody will explain the way as the g_msd algorithm work? and why the plot
> are no equivalent or why there must be equivalent?

I will explain how the g_msd algorithm works and hopefully that will
answer all your questions above. What you see in the output file is
average-MSD versus time. This average is done over all the particles
in the group you selected and over multiple time origins (this last
option can be selected with the -trestart parameter). Also, time in
column 1 is time difference from the start of your trajectory to
current time.

E.g., let's say you collected a trajectory over 5 time units and
choose -trestart=1 time unit and -dt=1 time unit.

dt=1 means you'll have 6 configurations for your analysis (including
the configuration at t=0).

trestart=1 means you'll have 5 distinct trajectories for your analysis:
Trajectory 1: 0-5
T2: 1-5
T3: 2-5
T4: 3-5
T5: 4-5

Now you can notice that all 5 trajectories contribute to the average
MSD after 1 time unit (T1-T5), 4 trajectories contribute to the
average MSD after 2 time units (T1-T4),  3 trajectories to the average
MSD after 3 time units (T1-T3), ...., and only one trajectory to the
MSD after 5 time units (T1). Of course, this assumes that trestart is
large enough that all all these trajectories are uncorrelated.

So, it's clear that longer the time interval at which you want to
evaluate the MSD lesser the number of trajectories used to evaluate
it...and hence, higher error in MSD values at longer times. That might
explain deviation from linear behaviour at long times.

However, you must be careful in interpreting the MSD data and I
recommend reading some literature on the subject. A plateau in MSD
versus time data might also signify what is called cage motion, in
which a particle or atom is trapped by the surrounding particles and
is not able to move out of that hole on the simulation time scale. If
you want you can send me your MSD versus time data along with some
information on your system (such as potentials, density, temperature
etc.) and I can let you know my comments.

Few words of caution:
Make sure that the center of mass of your particle (or atom or
molecule) is diffusing several particle diameters. Also, make sure
that you're calculating the self-diffusion coefficient  by fitting a
straight line to the linear region of MSD versus time data. You can
either modify the -beginfit and -endfit options... or calculate the
slope of the MSD versus time data using some other software (e.g.,
gnuplot, excel, etc.). If you're doing the latter you'll need to take
a look at the code in gmx_msd.c to know how the diffusion coefficent
is calculated from the slope of MSD versus time data (tog et correct
units, use proper scaling factors, etc.).

I hope that helped.

> Regards
> Ricardo Cuya

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-users/attachments/20101015/9071c5e9/attachment.html>

More information about the gromacs.org_gmx-users mailing list