# [gmx-users] How does g_msd calculates MSD?

Javier Cerezo jcb1 at um.es
Mon Oct 18 10:03:19 CEST 2010

```Hello Julian.

I've not gone though the C-code neither. However, from what I have read
and understood about MSD calculation, both calculations you proposed
should not give the same results. This is the way it is calculated.

1. You take different starting times over tyour trajectory, every
-trestart picoseconds, from -b (let's take 0ps in this explanation) to -e.

2. From every restarting point (tr) you calculate the MSD, by
calculating the diplacement of atomic positions compared to the position
at the corresponding tr.

3. With the previous result you'll have a lot of MSD curves,
corresponding to each restating point, e.g. one where t0=0ps, other
t0=0+trestart, and so on. Actually, you are refering all curves to its
corresponding t0, so you have all of them starting at 0.

4. The average of all of these curves is done. With that you obtain
properly averaged results (which is essential for a good MSD
calculation). However, note that only the restarting point at 0ps of
your original trayectory (or the -b time you selected) will contain all
the points of the curve since, for example, the last restating point wil
conntribute with the few points remaining to the end of the simulation
(-e time). For that reason, the average is made with less points as the
time from the reference is increased, and at the end of the curve, the
results are not well averaged any more.

Hope that helps.

Javier

El 15/10/10 23:58, Julian escribió:
> Hi everybody,
>
> We're trying to understand how gromacs g_msd calculates MSD (without
> reading through the C code, I really don't know much about it, and
> there are hundreds of lines).
> (What we want to do is to calculate MSD properly in our supercooled
> water simulations, i.e., to choose correctly the values for -b, -e
> and -trestart in order to get the longest, reliable MSD data)
>
>
> We found a mail from Gaurav Goel (quoted below) which gives a
> reasonable explanation on the topic.
>
> If we understand this correctly, then the output of
>
> g_msd -b 50 -e 100 -trestart 50
>
> should be the same as the second half (with the proper shifting) of
> the output of
>
> g_msd -b 0 -e 100 -trestart 50
>
> But it is not, as anyone can verify with any simulation.
> ...... what are we missing?
>
> Julian
>
> --
> Julian Gelman Constantin
> Department of Inorganic, Analytic and Chemical Physics (DQIAQF)
> School of Exact and Natural Sciences
> University of Buenos Aires, Argentina
>
>
> ---
> Gaurav Goel gauravgoeluta at gmail.com
> Tue Jul 13 00:56:02 CEST 2010
>
> On Mon, Jul 12, 2010 at 5:22 PM, Ricardo Cuya Guizado
> <rcuyag at hotmail.com> wrote:
> > Dear gromacs users
> > I make a MD of 20 ns of a solute in water
> > With the g_msd program the msd vs the time was obtained
> > In the plot, I observed a linear behaviour of the MSD from 0 to 15
> ns and a
> > plateau with no linear tendence at the last 5 ns arpoximately.
> > In order to know if the observed plateau was due to the data or is
> due to
> > the way as the algorithm process the data, I divided the MD in two
> > trajectories and obtained the msd for each one.
> > From 0-10ns, the plot observed shows a linear tendence en the begining
> and a
> > plateau with no linear tendence from 9 to 10 ns.
> > From 10-20 ns the plot observed was linear from 10 to 18 ns and not
> linear
> > at the last, the same plateau was observed.
> > Comparing the plots there are not equivalent,.
> > Why g_msd produces a non linear plot at the last of the calculation
> and the
> > plateau is ever produces.
> > Somebody will explain the way as the g_msd algorithm work? and why
> the plot
> > are no equivalent or why there must be equivalent?
>
> I will explain how the g_msd algorithm works and hopefully that will
> answer all your questions above. What you see in the output file is
> average-MSD versus time. This average is done over all the particles
> in the group you selected and over multiple time origins (this last
> option can be selected with the -trestart parameter). Also, time in
> column 1 is time difference from the start of your trajectory to
> current time.
>
> E.g., let's say you collected a trajectory over 5 time units and
> choose -trestart=1 time unit and -dt=1 time unit.
>
> dt=1 means you'll have 6 configurations for your analysis (including
> the configuration at t=0).
>
> trestart=1 means you'll have 5 distinct trajectories for your analysis:
> Trajectory 1: 0-5
> T2: 1-5
> T3: 2-5
> T4: 3-5
> T5: 4-5
>
> Now you can notice that all 5 trajectories contribute to the average
> MSD after 1 time unit (T1-T5), 4 trajectories contribute to the
> average MSD after 2 time units (T1-T4),  3 trajectories to the average
> MSD after 3 time units (T1-T3), ...., and only one trajectory to the
> MSD after 5 time units (T1). Of course, this assumes that trestart is
> large enough that all all these trajectories are uncorrelated.
>
> So, it's clear that longer the time interval at which you want to
> evaluate the MSD lesser the number of trajectories used to evaluate
> it...and hence, higher error in MSD values at longer times. That might
> explain deviation from linear behaviour at long times.
>
> However, you must be careful in interpreting the MSD data and I
> recommend reading some literature on the subject. A plateau in MSD
> versus time data might also signify what is called cage motion, in
> which a particle or atom is trapped by the surrounding particles and
> is not able to move out of that hole on the simulation time scale. If
> you want you can send me your MSD versus time data along with some
> information on your system (such as potentials, density, temperature
> etc.) and I can let you know my comments.
>
> Few words of caution:
> Make sure that the center of mass of your particle (or atom or
> molecule) is diffusing several particle diameters. Also, make sure
> that you're calculating the self-diffusion coefficient  by fitting a
> straight line to the linear region of MSD versus time data. You can
> either modify the -beginfit and -endfit options... or calculate the
> slope of the MSD versus time data using some other software (e.g.,
> gnuplot, excel, etc.). If you're doing the latter you'll need to take
> a look at the code in gmx_msd.c to know how the diffusion coefficent
> is calculated from the slope of MSD versus time data (tog et correct
> units, use proper scaling factors, etc.).
>
> I hope that helped.
>
> -Gaurav
> >
> >
> >
> >
> > Regards
> > Ricardo Cuya
>

--
Javier CEREZO BASTIDA