[gmx-users] g_rdf and number of atoms to include

Enemark Soeren chees at nus.edu.sg
Thu Oct 22 13:29:20 CEST 2009

Thanks Mark and Omer for your comments - I have really learned a lot about
RDFs today based on your input! 



From: gmx-users-bounces at gromacs.org
[mailto:gmx-users-bounces at gromacs.org] On Behalf Of Omer Markovitch
Sent: Thursday, October 22, 2009 3:49 PM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] g_rdf and number of atoms to include



On Thu, Oct 22, 2009 at 05:17, Enemark Soeren <chees at nus.edu.sg> wrote:

Ahh, now I understand - sorry, Omer!

No problem, glad to help.


	In fact, I have compared all three single hydrogen RDFs and they
are identical and also relatively smooth. Since, however, with 3 times
more data points (all three hydrogen atoms taken together) I get a
different RDF, would that indicate that I do not have enough data after

See my previous answers for some checks you can do on the convergence.
You have to look at your data and decide whether the differences are
acceptable to you. I could suggest, for example, computing the RMSD
between the curves, focusing just on the first peak.
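A minimal sketch of that RMSD check, assuming both RDFs were written on
the same r grid (for example two .xvg files produced with the same bin
width, r in nm). The helper names `read_xvg_lines` and `rdf_rmsd`, the
file names and the window limits are hypothetical placeholders; set the
window around your own first peak.

```python
import math

def read_xvg_lines(lines):
    """Parse two-column xvg text, skipping '#' comments and '@' Grace directives."""
    r, g = [], []
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        cols = line.split()
        r.append(float(cols[0]))
        g.append(float(cols[1]))
    return r, g

def rdf_rmsd(r, g_a, g_b, r_min, r_max):
    """RMSD between two RDFs on the same r grid, restricted to r_min <= r <= r_max."""
    if not (len(r) == len(g_a) == len(g_b)):
        raise ValueError("curves must share the same r grid")
    sq = [(a - b) ** 2 for x, a, b in zip(r, g_a, g_b) if r_min <= x <= r_max]
    if not sq:
        raise ValueError("no grid points inside the requested window")
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical usage: first half of the trajectory versus second half.
# with open("rdf_first_half.xvg") as f:
#     r, g1 = read_xvg_lines(f)
# with open("rdf_second_half.xvg") as f:
#     _, g2 = read_xvg_lines(f)
# print(rdf_rmsd(r, g1, g2, r_min=0.15, r_max=0.30))
```

A small RMSD over the first-peak window between independent halves of
the run is one possible sign of convergence; what counts as "small" is
still your judgment call.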


	Are RDFs known to be slow to converge?

A quick answer would be no, but that basically depends on the size of
your bin. With a bin of 0.5 Angstrom you'll probably have "enough"
molecules in each bin for that bin's data to converge, but then the
curve itself wouldn't look smooth (even if it is converged). A bin
size of 0.1 Angstrom is often used.
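To make the bin-width trade-off concrete, here is a rough sketch that
estimates how many pair counts land in one bin over a whole trajectory.
It assumes an ideal-gas-like g(r) = 1, a water number density of about
33 molecules per nm^3 (roughly bulk water at room temperature), and the
50 reference atoms and 10,000 frames mentioned later in this thread; the
function names are hypothetical. Lengths are in nm, as in GROMACS, so
0.002 nm is 0.02 Angstrom. The Poisson error treats every count as
independent, which overstates the precision for correlated MD frames.

```python
import math

def expected_bin_counts(r_nm, dr_nm, n_ref, n_frames, rho_sel_per_nm3, g=1.0):
    """Expected pair counts accumulated in the bin [r, r + dr) over a trajectory.

    g = 1 is the ideal-gas limit; the shell volume is computed exactly."""
    shell_volume = 4.0 / 3.0 * math.pi * ((r_nm + dr_nm) ** 3 - r_nm ** 3)
    return n_ref * n_frames * rho_sel_per_nm3 * shell_volume * g

def relative_noise(counts):
    """Poisson estimate of a bin's relative error, assuming uncorrelated samples."""
    return 1.0 / math.sqrt(counts)

# Assumed numbers: 50 reference atoms, 10 ns saved every 1 ps, ~33.3 waters/nm^3.
for dr in (0.002, 0.01, 0.05):  # 0.02, 0.1 and 0.5 Angstrom
    n = expected_bin_counts(0.3, dr, n_ref=50, n_frames=10_000, rho_sel_per_nm3=33.3)
    print(f"dr = {dr:.3f} nm: ~{n:.0f} counts, relative noise ~{relative_noise(n):.2%}")
```

The counts per bin scale roughly linearly with the bin width, so a
five-times narrower bin collects about five times fewer counts and its
noise grows by roughly the square root of five.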

I had not considered bin width before you mentioned it, but I can see
that I have been using a bin width of 0.02 Angstrom, which is the
default in Gromacs. Judging by your comment, this is quite a small bin
width. However, I guess it explains why my curve looks smooth even
though the RDFs are not converged. Thanks to Mark, I realized that my
RDF for the 3 hydrogen atoms taken over only half the time does not
reproduce my single-hydrogen RDFs, and it is even further from my full
production RDF with 3 hydrogens. I am still a bit puzzled as to why
this can happen, but I guess it clearly indicates that I do not have
enough data for my bin width. I am experimenting with this.

	I have about 1000 water molecules and about 50 glycine
molecules, simulated for 10 ns with 1 ps sampling intervals. That should
give me 500,000,000 data points for the distribution, right? Can I
compare this number with the literature in which RDFs for, say,
water-water interactions are reported?
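The arithmetic in the question can be checked in a few lines, assuming
one saved frame every 1 ps over 10 ns and counting every glycine-water
pair in every frame (the helper names are hypothetical):

```python
def n_frames(total_time_ps, dt_ps):
    """Number of saved frames, assuming one frame every dt_ps picoseconds."""
    return round(total_time_ps / dt_ps)

def pair_samples(n_ref, n_sel, frames):
    """Reference-selection pair distances histogrammed over the trajectory."""
    return n_ref * n_sel * frames

frames = n_frames(10_000, 1)           # 10 ns = 10,000 ps, sampled every 1 ps
print(pair_samples(50, 1000, frames))  # 500000000
```

This is a raw count: consecutive frames 1 ps apart are not fully
independent, and most of those pairs sit at large distances, so far
fewer samples end up in the first-peak bins.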

With 50 glycines you basically average 50 curves together, but the
number of water molecules does not affect the statistics of the glycine
RDF. The number of waters does mean that you most likely have a large
enough system.
As for the water-water RDF, I think you should have more than enough
data there. Just make sure, when comparing with the literature, that
conditions such as the potential, temperature, and binning are similar.
If I wanted to compare with bulk water, I would also worry about
including water molecules that are close to glycine in my water-water
RDF.
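The idea that several reference atoms amount to averaging their curves
can be illustrated with a small sketch. It assumes the usual
normalization of a pair histogram by the number of reference atoms,
frames, selection density and shell volume (not necessarily g_rdf's
exact code path); the function name and the toy counts are hypothetical.
Under that normalization, the RDF of a pooled histogram equals the mean
of the per-reference RDFs.

```python
import math

def rdf_from_counts(counts, n_ref, n_frames, rho_sel, dr):
    """Normalize a pair-count histogram into g(r); bin i covers [i*dr, (i+1)*dr).

    counts are summed over n_ref reference atoms and n_frames frames;
    rho_sel is the number density of the selection (per nm^3)."""
    g = []
    for i, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((i + 1) * dr) ** 3 - (i * dr) ** 3)
        g.append(c / (n_ref * n_frames * rho_sel * shell))
    return g

# Toy per-reference histograms (hypothetical counts: 3 reference atoms, 4 bins).
per_ref = [[0, 5, 40, 30], [0, 7, 36, 33], [0, 4, 44, 27]]
frames, rho, dr = 100, 33.3, 0.05

# RDF of each reference atom on its own, then their mean.
single = [rdf_from_counts(h, 1, frames, rho, dr) for h in per_ref]
mean_of_singles = [sum(col) / len(single) for col in zip(*single)]

# RDF of the pooled histogram, normalized by the number of reference atoms.
pooled_counts = [sum(col) for col in zip(*per_ref)]
pooled = rdf_from_counts(pooled_counts, len(per_ref), frames, rho, dr)

assert all(math.isclose(a, b) for a, b in zip(pooled, mean_of_singles))
```

Under these assumptions, three matching single-hydrogen RDFs computed
over the same frames would give a matching pooled RDF, so a persistent
difference is worth checking against the selections and time ranges
that were actually used.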



	One important point, which was not really clear to me before,
was whether, provided that I have enough data, the RDFs should be
identical no matter whether I use 1, 2, or 3 hydrogen atoms for
generating the RDF. As far as I understood, both your and Omer's
responses indicate that the RDFs should be identical. Did I get that
correctly?

Off the top of my head I would say yes, you are correct. I recommend
you look at some trajectory snapshots to make sure that all 3
hydrogens indeed have similar neighbors.
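One way to make that snapshot check quantitative is to count, for each
hydrogen, the water oxygens within a cutoff in a single frame. This is a
sketch under stated assumptions: the coordinates (nm) of one frame have
already been extracted (for example from a frame written out with
trjconv) into Python tuples, the box is rectangular, and the cutoff is a
placeholder you might set near the first minimum of your RDF. The
coordinates below are toy values, not a real snapshot, and the function
names are hypothetical.

```python
import math

def min_image_distance(a, b, box):
    """Distance between points a and b in a rectangular periodic box (same units)."""
    d2 = 0.0
    for ai, bi, length in zip(a, b, box):
        d = ai - bi
        d -= length * round(d / length)  # minimum-image wrap into [-L/2, L/2]
        d2 += d * d
    return math.sqrt(d2)

def neighbor_count(atom, others, box, cutoff):
    """Number of atoms in `others` closer than `cutoff` to `atom`."""
    return sum(1 for o in others if min_image_distance(atom, o, box) < cutoff)

# Toy frame (nm): three hydrogen positions and a few water oxygens.
box = (3.0, 3.0, 3.0)
hydrogens = {"H1": (0.10, 0.10, 0.10), "H2": (1.00, 1.00, 1.00), "H3": (2.00, 2.00, 2.00)}
oxygens = [(2.95, 0.10, 0.10), (1.20, 1.00, 1.00), (2.10, 2.10, 2.00), (1.50, 2.50, 0.50)]
cutoff = 0.25  # placeholder cutoff

for name, pos in hydrogens.items():
    print(name, neighbor_count(pos, oxygens, box, cutoff))
```

Averaged over several frames, similar counts for the three hydrogens
would support treating them as equivalent.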


Thanks for confirming that, Omer.


Cheers, Omer. 
