[gmx-users] Re: ci barely out of bounds

David van der Spoel spoel at xray.bmc.uu.se
Tue May 29 07:50:06 CEST 2007


Matteo Guglielmi wrote:
> David van der Spoel wrote:
>> Matteo Guglielmi wrote:
>>> David van der Spoel wrote:
>>>> chris.neale at utoronto.ca wrote:
>>>>> This email refers to my original posts on this topic:
>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
>>>>>
>>>>> I have previously posted some of this information to somebody else's
>>>>> bugzilla post #109 including a possible workaround
>>>>> http://bugzilla.gromacs.org/show_bug.cgi?id=109
>>>>>
>>>>> I have always compiled my gromacs using one of the gcc 3.3* distros.
>>>>> I didn't do anything fancy, just the usual configure, make, make
>>>>> install. I did have trouble with a gromacs version compiled using a
>>>>> gcc 4* distro, and somebody on this user list helped me determine
>>>>> that I should just roll back my gcc version. I run on all sorts of
>>>>> computers, opterons and intel chips, 32 and 64 bit, and this
>>>>> particular ci problem is the same for me on all of them.
>>>>>
>>>>> Matteo, I want to be sure that we are on the same page here: your ci
>>>>> is just *barely* out of bounds, right? This is a different problem
>>>>> from when your ci is a huge negative number; in that case you have
>>>>> some other problem and your system is exploding.
>>>>>
>>>>> There is one test and one workaround included in my bugzilla post.
>>>>> The test is to recompile gromacs with the -DEBUG_PBC flag and see if
>>>>> the problem still occurs. For me this solved the problem (although
>>>>> gromacs runs much slower, so it is not a great workaround). The real
>>>>> solution was to remake my system with a few more or a few fewer
>>>>> waters, so that the number of grid cells wasn't changing as the
>>>>> volume of the box fluctuates (slightly) during constant-pressure
>>>>> simulations.
>>>>>
>>>>> Here is the text that I added to that bugzilla post:
>>>>> Did you try with a version of mdrun that was compiled with
>>>>> -DEBUG_PBC? I have some runs that reliably (but stochastically) give
>>>>> errors about an atom being found in a grid cell just one block
>>>>> outside of the expected boundary, but only in parallel runs, and
>>>>> often other nodes have log files indicating that they have just
>>>>> updated the grid size (constant-pressure simulation). This error
>>>>> disappears when I run with a -DEBUG_PBC version. My assumption is
>>>>> that there is some non-blocking MPI communication that is not
>>>>> getting through in time. The -DEBUG_PBC version spends a lot of time
>>>>> checking some things and, although it never reports having found a
>>>>> problem, I assume that a side-effect of these extra calculations is
>>>>> to slow things down enough at the right stage for the MPI message to
>>>>> get through. I have solved my problem by adjusting my simulation
>>>>> cell so that it doesn't fall close to the grid boundaries. Perhaps
>>>>> you are experiencing some analogous problem?
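>>>>>
>>>>> To make the failure mode concrete, here is a much-simplified sketch
>>>>> of cell-list binning in C -- NOT the actual nsgrid.c code, just an
>>>>> illustration of why a stale grid size gives an index only *barely*
>>>>> past the bound:
>>>>>
>>>>>   /* Simplified cell-list binning (illustration only). */
>>>>>   static int cell_index(double x, double y, double z,    /* position    */
>>>>>                         double bx, double by, double bz, /* box lengths */
>>>>>                         int nx, int ny, int nz)          /* grid dims   */
>>>>>   {
>>>>>       int xi = (int)(x * nx / bx);  /* cell slab along x */
>>>>>       int yi = (int)(y * ny / by);
>>>>>       int zi = (int)(z * nz / bz);
>>>>>       /* Valid range is 0 <= ci < nx*ny*nz.  If a node bins with
>>>>>        * grid dimensions that are out of step with the box (e.g. the
>>>>>        * grid was just resized after a pressure-coupling step), the
>>>>>        * index can land one cell layer past the end. */
>>>>>       return xi + nx * (yi + ny * zi);
>>>>>   }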
>>>>>
>>>>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>>>>
>>>>>> Hello Chris,
>>>>>> I have the same problem with gromacs and have not yet figured out
>>>>>> what's going wrong.
>>>>>>
>>>>>> I did not try to run a serial job (as you did), but all 7 of my
>>>>>> simulations (6 solvated pores in membranes + 1 protein in water,
>>>>>> all of them with position restraints, double precision) keep
>>>>>> crashing in the same way.
>>>>>>
>>>>>> Did you finally understand why they do crash (in parallel)?
>>>>>>
>>>>>> How did you compile gromacs?
>>>>>>
>>>>>> I used the intel compilers (ifort icc icpc, 9.1 series) with the
>>>>>> following optimization flags: -O3 -unroll -axT.
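>>>>>>
>>>>>> (For the record, the configure invocation was along these lines --
>>>>>> approximate, from memory:
>>>>>>
>>>>>>   ./configure CC=icc CXX=icpc CFLAGS="-O3 -unroll -axT"
>>>>>>
>>>>>> )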
>>>>>>
>>>>>> I've also tried the 8.0 series, but had no luck getting rid of the
>>>>>> problem.
>>>>>>
>>>>>> I'm running on Woodcrest (Xeon CPU 5140, 2.33 GHz) and Xeon CPU
>>>>>> 3.06 GHz machines.
>>>>>>
>>>>>> Thanks for your attention,
>>>>>> MG
>>>> Do you use pressure coupling? In principle that can cause problems
>>>> when combined with position restraints. Furthermore, once again,
>>>> please try to reproduce the problem with gcc as well. If this is
>>>> related to bugzilla 109 as Chris suggests, then let's continue the
>>>> discussion there.
>>>>
>>> Yes I do (anisotropic pressure coupling).
>>>
>>> I got the same problem with the same system using a version of
>>> gromacs compiled with gcc 3.4.6.
>>>
>>> So, in my case, it doesn't matter which compiler I use to build
>>> gromacs (either Intel series 9.1 or gcc 3.4.6); I always get the
>>> same error right after the "update" of the grid size.
>>>
>>> ....
>>>
>>> step 282180, will finish at Tue May 29 15:57:31 2007
>>> step 282190, will finish at Tue May 29 15:57:32 2007
>>> -------------------------------------------------------
>>> Program mdrun_mpi, VERSION 3.3.1
>>> Source code file: nsgrid.c, line: 226
>>>
>>> Range checking error:
>>> Explanation: During neighborsearching, we assign each particle to a
>>> grid based on its coordinates. If your system contains collisions or
>>> parameter errors that give particles very high velocities you might
>>> end up with some coordinates being +-Infinity or NaN (not-a-number).
>>> Obviously, we cannot put these on a grid, so this is usually where we
>>> detect those errors. Make sure your system is properly
>>> energy-minimized and that the potential energy seems reasonable
>>> before trying again.
>>>
>>> Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
>>> Please report this to the mailing list (gmx-users at gromacs.org)
>>> -------------------------------------------------------
>>>
>>> "BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
>>>
>>>
>>>
>>> Looking forward to having a solution,
>>> thanks to all of you,
>>> MG.
>>>
>> If this is reproducible as you say it is, then please save your energy
>> file at each step and plot the box as a function of time. Most likely
>> it is exploding gently, probably due to the combination of position
>> restraints and pressure coupling. Therefore it would be good to turn
>> one of them off. I would suggest starting with posres and no pressure
>> coupling, and once that equilibrates, turn on pressure coupling with a
>> long tau_p (e.g. 5 ps).
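>>
>> To pull the box dimensions out of the energy file, something like
>> this should do it (3.3.x tools; g_energy lists the exact term names
>> when run interactively):
>>
>>   echo "Box-X Box-Y Box-Z" | g_energy -f ener.edr -o box.xvg
>>
>> Then look in box.xvg for a steady drift rather than a fluctuation.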
>>
> What I see from the trajectory file is the box getting smaller in
> x, y and z.
> 
> Along the z coordinate, which is orthogonal to the membrane, the box
> shrinks a bit faster because I prepared/solvated my system using
> amber9 (which gives a lower water density).
> 
> All my systems were geometry optimized (emtol = 70) prior to any MD
> step.
> 
> I apply position restraints (fc = 1000) to a synthetic transmembrane
> ion channel (located in the center of the simulation box) which is
> *well* surrounded by "soft" lipids (isotropic pressure, off-diagonal
> compressibility elements set to 0).
> 
> I have the same problem with a completely different system, where a
> full protein (position restrained) is immersed only in water
> (isotropic pressure)... variable ci gets *barely* out of bounds here
> as well.
> 
> My tau_p is set to 5 ps, my time step is 1 fs, and I use LINCS only
> on h-bonds. The simulations run at room temperature.
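> 
> For completeness, the relevant mdp settings look more or less like
> this (typed from memory, so take the exact names and values as
> approximate; the barostat shown is just a guess):
> 
>   dt                   = 0.001       ; 1 fs time step
>   constraints          = h-bonds
>   constraint_algorithm = lincs
>   Pcoupl               = berendsen   ; (assumed here)
>   Pcoupltype           = anisotropic
>   tau_p                = 5.0
>   ref_p                = 1.0 1.0 1.0 0.0 0.0 0.0
>   compressibility      = 4.5e-5 4.5e-5 4.5e-5 0.0 0.0 0.0
>   define               = -DPOSRES    ; position restraints on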
> 
> The ci out of bounds problem usually occurs after the very first
> 0.3 ns.
> 
> I have run the same systems, in terms of initial geometry and
> conditions, with other parallel MD codes for more than 30 ns each
> (I actually want to compare gromacs against them) without observing
> any slowly exploding systems.
> 
> That's why I think it's something related to the parallel version of
> gromacs and the grid update that occurs as the size of my box
> initially decreases.
> 
You are welcome to submit a bugzilla if you have a reproducible problem. 
It would be great if you could upload a tpr file that reproduces the 
problem as quickly as possible.

Nevertheless I would urge you to try it without position restraints as 
well. The problem could be related to restraining a molecule at a 
position far outside the box or reducing the box size until the box is 
smaller than your protein.
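
If the restraints are wrapped in the usual #ifdef POSRES block (as 
pdb2gmx writes them), turning them off is just a matter of clearing the 
define in your mdp file and rerunning grompp:

  define = -DPOSRES   ; position restraints on
  define =            ; position restraints off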
-- 
David.
________________________________________________________________________
David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205    fax: 46 18 511 755
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


