[gmx-users] Re: ci barely out of bounds
matteo.guglielmi at epfl.ch
Mon May 28 23:45:51 CEST 2007
David van der Spoel wrote:
> Matteo Guglielmi wrote:
>> David van der Spoel wrote:
>>> chris.neale at utoronto.ca wrote:
>>>> This email refers to my original posts on this topic:
>>>> I have previously posted some of this information to somebody else's
>>>> bugzilla post #109 including a possible workaround
>>>> I have compiled my gromacs always using one of the gcc3.3* distros. I
>>>> didn't do anything fancy, just the usual configure,make,make install.
>>>> I did have trouble with a gromacs version compiled using a gcc4*
>>>> distro and somebody on this user list assisted me in determining that
>>>> I should just roll back my gcc version. I run on all sorts of
>>>> computers, opterons and intel chips, 32 and 64 bit, and this
>>>> particular ci problem is the same for me on all of them.
>>>> Matteo, I want to be sure that we are on the same page here: your ci
>>>> is just *barely* out of bounds right? This is a different problem
>>>> than when your ci is a huge negative number. In that case you have
>>>> some other problem and your system is exploding.
>>>> There is one test and one workaround included in my bugzilla post.
>>>> The test is to recompile gromacs with the -DEBUG_PBC flag and see if
>>>> the problem still occurs. For me this solved the problem (although
>>>> gromacs runs much slower, so it is not a great workaround). The
>>>> solution was to remake my system with a few more or a few fewer waters
>>>> so that the number of grid cells wasn't changing as the volume of the box
>>>> fluctuates (slightly) during constant-pressure simulations.
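>>>> In case it helps, a minimal sketch of how the flag can be passed in,
>>>> assuming the usual autoconf build of gromacs 3.3.x (adjust the configure
>>>> options to whatever you normally use, and use whatever exact define your
>>>> source tree checks for):
>>>>   export CPPFLAGS="-DDEBUG_PBC"
>>>>   ./configure --enable-mpi --program-suffix=_mpi
>>>>   make mdrun && make install-mdrun
>>>> The resulting mdrun_mpi should then include the extra checking.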
>>>> I here include the text that I added to that bugzilla post:
>>>> Did you try with a version of mdrun that was compiled with -DEBUG_PBC?
>>>> I have some runs that reliably (but stochastically) give errors about an
>>>> atom being found in a grid cell just one block outside of the expected
>>>> boundary, but only in parallel runs, and often other nodes have log files
>>>> that indicate that they have just updated the grid size (constant
>>>> pressure simulation). This error does not occur when I run with a
>>>> -DEBUG_PBC version. My assumption here is that there is some non-blocking
>>>> MPI communication that is not getting through in time. The -DEBUG_PBC
>>>> version spends a lot of time checking some things and, although it never
>>>> reports having found any problem, I assume that a side-effect of these
>>>> extra calculations is to slow things down enough at the proper stage so
>>>> that the MPI message gets through. I have solved my problem by adjusting
>>>> my cell so that it doesn't fall close to the grid boundaries. Perhaps you
>>>> are experiencing an analogous problem?
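>>>> If you go the re-solvation route, a hedged genbox example (file names
>>>> and the solvent count are placeholders; vary -maxsol by a handful of
>>>> molecules until the box no longer sits right at a grid-size boundary):
>>>>   genbox -cp channel_membrane.gro -cs spc216.gro -o solvated.gro \
>>>>          -p topol.top -maxsol 9000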
>>>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>>>> Hello Chris,
>>>>> I have the same problem with gromacs and have not yet understood
>>>>> what's going wrong.
>>>>> I did not try to run a serial job (as you did) but all 7 of my systems
>>>>> (6 solvated pores in membranes + 1 protein in water... all of them
>>>>> with positional restraints - double precision) keep crashing in the
>>>>> same way.
>>>>> Did you finally understand why they do crash (in parallel)?
>>>>> How did you compile gromacs?
>>>>> I used the intel compilers (ifort icc icpc, 9.1 series) with the
>>>>> following optimization flags: -O3 -unroll -axT.
>>>>> I've also tried the 8.0 series but had no luck getting rid of the error.
>>>>> I'm running on Woodcrest (Xeon CPU 5140, 2.33 GHz) and Xeon CPU ...
>>>>> Thanks for your attention,
>>> Do you use pressure coupling? In principle that can cause problems
>>> when combined with position restraints. Furthermore, once again, please
>>> try to reproduce the problem with gcc as well. If this is related to
>>> to reproduce the problem with gcc as well. If this is related to
>>> bugzilla 109 as Chris suggests then please let's continue the
>>> discussion there.
>> Yes I do (anisotropic pressure).
>> I got the same problem with the same system using a version of gromacs
>> compiled with gcc 3.4.6.
>> So, in my case, it doesn't matter which compiler I use
>> (either Intel series 9.1 or gcc 3.4.6); I always get the same error
>> right after the "update" of the grid size.
>> step 282180, will finish at Tue May 29 15:57:31 2007
>> step 282190, will finish at Tue May 29 15:57:32 2007
>> Program mdrun_mpi, VERSION 3.3.1
>> Source code file: nsgrid.c, line: 226
>> Range checking error:
>> Explanation: During neighborsearching, we assign each particle to a grid
>> based on its coordinates. If your system contains collisions or
>> errors that give particles very high velocities you might end up with
>> coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
>> put these on a grid, so this is usually where we detect those errors.
>> Make sure your system is properly energy-minimized and that the
>> energy seems reasonable before trying again.
>> Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
>> Please report this to the mailing list (gmx-users at gromacs.org)
>> "BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
>> Looking forward to having a solution,
>> thanks to all of you,
> If this is reproducible as you say it is then please save your energy
> file at each step and plot the box as a function of time. Most likely
> it is exploding gently. This is most likely caused by the combination
> of position restraints and pressure coupling. Therefore it would be
> good to turn one of them off. I would suggest starting with posres and no
> pressure coupling, and once that equilibrates turn on pressure
> coupling with a long tau_p (e.g. 5 ps).
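> Roughly, and untested (file names are only examples): set nstenergy = 1 in
> the .mdp so the box is written every step, then
>   echo "Box-X Box-Y Box-Z" | g_energy -f ener.edr -o box.xvg
> to plot the box vectors versus time, and for the second stage something like
>   pcoupl = berendsen
>   tau_p  = 5.0
> once the posres-only equilibration looks stable.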
What I see from the trajectory file is my box dimensions getting
smaller in x, y and z.
Along the z coordinate, which is orthogonal to the membrane, the
box shrinks a bit faster because I prepared/solvated
my system using amber9 (lower water density).
All my systems were energy-minimized (emtol = 70) prior to
any MD step.
I apply position restraints (fc = 1k) to a transmembrane synthetic
ion channel (located in the center of the simulation box) which is
*well* surrounded by "soft" lipids (anisotropic pressure, off-diagonal
compressibility elements are set to 0).
I have the same problem with a completely different system
where a full protein (position restrained) is immersed only in
water (isotropic pressure)... variable ci gets *barely* out of bounds.
My tau_p is set to 5 ps, my time step is set to 1 fs and I use lincs
only on hbonds. The temperature is room temperature.
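In .mdp terms the pressure/constraint part of the setup is, schematically
(the barostat and the exact values shown here are only illustrative):
  dt                   = 0.001        ; 1 fs
  constraints          = hbonds       ; lincs on h-bonds only
  constraint_algorithm = lincs
  define               = -DPOSRES     ; position restraints, fc = 1k
  pcoupl               = berendsen    ; illustrative choice of barostat
  pcoupltype           = anisotropic  ; membrane system
  tau_p                = 5.0
  compressibility      = 4.5e-5 4.5e-5 4.5e-5 0 0 0  ; off-diagonal terms zero
  ref_p                = 1.0 1.0 1.0 0 0 0
  ref_t                = 300          ; room temperature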
The ci out of bounds problem usually occurs right after the very first
update of the grid size.
I did run the same systems, in terms of initial geometry and conditions,
with other parallel MD codes for more than 30 ns each (actually I want to
compare gromacs to them) without observing any slowly exploding systems.
That's why I think it's something related to the parallel version of gromacs
and the grid update which occurs along with the initial decrease in the
size of my box.