[gmx-users] Re: ci barely out of bounds
matteo.guglielmi at epfl.ch
Mon May 28 23:45:51 CEST 2007
David van der Spoel wrote:
> Matteo Guglielmi wrote:
>> David van der Spoel wrote:
>>> chris.neale at utoronto.ca wrote:
>>>> This email refers to my original posts on this topic:
>>>> I have previously posted some of this information to somebody else's
>>>> bugzilla post #109 including a possible workaround
>>>> I have compiled my gromacs always using one of the gcc3.3* distros. I
>>>> didn't do anything fancy, just the usual configure,make,make install.
>>>> I did have trouble with a gromacs version compiled using a gcc4*
>>>> distro and somebody on this user list assisted me in determining that
>>>> I should just roll back my gcc version. I run on all sorts of
>>>> computers, opterons and intel chips, 32 and 64 bit, and this
>>>> particular ci problem is the same for me on all of them.
>>>> Matteo, I want to be sure that we are on the same page here: your ci
>>>> is just *barely* out of bounds right? This is a different problem
>>>> than when your ci is a huge negative number. In that case you have
>>>> some other problem and your system is exploding.
>>>> There is one test and one workaround included in my bugzilla post.
>>>> The test is to recompile gromacs with the -DEBUG_PBC flag and see if
>>>> the problem still occurs. For me this solved the problem (although
>>>> gromacs runs much slower, so it is not a great workaround). The
>>>> solution was to remake my system with a few more or a few fewer waters
>>>> so that the number of grid cells wasn't changing as the volume of the box
>>>> fluctuates (slightly) during constant-pressure simulations.
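>>>> In case it helps, a minimal sketch of how the flag can be passed in,
>>>> assuming the usual autoconf build of gromacs 3.3.x (adjust the configure
>>>> options to whatever you normally use, and use whatever exact define your
>>>> source tree checks for):
>>>>   export CPPFLAGS="-DDEBUG_PBC"
>>>>   ./configure --enable-mpi --program-suffix=_mpi
>>>>   make mdrun && make install-mdrun
>>>> The resulting mdrun_mpi should then include the extra checking.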
>>>> I here include the text that I added to that bugzilla post:
>>>> Did you try with a version of mdrun that was compiled with -DEBUG_PBC?
>>>> I have some runs that reliably (but stochastically) give errors about an
>>>> atom being found in a grid cell just one block outside of the expected
>>>> boundary, but only in parallel runs, and often other nodes have log files
>>>> that indicate that they have just updated the grid size (constant
>>>> pressure simulation). This error does not occur when I run with a
>>>> -DEBUG_PBC version. My assumption here is that there is some non-blocking
>>>> MPI communication that is not getting through in time. The -DEBUG_PBC
>>>> version spends a lot of time checking some things and, although it never
>>>> reports having found any problem, I assume that a side-effect of these
>>>> extra calculations is to slow things down enough at the proper stage so
>>>> that the MPI message gets through. I have solved my problem by adjusting
>>>> my cell so that it doesn't fall close to the grid boundaries. Perhaps you
>>>> are experiencing an analogous problem?
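>>>> If you go the re-solvation route, a hedged genbox example (file names
>>>> and the solvent count are placeholders; vary -maxsol by a handful of
>>>> molecules until the box no longer sits right at a grid-size boundary):
>>>>   genbox -cp channel_membrane.gro -cs spc216.gro -o solvated.gro \
>>>>          -p topol.top -maxsol 9000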
>>>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>>>> Hello Chris,
>>>>> I have the same problem with gromacs and have not yet understood
>>>>> what's going wrong.
>>>>> I did not try to run a serial job (as you did) but all 7 of my systems
>>>>> (6 solvated pores in membranes + 1 protein in water... all of them
>>>>> with positional restraints - double precision) keep crashing in the
>>>>> same way.
>>>>> Did you finally understand why they do crash (in parallel)?
>>>>> How did you compile gromacs?
>>>>> I used the intel compilers (ifort icc icpc, 9.1 series) with the
>>>>> following optimization flags: -O3 -unroll -axT.
>>>>> I've also tried the 8.0 series but had no luck getting rid of the error.
>>>>> I'm running on Woodcrest (Xeon CPU 5140, 2.33 GHz) and Xeon CPU ...
>>>>> Thanks for your attention,
>>> Do you use pressure coupling? In principle that can cause problems
>>> when combined with position restraints. Furthermore, once again, please
>>> try to reproduce the problem with gcc as well. If this is related to
>>> to reproduce the problem with gcc as well. If this is related to
>>> bugzilla 109 as Chris suggests then please let's continue the
>>> discussion there.
>> Yes I do (anisotropic pressure).
>> I got the same problem with the same system using a version of gromacs
>> compiled with gcc 3.4.6.
>> So, in my case, it doesn't matter which compiler I use
>> (either Intel series 9.1 or gcc 3.4.6); I always get the same error
>> right after the "update" of the grid size.
>> step 282180, will finish at Tue May 29 15:57:31 2007
>> step 282190, will finish at Tue May 29 15:57:32 2007
>> Program mdrun_mpi, VERSION 3.3.1
>> Source code file: nsgrid.c, line: 226
>> Range checking error:
>> Explanation: During neighborsearching, we assign each particle to a grid
>> based on its coordinates. If your system contains collisions or
>> errors that give particles very high velocities you might end up with
>> coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
>> put these on a grid, so this is usually where we detect those errors.
>> Make sure your system is properly energy-minimized and that the
>> energy seems reasonable before trying again.
>> Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
>> Please report this to the mailing list (gmx-users at gromacs.org)
>> "BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
>> Looking forward to having a solution,
>> thanks to all of you,
> If this is reproducible as you say it is then please save your energy
> file at each step and plot the box as a function of time. Most likely
> it is exploding gently. This is most likely caused by the combination
> of position restraints and pressure coupling. Therefore it would be
> good to turn one of them off. I would suggest starting with posres and no
> pressure coupling, and once that equilibrates turn on pressure
> coupling with a long tau_p (e.g. 5 ps).
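> Roughly, and untested (file names are only examples): set nstenergy = 1 in
> the .mdp so the box is written every step, then
>   echo "Box-X Box-Y Box-Z" | g_energy -f ener.edr -o box.xvg
> to plot the box vectors versus time, and for the second stage something like
>   pcoupl = berendsen
>   tau_p  = 5.0
> once the posres-only equilibration looks stable.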
What I see from the trajectory file is my box dimensions getting
smaller in x, y and z.
Along the z coordinate, which is orthogonal to the membrane, the
box shrinks a bit faster because I prepared/solvated
my system using amber9 (lower water density).
All my systems were energy-minimized (emtol = 70) prior to
any MD step.
I apply position restraints (fc = 1k) to a transmembrane synthetic
ion channel (located in the center of the simulation box) which is
*well* surrounded by "soft" lipids (anisotropic pressure, off-diagonal
compressibility elements are set to 0).
I have the same problem with a completely different system
where a full protein (position restrained) is immersed only in
water (isotropic pressure)... variable ci gets *barely* out of bounds.
My tau_p is set to 5 ps, my time step is set to 1 fs and I use lincs
only on hbonds. The temperature is room temperature.
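In .mdp terms the pressure/constraint part of the setup is, schematically
(the barostat and the exact values shown here are only illustrative):
  dt                   = 0.001        ; 1 fs
  constraints          = hbonds       ; lincs on h-bonds only
  constraint_algorithm = lincs
  define               = -DPOSRES     ; position restraints, fc = 1k
  pcoupl               = berendsen    ; illustrative choice of barostat
  pcoupltype           = anisotropic  ; membrane system
  tau_p                = 5.0
  compressibility      = 4.5e-5 4.5e-5 4.5e-5 0 0 0  ; off-diagonal terms zero
  ref_p                = 1.0 1.0 1.0 0 0 0
  ref_t                = 300          ; room temperature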
The ci out of bounds problem usually occurs right after the very first
update of the grid size.
I did run the same systems, in terms of initial geometry and conditions,
with other parallel MD codes for more than 30 ns each (actually I want to
compare gromacs to them) without observing any slowly exploding systems.
That's why I think it's something related to the parallel version of gromacs
and the grid update which occurs along with the initial decrease in the
size of my box.