[gmx-users] Re: ci barely out of bounds

Matteo Guglielmi matteo.guglielmi at epfl.ch
Tue May 29 20:49:30 CEST 2007


David van der Spoel wrote:
> Matteo Guglielmi wrote:
>> David van der Spoel wrote:
>>> Matteo Guglielmi wrote:
>>>> David van der Spoel wrote:
>>>>> chris.neale at utoronto.ca wrote:
>>>>>> This email refers to my original posts on this topic:
>>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
>>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
>>>>>>
>>>>>> I have previously posted some of this information to somebody else's
>>>>>> bugzilla post #109 including a possible workaround
>>>>>> http://bugzilla.gromacs.org/show_bug.cgi?id=109
>>>>>>
>>>>>> I have always compiled my gromacs using one of the gcc3.3*
>>>>>> distros. I didn't do anything fancy, just the usual
>>>>>> configure, make, make install. I did have trouble with a gromacs
>>>>>> version compiled using a gcc4* distro, and somebody on this user
>>>>>> list helped me determine that I should just roll back my gcc
>>>>>> version. I run on all sorts of computers, opterons and intel
>>>>>> chips, 32 and 64 bit, and this particular ci problem is the same
>>>>>> for me on all of them.
>>>>>>
>>>>>> Matteo, I want to be sure that we are on the same page here: your ci
>>>>>> is just *barely* out of bounds right? This is a different problem
>>>>>> than when your ci is a huge negative number. In that case you have
>>>>>> some other problem and your system is exploding.
>>>>>>
>>>>>> There is one test and one workaround included in my bugzilla post.
>>>>>> The test is to recompile gromacs with the -DEBUG_PBC flag and see
>>>>>> if the problem still occurs. For me this made the error go away
>>>>>> (although gromacs runs much slower, so it is not a great
>>>>>> workaround). The real solution was to remake my system with a few
>>>>>> more or a few fewer waters, so that the number of grid cells does
>>>>>> not change as the volume of the box fluctuates (slightly) during
>>>>>> constant pressure simulations.
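
To see why a few waters more or less matter here, the following is a rough
standalone C sketch of the grid-sizing effect described above; the
one-cell-per-cutoff rule and every number in it are simplified assumptions,
not the actual GROMACS code.

/* Rough sketch: how a tiny NPT box fluctuation can change the number of
 * neighboursearching cells.  The sizing rule (one cell per whole cutoff
 * length) and all numbers are simplified assumptions, not GROMACS code. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double rlist = 1.0;       /* neighbour-list cutoff in nm (assumed)  */
    double box_z = 12.004;    /* box edge sitting just above 12 * rlist */

    int cells_before = (int) floor(box_z / rlist);   /* 12 cells along z */

    box_z *= 0.999;           /* a 0.1% shrink under pressure coupling */
    int cells_after  = (int) floor(box_z / rlist);   /* now only 11     */

    printf("cells along z: %d -> %d\n", cells_before, cells_after);
    /* When a box edge sits this close to an integer multiple of the
     * cutoff, the grid dimension (and with it the valid range for ci)
     * flips back and forth as the volume fluctuates; a system built a
     * few waters larger or smaller sits safely away from that boundary. */
    return 0;
}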
>>>>>>
>>>>>> Here is the text that I added to that bugzilla post:
>>>>>> Did you try with a version of mdrun that was compiled with
>>>>>> -DEBUG_PBC? I have some runs that reliably (but stochastically)
>>>>>> give errors about an atom being found in a grid cell just one
>>>>>> block outside of the expected boundary, only in parallel runs, and
>>>>>> often other nodes have log files that indicate that they have just
>>>>>> updated the grid size (constant pressure simulation). This error
>>>>>> disappears when I run with a -DEBUG_PBC version. My assumption
>>>>>> here is that there is some non-blocking MPI communication that is
>>>>>> not getting through in time. The -DEBUG_PBC version spends a lot
>>>>>> of time checking things and, although it never reports having
>>>>>> found a problem, I assume that a side effect of these extra
>>>>>> calculations is to slow things down enough at the right stage for
>>>>>> the MPI message to get through. I have solved my problem by
>>>>>> adjusting my simulation cell so that it doesn't fall close to the
>>>>>> grid boundaries. Perhaps you are experiencing some analogous
>>>>>> problem?
>>>>>>
>>>>>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>>>>>
>>>>>>> Hello Chris,
>>>>>>> I have the same problem with gromacs and have not yet understood
>>>>>>> what's going wrong.
>>>>>>>
>>>>>>> I did not try to run a serial job (as you did), but all 7 of my
>>>>>>> simulations (6 solvated pores in membranes + 1 protein in water,
>>>>>>> all of them with position restraints, double precision) keep
>>>>>>> crashing in the same way.
>>>>>>>
>>>>>>> Did you finally understand why they do crash (in parallel)?
>>>>>>>
>>>>>>> How did you compile gromacs?
>>>>>>>
>>>>>>> I used the Intel compilers (ifort, icc, icpc, 9.1 series) with
>>>>>>> the following optimization flags: -O3 -unroll -axT.
>>>>>>>
>>>>>>> I've also tried the 8.0 series, but had no luck getting rid of
>>>>>>> the problem.
>>>>>>>
>>>>>>> I'm running on Woodcrest (Xeon CPU 5140, 2.33 GHz) and Xeon CPU
>>>>>>> 3.06 GHz machines.
>>>>>>>
>>>>>>> Thanks for your attention,
>>>>>>> MG
>>>>> Do you use pressure coupling? In principle that can cause problems
>>>>> when combined with position restraints. Furthermore, once again,
>>>>> please try to reproduce the problem with gcc as well. If this is
>>>>> related to bugzilla 109, as Chris suggests, then please let's
>>>>> continue the discussion there.
>>>>>
>>>> Yes I do (anisotropic pressure).
>>>>
>>>> I got the same problem with the same system using a version of
>>>> gromacs compiled with gcc 3.4.6.
>>>>
>>>> So, in my case, it doesn't matter which compiler I use to build
>>>> gromacs (either Intel series 9.1 or gcc 3.4.6); I always get the
>>>> same error right after the "update" of the grid size.
>>>>
>>>> ....
>>>>
>>>> step 282180, will finish at Tue May 29 15:57:31 2007
>>>> step 282190, will finish at Tue May 29 15:57:32 2007
>>>> -------------------------------------------------------
>>>> Program mdrun_mpi, VERSION 3.3.1
>>>> Source code file: nsgrid.c, line: 226
>>>>
>>>> Range checking error:
>>>> Explanation: During neighborsearching, we assign each particle to a
>>>> grid based on its coordinates. If your system contains collisions or
>>>> parameter errors that give particles very high velocities you might
>>>> end up with some coordinates being +-Infinity or NaN (not-a-number).
>>>> Obviously, we cannot put these on a grid, so this is usually where
>>>> we detect those errors. Make sure your system is properly
>>>> energy-minimized and that the potential energy seems reasonable
>>>> before trying again.
>>>>
>>>> Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
>>>> Please report this to the mailing list (gmx-users at gromacs.org)
>>>> -------------------------------------------------------
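
As a toy illustration of what that range check at nsgrid.c:226 is doing, the
sketch below bins one coordinate into a cell grid and tests the flat index.
The grid dimensions, the index ordering and the coordinates are invented for
illustration; this is not the GROMACS implementation.

/* Toy version of the failing range check: bin a coordinate into a cell
 * grid and verify the flat index.  Grid dimensions, index ordering and
 * coordinates are all made up; this is not the GROMACS implementation. */
#include <stdio.h>

int main(void)
{
    int    nx = 12, ny = 12, nz = 10;       /* 1440 cells (invented dims)  */
    double box[3] = { 6.0, 6.0, 5.2 };      /* nm, after a slight shrink   */
    double x[3]   = { 1.3, 2.7, 5.22 };     /* atom still just beyond the
                                               shrunken box in z           */

    int cx = (int) (x[0] * nx / box[0]);    /* 2                     */
    int cy = (int) (x[1] * ny / box[1]);    /* 5                     */
    int cz = (int) (x[2] * nz / box[2]);    /* 10, one layer too far */
    int ci = (cz * ny + cy) * nx + cx;

    if (ci < 0 || ci >= nx * ny * nz)
        printf("ci = %d, should have been within [ 0 .. %d ]\n",
               ci, nx * ny * nz);
    /* The index is only barely out of range: the grid was sized for the
     * new, slightly smaller box while this coordinate still belongs to
     * the old one.  Nothing has blown up; the atom simply lands one cell
     * layer too high. */
    return 0;
}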
>>>>
>>>> "BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
>>>>
>>>>
>>>>
>>>> Looking forward to a solution,
>>>> thanks to all of you,
>>>> MG.
>>>>
>>> If this is reproducible as you say it is, then please save your
>>> energy file at each step and plot the box as a function of time. Most
>>> likely it is exploding gently, caused by the combination of position
>>> restraints and pressure coupling; therefore it would be good to turn
>>> one of the two off. I would suggest starting with posres and no
>>> pressure coupling, and once that equilibrates, turning on pressure
>>> coupling with a long tau_p (e.g. 5 ps).
>>>
>> What I see from the trajectory file is the box getting smaller in x,
>> y and z.
>>
>> Along the z coordinate, which is orthogonal to the membrane, the box
>> shrinks a bit faster because I prepared/solvated my system using
>> amber9 (which gives a lower water density).
>>
>> All my systems were geometry optimized (emtol = 70) prior to any MD
>> step.
>>
>> I apply position restraints (fc = 1000) to a transmembrane synthetic
>> ion channel (located in the center of the simulation box) which is
>> *well* surrounded by "soft" lipids (isotropic pressure, off-diagonal
>> compressibility elements set to 0).
>>
>> I have the same problem with a completely different system, where a
>> full protein (position restrained) is immersed only in water
>> (isotropic pressure)... variable ci gets *barely* out of bounds here
>> too.
>>
>> My tau_p is set to 5 ps, my time step is 1 fs, and I use LINCS only
>> on h-bonds. I run at room temperature.
>>
>> The ci out of bounds problem usually occurs after the very first
>> 0.3 ns.
>>
>> I have run the same systems, with the same initial geometry and
>> conditions, in other parallel MD codes for more than 30 ns each
>> (actually I want to compare gromacs to them) without observing any
>> slowly exploding system.
>>
>> That's why I think it's something related to the parallel version of
>> gromacs and the grid update that occurs as my box volume initially
>> decreases.
>>
> You are welcome to submit a bugzilla if you have a reproducible
> problem. It would be great if you could upload a tpr file that
> reproduces the problem as quickly as possible.
>
> Nevertheless, I would urge you to try it without position restraints
> as well. The problem could be related to restraining a molecule at a
> position far outside the box, or to reducing the box size until the
> box is smaller than your protein.
I have the tpr file.

My molecules are located in the center of the simulation box and
their size is much smaller than the box itself.

I could run a job without position restraints just to see what
happens, but I have no time at the moment.
(Actually my transmembrane pores will collapse for sure, since the
water density all over the box is too low ;-) )

Moreover, since I'm not the only gmx user fighting with this
problem:

http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
http://bugzilla.gromacs.org/show_bug.cgi?id=109

I'm pretty sure we are dealing with "bad" communication between
parallel processes (serial jobs do not suffer from this problem).
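
As a rough illustration of that suspicion only, here is a small C sketch of
how one process working with a stale grid size could see an index that is
barely out of bounds. The grid dimensions and the index ordering are made up
(chosen so that the numbers happen to match the 1472 vs. 1440 in the error
above), and the real parallel grid code in gromacs 3.3 is of course far more
involved.

/* Sketch of the suspected failure mode: after a box-volume update one
 * process re-sizes the NS grid while another still checks indices
 * against the old grid, so an atom binned into the newly added top
 * layer looks "barely out of bounds".  Dimensions and index ordering
 * are invented; they merely happen to reproduce 1472 vs. 1440.        */
#include <stdio.h>

int main(void)
{
    int nx = 12, ny = 12;
    int nz_old = 10;                     /* old grid: 12*12*10 = 1440 cells */
    int nz_new = 11;                     /* grid after the volume update    */

    int cx = 8, cy = 2, cz = nz_new - 1; /* atom in the new top z layer     */
    int ci = (cz * ny + cy) * nx + cx;   /* index built with the new grid   */

    printf("ci = %d, but the stale bound is still [ 0 .. %d ]\n",
           ci, nx * ny * nz_old);
    /* A process that has not yet picked up the new grid size rejects
     * this index even though no coordinate is unphysical -- which would
     * match an error that shows up only in parallel runs and only right
     * after a grid-size update. */
    return 0;
}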

Thanks David,
MG.


