[gmx-users] Re: ci barely out of bounds
David van der Spoel
spoel at xray.bmc.uu.se
Tue May 29 21:14:40 CEST 2007
Matteo Guglielmi wrote:
> David van der Spoel wrote:
>> Matteo Guglielmi wrote:
>>> David van der Spoel wrote:
>>>> Matteo Guglielmi wrote:
>>>>> David van der Spoel wrote:
>>>>>> chris.neale at utoronto.ca wrote:
>>>>>>> This email refers to my original posts on this topic:
>>>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
>>>>>>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
>>>>>>>
>>>>>>> I have previously posted some of this information to somebody else's
>>>>>>> bugzilla post #109 including a possible workaround
>>>>>>> http://bugzilla.gromacs.org/show_bug.cgi?id=109
>>>>>>>
>>>>>>> I have always compiled my gromacs using one of the gcc 3.3* distros.
>>>>>>> I didn't do anything fancy, just the usual configure, make, make
>>>>>>> install. I did have trouble with a gromacs version compiled using a
>>>>>>> gcc 4* distro, and somebody on this user list helped me determine
>>>>>>> that I should just roll back my gcc version. I run on all sorts of
>>>>>>> computers, opterons and intel chips, 32 and 64 bit, and this
>>>>>>> particular ci problem is the same for me on all of them.
>>>>>>>
>>>>>>> Matteo, I want to be sure that we are on the same page here: your ci
>>>>>>> is just *barely* out of bounds, right? This is a different problem
>>>>>>> from when your ci is a huge negative number. In that case you have
>>>>>>> some other problem and your system is exploding.
>>>>>>>
>>>>>>> There is one test and one workaround included in my bugzilla post.
>>>>>>> The test is to recompile gromacs with the -DEBUG_PBC flag and see if
>>>>>>> the problem still occurs. For me this made the problem go away
>>>>>>> (although gromacs runs much slower, so it is not a great workaround).
>>>>>>> The real fix was to remake my system with a few more or a few fewer
>>>>>>> waters so that the number of grid cells doesn't change as the volume
>>>>>>> of the box fluctuates (slightly) during constant pressure simulations.
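For anyone wanting to try that test, rebuilding with the define is roughly the
following. This is only a sketch: the configure options depend on your MPI
setup, and if the preprocessor symbol is DEBUG_PBC (as the flag name suggests)
it has to be passed as -DDEBUG_PBC; check the compile lines during make to be
sure it really ends up there.

  # sketch only - adjust prefix, suffix and MPI options for your installation
  make distclean
  ./configure CPPFLAGS="-DDEBUG_PBC" --enable-mpi --program-suffix="_mpi_dbg"
  make
  make install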
>>>>>>>
>>>>>>> Here is the text that I added to that bugzilla post:
>>>>>>>
>>>>>>> Did you try with a version of mdrun that was compiled with -DEBUG_PBC?
>>>>>>> I have some runs that reliably (but stochastically) give errors about
>>>>>>> an atom being found in a grid cell just one block outside the expected
>>>>>>> boundary, but only in parallel runs, and often other nodes have log
>>>>>>> files indicating that they have just updated the grid size (constant
>>>>>>> pressure simulation). This error disappears when I run with a
>>>>>>> -DEBUG_PBC version. My assumption is that some non-blocking MPI
>>>>>>> communication is not getting through in time. The -DEBUG_PBC version
>>>>>>> spends a lot of time checking things, and although it never reports
>>>>>>> finding any problem, I assume that a side effect of these extra
>>>>>>> calculations is to slow things down enough at the right stage for the
>>>>>>> MPI message to get through. I solved my problem by adjusting my
>>>>>>> simulation cell so that it doesn't fall close to the grid boundaries.
>>>>>>> Perhaps you are experiencing an analogous problem?
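If anyone wants to try the same kind of adjustment, re-solvating with slightly
more or fewer waters can be done with the standard tools. A sketch only, with
placeholder file names and a made-up solvent count; pick -maxsol so that the
average box size sits well away from a grid-count boundary:

  # re-solvate with a slightly different number of waters than before
  genbox -cp system.gro -cs spc216.gro -p topol.top -maxsol 11950 -o resolvated.gro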
>>>>>>>
>>>>>>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>>>>>>
>>>>>>>> Hello Chris,
>>>>>>>> I have the same problem with gromacs and have not yet understood
>>>>>>>> what's going wrong.
>>>>>>>>
>>>>>>>> I did not try to run a serial job (as you did), but all 7 of my
>>>>>>>> simulations (6 solvated pores in membranes + 1 protein in water, all
>>>>>>>> of them with positional restraints, double precision) keep crashing
>>>>>>>> in the same way.
>>>>>>>>
>>>>>>>> Did you finally understand why they do crash (in parallel)?
>>>>>>>>
>>>>>>>> How did you compile gromacs?
>>>>>>>>
>>>>>>>> I used the Intel compilers (ifort, icc, icpc, 9.1 series) with the
>>>>>>>> following optimization flags: -O3 -unroll -axT.
>>>>>>>>
>>>>>>>> I've also tried the 8.0 series, but that did not get rid of the
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> I'm running on woodcrest (xeon cpu 5140 2.33GHz) and xeon cpu
>>>>>>>> 3.06GHz.
>>>>>>>>
>>>>>>>> Thanks for your attention,
>>>>>>>> MG
>>>>>> Do you use pressure coupling? In principle that can cause problems
>>>>>> when combined with position restraints. Furthermore, once again,
>>>>>> please try to reproduce the problem with gcc as well. If this is
>>>>>> related to bugzilla 109, as Chris suggests, then let's please continue
>>>>>> the discussion there.
>>>>>>
>>>>> Yes I do (anisotropic pressure).
>>>>>
>>>>> I got the same problem with the same system using a version of gromacs
>>>>> compiled with gcc 3.4.6.
>>>>>
>>>>> So, in my case, it doesn't matter which compiler I use to build gromacs
>>>>> (either Intel 9.1 or gcc 3.4.6); I always get the same error right
>>>>> after the "update" of the grid size.
>>>>>
>>>>> ....
>>>>>
>>>>> step 282180, will finish at Tue May 29 15:57:31 2007
>>>>> step 282190, will finish at Tue May 29 15:57:32 2007
>>>>> -------------------------------------------------------
>>>>> Program mdrun_mpi, VERSION 3.3.1
>>>>> Source code file: nsgrid.c, line: 226
>>>>>
>>>>> Range checking error:
>>>>> Explanation: During neighborsearching, we assign each particle to a
>>>>> grid
>>>>> based on its coordinates. If your system contains collisions or
>>>>> parameter
>>>>> errors that give particles very high velocities you might end up with
>>>>> some
>>>>> coordinates being +-Infinity or NaN (not-a-number). Obviously, we
>>>>> cannot
>>>>> put these on a grid, so this is usually where we detect those errors.
>>>>> Make sure your system is properly energy-minimized and that the
>>>>> potential
>>>>> energy seems reasonable before trying again.
>>>>>
>>>>> Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
>>>>> Please report this to the mailing list (gmx-users at gromacs.org)
>>>>> -------------------------------------------------------
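As an aside, the "barely out of bounds" part is easy to rationalize from the
way such a flattened cell index is formed. This is not the GROMACS code, just
the arithmetic, with made-up grid dimensions that happen to give 1440 cells:

  # a 12 x 12 x 10 grid has 1440 cells, so valid flattened indices are 0..1439
  nx=12; ny=12; nz=10
  ix=11; iy=11; iz=9
  echo $(( (ix*ny + iy)*nz + iz ))   # 1439, the last valid cell
  iz=10                              # one cell past the edge in z
  echo $(( (ix*ny + iy)*nz + iz ))   # 1440, just outside the valid range

A particle sitting a hair outside a grid that was sized for a slightly
different box lands in exactly such a one-past-the-edge cell, which fits a ci
that is only slightly above the allowed maximum.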
>>>>>
>>>>> "BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
>>>>>
>>>>>
>>>>>
>>>>> Looking forward to having a solution,
>>>>> thanks to all of you,
>>>>> MG.
>>>>>
>>>> If this is reproducible as you say it is, then please save your energy
>>>> file at each step and plot the box as a function of time. Most likely it
>>>> is exploding gently, probably caused by the combination of position
>>>> restraints and pressure coupling. Therefore it would be good to turn one
>>>> of them off. I would suggest starting with posres and no pressure
>>>> coupling, and once that equilibrates, turning on pressure coupling with
>>>> a long tau_p (e.g. 5 ps).
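Concretely: set nstenergy = 1 in the mdp so the box is written every step, and
after the crash pull the box vectors out of the energy file, roughly like this
(the exact energy term names may differ slightly between versions):

  echo "Box-X Box-Y Box-Z" | g_energy -f ener.edr -o box.xvg
  xmgrace box.xvg

If the curves drift steadily downwards rather than fluctuating around a
plateau, the box really is shrinking onto the restrained molecule.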
>>>>
>>> What I see from the trajectory file is the box getting smaller in x, y
>>> and z.
>>>
>>> Along the z coordinate, which is orthogonal to the membrane, the box
>>> shrinks a bit faster because I prepared/solvated my system using amber9
>>> (which gives a lower water density).
>>>
>>> All my systems were geometry optimized (emtol = 70) prior to any MD
>>> step.
>>>
>>> I apply position restraints (fc = 1000) to a transmembrane synthetic
>>> ion channel (located in the center of the simulation box) which is
>>> *well* surrounded by "soft" lipids (isotropic pressure, off-diagonal
>>> compressibility elements set to 0).
>>>
>>> I have the same problem with a completely different system, where a
>>> full protein (position restrained) is immersed only in water (isotropic
>>> pressure)... variable ci gets *barely* out of bounds there as well.
>>>
>>> My tau_p is set to 5 ps, my time step is 1 fs, and I use LINCS only on
>>> h-bonds. The simulation is at room temperature.
>>>
>>> The ci out of bounds problem usually occurs after the very first
>>> 0.3 ns.
>>>
>>> I did run the same systems, in terms of initial geometry and conditions,
>>> with other parallel MD codes for more than 30 ns each (actually I want
>>> to compare gromacs to them) without observing any slowly exploding
>>> systems.
>>>
>>> That's why I think it's something related to the parallel version of
>>> gromacs and the grid update, which occurs along with the initial
>>> decrease in box size.
>>>
>> You are welcome to submit a bugzilla if you have a reproducible
>> problem. It would be great if you could upload a tpr file that
>> reproduces the problem as fast as possible.
>>
>> Nevertheless I would urge you to try it without position restraints as
>> well. The problem could be related to restraining a molecule at a
>> position far outside the box or reducing the box size until the box is
>> smaller than your protein.
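Making the tpr to attach is just the normal grompp call on the crashing
system; a sketch with placeholder file names (for 3.3 the number of nodes for
the parallel run is fixed here with -np):

  grompp -f md.mdp -c start.gro -p topol.top -np 4 -o ci_crash.tpr

Starting from a configuration shortly before the crash will of course make it
reproduce fastest.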
> I have the tpr file.
>
> My molecules are located in the center of the simulation box and
> their size is much smaller than the box itself.
>
> I could run a job without position restraints just to see what happens,
> but I have no time at the moment.
Sorry, but please don't say that kind of thing; it is not very
encouraging for those wanting to help.
> (Actually my transmembrane pores will collapse for sure, since the
> water density all over the box is much too low ;-) )
>
> Moreover, I'm not the only gmx user fighting with this problem:
>
> http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
> http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
> http://bugzilla.gromacs.org/show_bug.cgi?id=109
>
> I'm pretty sure we are dealing with "bad" communication between
> parallel processes (serial jobs do not suffer from this problem).
>
> Thanks David,
> MG.
You have not uploaded the tpr to the bugzilla yet...
--
David.
________________________________________________________________________
David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205 fax: 46 18 511 755
spoel at xray.bmc.uu.se spoel at gromacs.org http://folding.bmc.uu.se
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++