[gmx-users] Re: ci barely out of bounds
Matteo Guglielmi
matteo.guglielmi at epfl.ch
Mon May 28 22:26:38 CEST 2007
David van der Spoel wrote:
> chris.neale at utoronto.ca wrote:
>> This email refers to my original posts on this topic:
>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024154.html
>> http://www.gromacs.org/pipermail/gmx-users/2006-October/024333.html
>>
>> I have previously posted some of this information to somebody else's
>> bugzilla post #109 including a possible workaround
>> http://bugzilla.gromacs.org/show_bug.cgi?id=109
>>
>> I have compiled my gromacs always using one of the gcc3.3* distros. I
>> didn't do anything fancy, just the usual configure, make, make install.
>> I did have trouble with a gromacs version compiled using a gcc4*
>> distro and somebody on this user list assisted me in determining that
>> I should just roll back my gcc version. I run on all sorts of
>> computers, opterons and intel chips, 32 and 64 bit, and this
>> particular ci problem is the same for me on all of them.
>>
>> Matteo, I want to be sure that we are on the same page here: your ci
>> is just *barely* out of bounds, right? This is a different problem
>> than when your ci is a huge negative number. In that case you have
>> some other problem and your system is exploding.
>>
>> There is one test and one workaround included in my bugzilla post.
>> The test is to recompile gromacs with the -DEBUG_PBC flag and see if
>> the problem still occurs. For me this solved the problem (although
>> gromacs runs much slower so it is not a great workaround). The
>> solution was to remake my system with a few more or a few less waters
>> so that the number of grids wasn't changing as the volume of the box
>> fluctuates (slightly) during constant pressure simulations.
>>
>> I here include the text that I added to that bugzilla post:
>> Did you try with a version of mdrun that was compiled with -DEBUG_PBC?
>> I have some runs that reliably (but stochastically) give errors about
>> an atom being found in a grid just one block outside of the expected
>> boundary only in parallel runs, and often other nodes have log files
>> that indicate that they have just updated the grid size (constant
>> pressure simulation). This error disappears when I run with a
>> -DEBUG_PBC version. My assumption here is that there is some
>> non-blocking MPI communication that is not getting through in time. The
>> -DEBUG_PBC version spends a lot of time checking some things and
>> although it never reports having found some problem, I assume that a
>> side-effect of these extra calculations is to slow things down enough
>> at the proper stage so that the MPI message gets through. I have solved
>> my problem by adjusting my simulation cell so that it doesn't fall
>> close to the grid boundaries. Perhaps you are experiencing some
>> analogous problem?
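
The workaround described above amounts to keeping the box away from a size
at which the number of grid cells changes. A minimal sketch of such a check
follows. It is not taken from the GROMACS source: it assumes that the cell
count along one box edge is floor(box_length / min_cell_size), and the
function names and numbers are hypothetical.

```python
import math


def cells_along(box_length, min_cell_size):
    """Assumed rule: as many whole cells as fit along one box edge."""
    return max(1, math.floor(box_length / min_cell_size))


def count_can_change(box_length, min_cell_size, rel_fluctuation):
    """True if a +/- rel_fluctuation change in box_length alters the cell count."""
    shrunk = cells_along(box_length * (1.0 - rel_fluctuation), min_cell_size)
    grown = cells_along(box_length * (1.0 + rel_fluctuation), min_cell_size)
    return shrunk != grown


# Hypothetical numbers: 0.5 nm minimum cell size, 0.5 % box-length fluctuation.
print(count_can_change(6.01, 0.5, 0.005))  # True: edge sits just above 12 cells
print(count_can_change(6.25, 0.5, 0.005))  # False: edge sits between two counts
```

Under these assumptions, a box for which the check returns True is one where
adding or removing a few waters, as suggested above, might move the edge
length away from the point where the cell count flips.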
>>
>> Quoting Matteo Guglielmi <matteo.guglielmi at epfl.ch>:
>>
>>> Hello Chris,
>>> I have the same problem with gromacs and did not understand
>>> what's going wrong yet.
>>>
>>> I did not try to run a serial job (as you did) but all my 7 simulations
>>> (6 solvated pores in membranes + 1 protein in water... all of them
>>> with position restraints - double precision) keep crashing in the
>>> same way.
>>>
>>> Did you finally understand why they do crash (in parallel)?
>>>
>>> How did you compile gromacs?
>>>
>>> I used the Intel compilers (ifort icc icpc 9.1 series) with the
>>> following optimization flags: -O3 -unroll -axT.
>>>
>>> I've also tried the 8.0 series but no chance to get rid of the problem.
>>>
>>> I'm running on woodcrest (xeon cpu 5140 2.33GHz) and xeon cpu
>>> 3.06GHz.
>>>
>>> Thanks for your attention,
>>> MG
>
> Do you use pressure coupling? In principle that can cause problems
> when combined with position restraints. Further once again, please try
> to reproduce the problem with gcc as well. If this is related to
> bugzilla 109 as Chris suggests then please let's continue the
> discussion there.
>
Yes I do (anisotropic pressure).
I got the same problem with the same system using a version of gromacs
compiled with gcc 3.4.6.
So, in my case, it doesn't matter which compiler I use to compile gromacs
(either Intel series 9.1 or gcc 3.4.6): I always get the same error
right after the "update" of the grid size.
....
step 282180, will finish at Tue May 29 15:57:31 2007
step 282190, will finish at Tue May 29 15:57:32 2007
-------------------------------------------------------
Program mdrun_mpi, VERSION 3.3.1
Source code file: nsgrid.c, line: 226
Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.
Variable ci has value 1472. It should have been within [ 0 .. 1440 ]
Please report this to the mailing list (gmx-users at gromacs.org)
-------------------------------------------------------
"BioBeat is Not Available In Regular Shops" (P.J. Meulenhoff)
Looking forward to a solution,
thanks to all of you,
MG.
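
To illustrate how a cell index could end up just outside the allowed range
right after a grid update, here is a toy model. It is only a sketch of the
hypothesis discussed in this thread, not the code in nsgrid.c: it assumes a
row-major flat index, and that the index is computed with stale (larger)
grid dimensions but range-checked against the updated (smaller) grid. All
names and dimensions are hypothetical.

```python
def cell_index(frac, dims):
    """Map fractional coordinates in [0, 1) to a flat cell index.

    Assumes a row-major layout: ci = (ix * ny + iy) * nz + iz.
    """
    nx, ny, nz = dims
    ix = min(int(frac[0] * nx), nx - 1)
    iy = min(int(frac[1] * ny), ny - 1)
    iz = min(int(frac[2] * nz), nz - 1)
    return (ix * ny + iy) * nz + iz


def in_range(ci, dims):
    """Range check against the total number of cells of the given grid."""
    nx, ny, nz = dims
    return 0 <= ci < nx * ny * nz


stale_dims = (5, 4, 4)    # hypothetical grid before the box shrank
current_dims = (4, 4, 4)  # hypothetical grid after the update
particle = (0.95, 0.1, 0.1)

ci_stale = cell_index(particle, stale_dims)
ci_current = cell_index(particle, current_dims)

print(ci_stale, in_range(ci_stale, current_dims))      # 64 False
print(ci_current, in_range(ci_current, current_dims))  # 48 True
```

In this toy case the stale index overshoots the valid range by a single
cell, which is the "barely out of bounds" pattern, as opposed to the huge
or NaN values expected from an exploding system.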