[gmx-developers] Lost particles while sorting
Berk Hess
hess at kth.se
Fri Nov 8 14:48:36 CET 2013
Hi,
I assume this is with GPUs.
If you run in a debugger and break on exit, can you tell me which
sort_atoms call this comes from?
On how many MPI ranks is this?
If I can easily run this, could you mail me the tpr and the run settings?
Cheers,
Berk
On 11/08/2013 02:30 PM, Carsten Kutzner wrote:
> Hi,
>
> using a just checked-out 4.6 branch compiled with debug checks I get
>
> -------------------------------------------------------
> Program mdrun, VERSION 4.6.4-dev-20131107-ba8232e
> Source code file:
> /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c,
> line: 609
>
> Fatal error:
> (int)((x[74522][x]=11.764535 - 10.229600)*58.394176) = 89, not in 0 - 16*4
>
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Carsten
>
>
> On 11/08/2013 02:00 PM, Berk Hess wrote:
>> On 11/08/2013 01:44 PM, Mark Abraham wrote:
>>>
>>>
>>>
>>> On Fri, Nov 8, 2013 at 12:58 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>>
>>> Hi Mark, hi Berk,
>>>
>>> On Nov 7, 2013, at 6:48 PM, Berk Hess <hess at kth.se> wrote:
>>>
>>> > Hi Carsten,
>>> >
>>> > After how many steps does this happen?
>>> This happens immediately at startup.
>>>
>>> > Could you run with a debug build (or without NDEBUG defined)?
>>> > I added a lot of checks, not done with NDEBUG, in the fix for
>>> the issue you linked.
>>> Will do that now.
>>>
>>> > On 11/07/2013 06:27 PM, Mark Abraham wrote:
>>> >> Unclear. 6583c94 is one of your commits. Some very recent
>>> stuff has been playing with nstlist and rlist (safely, or so we
>>> thought.) Can you reproduce with mainstream release-4-6?
>>> This is basically mainstream 4-6, since in my commit I only
>>> changed the default behavior of
>>> appending to no.
>>>
>>>
>>> Right. What's the mainstream parent commit? I was going to release
>>> 4.6.4 today - if you're based off the current tip then maybe we
>>> shouldn't. If you're based off code a month back then we know the
>>> problem, if any, is of longer standing.
>> This is 4.6.4-dev which seems to include my fix for the previous
>> issue, so this issue is surely present in the current 4-6-release
>> branch. It must be due to a somewhat exotic condition, since this
>> code is widely used and we haven't had other reports.
>>
>> I think it should be easy to track this down with all the debug
>> checks in the code.
>> And if Carsten can send me his system and the conditions to reproduce
>> it, I can also help with debugging.
>>
>> Cheers,
>>
>> Berk
>>>
>>> Mark
>>>
>>>
>>> Carsten
>>>
>>> >>
>>> >> Mark
>>> >>
>>> >>
>>> >> On Thu, Nov 7, 2013 at 5:18 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>> >> Hi,
>>> >>
>>> >> we have a 120k atom system that crashes with
>>> >>
>>> >> ------------------------------------------------------
>>> >> Program mdrun_mpi, VERSION 4.6.4-dev-20131015-6583c94
>>> >> Source code file: /home/c/gromacs/src/mdlib/nbnxn_search.c,
>>> line: 685
>>> >>
>>> >> Software inconsistency error:
>>> >> Lost particles while sorting
>>> >> For more information and tips for troubleshooting, please
>>> check the GROMACS
>>> >> website at http://www.gromacs.org/Documentation/Errors
>>> >> -------------------------------------------------------
>>> >>
>>> >> if run with >= 2 MPI processes on a GPU and small values for
>>> nstlist. On my workstation,
>>> >> nstlist = 34 and larger works, whereas nstlist <= 33 leads to
>>> the above problem.
>>> >>
>>> >> Another system (60k atoms) does not produce this problem, so
>>> system size seems
>>> >> to matter as well.
>>> >>
>>> >> Looks like an old ghost:
>>> >>
>>> >> http://redmine.gromacs.org/issues/1153
>>> >>
>>> >>
>>> >> Should I file a redmine issue?
>>> >>
>>> >> Carsten
>>> >>
>>> >>
>>> >> --
>>> >> gmx-developers mailing list
>>> >> gmx-developers at gromacs.org
>>> >> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>> >> Please don't post (un)subscribe requests to the list. Use the
>>> www interface or send it to gmx-developers-request at gromacs.org.
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
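
For readers following the quoted fatal error from nbnxn_search.c, here is a minimal sketch of the arithmetic it reports. This is not the GROMACS source. The helper names bin_index and bin_in_range are hypothetical, and treating the printed range "0 - 16*4" as inclusive at both ends is an assumption. The numbers are copied from the error message.

```c
#include <assert.h>

/* Hypothetical helper, not taken from GROMACS: map a coordinate to a
 * sort-bin index using the same expression the error message prints,
 * (int)((x - lower) * inv_bin_size). */
static int bin_index(double x, double lower, double inv_bin_size)
{
    assert(inv_bin_size > 0.0);
    return (int)((x - lower) * inv_bin_size);
}

/* Hypothetical helper: returns 1 if index lies in [0, max_index].
 * Inclusive bounds are an assumption based on the "0 - 16*4" wording. */
static int bin_in_range(int index, int max_index)
{
    return index >= 0 && index <= max_index;
}
```

Calling bin_index(11.764535, 10.229600, 58.394176) gives 89, and bin_in_range(89, 16*4) returns 0 because the upper bound is 64. In other words, atom 74522 lies about 1.53 length units above the lower bound used in the check, while the allowed bins only cover about 64/58.394176, roughly 1.10 units.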