[gmx-developers] Lost particles while sorting

Berk Hess hess at kth.se
Fri Nov 8 14:48:36 CET 2013


Hi,

I assume this is with GPUs.
If you run in a debugger and break on exit, can you tell me which
sort_atoms call this comes from?
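For example, a debug build could be run under gdb in batch mode, breaking on exit() so the backtrace shows which sort_atoms call raised the fatal error (a sketch only; the binary path and run options below are placeholders, not from this thread):

```shell
# Hypothetical sketch: break on exit() and print the backtrace.
# Replace ./mdrun and the .tpr with your own debug build and input.
gdb -batch -ex 'break exit' -ex 'run' -ex 'backtrace' --args ./mdrun -s topol.tpr
```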

On how many MPI ranks is this?
If I can easily run this, could you mail me the tpr and the run settings?

Cheers,

Berk

On 11/08/2013 02:30 PM, Carsten Kutzner wrote:
> Hi,
>
> using a just checked-out 4.6 branch compiled with debug checks I get
>
> -------------------------------------------------------
> Program mdrun, VERSION 4.6.4-dev-20131107-ba8232e
> Source code file: 
> /home/ckutzne/junoworkspace/git-gromacs-vanilla/src/mdlib/nbnxn_search.c, 
> line: 609
>
> Fatal error:
> (int)((x[74522][x]=11.764535 - 10.229600)*58.394176) = 89, not in 0 - 16*4
>
> For more information and tips for troubleshooting, please check the 
> GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Carsten
>
>
> On 11/08/2013 02:00 PM, Berk Hess wrote:
>> On 11/08/2013 01:44 PM, Mark Abraham wrote:
>>>
>>>
>>>
>>> On Fri, Nov 8, 2013 at 12:58 PM, Carsten Kutzner <ckutzne at gwdg.de> wrote:
>>>
>>>     Hi Mark, hi Berk,
>>>
>>>     On Nov 7, 2013, at 6:48 PM, Berk Hess <hess at kth.se> wrote:
>>>
>>>     > Hi Carsten,
>>>     >
>>>     > After how many steps does this happen?
>>>     this happens immediately at startup.
>>>
>>>     > Could you run with a debug build (or without NDEBUG defined)?
>>>     > I added a lot of checks, not done with NDEBUG, in the fix for
>>>     the issue you linked.
>>>     Will do that now.
>>>
>>>     > On 11/07/2013 06:27 PM, Mark Abraham wrote:
>>>     >> Unclear. 6583c94 is one of your commits. Some very recent
>>>     stuff has been playing with nstlist and rlist (safely, or so we
>>>     thought). Can you reproduce with mainstream release-4-6?
>>>     This is basically mainstream 4-6, since in my commit I only
>>>     changed the default behavior of
>>>     appending to no.
>>>
>>>
>>> Right. What's the mainstream parent commit? I was going to release 
>>> 4.6.4 today - if you're based off the current tip then maybe we 
>>> shouldn't. If you're based off code a month back then we know the 
>>> problem, if any, is of longer standing.
>> This is 4.6.4-dev which seems to include my fix for the previous 
>> issue, so this issue is surely present in the current 4-6-release 
>> branch. It must be due to a somewhat exotic condition, since this 
>> code is widely used and we haven't had other reports.
>>
>> I think it should be easy to track this down with all the debug 
>> checks in the code.
>> And if Carsten can send me his system and the conditions to reproduce 
>> it, I can also help with debugging.
>>
>> Cheers,
>>
>> Berk
>>>
>>> Mark
>>>
>>>
>>>     Carsten
>>>
>>>     >>
>>>     >> Mark
>>>     >>
>>>     >>
>>>     >> On Thu, Nov 7, 2013 at 5:18 PM, Carsten Kutzner
>>>     <ckutzne at gwdg.de> wrote:
>>>     >> Hi,
>>>     >>
>>>     >> we have a 120k atom system that crashes with
>>>     >>
>>>     >> ------------------------------------------------------
>>>     >> Program mdrun_mpi, VERSION 4.6.4-dev-20131015-6583c94
>>>     >> Source code file: /home/c/gromacs/src/mdlib/nbnxn_search.c,
>>>     line: 685
>>>     >>
>>>     >> Software inconsistency error:
>>>     >> Lost particles while sorting
>>>     >> For more information and tips for troubleshooting, please
>>>     check the GROMACS
>>>     >> website at http://www.gromacs.org/Documentation/Errors
>>>     >> -------------------------------------------------------
>>>     >>
>>>     >> if run with >= 2 MPI processes on a GPU and small values for
>>>     nstlist. On my workstation,
>>>     nstlist = 34 and larger works, whereas nstlist <= 33 leads to
>>>     the above problem.
>>>     >>
>>>     >> Another system (60k atoms) does not produce this problem, so
>>>     system size seems
>>>     >> to matter as well.
>>>     >>
>>>     >> Looks like an old ghost:
>>>     >>
>>>     >> http://redmine.gromacs.org/issues/1153
>>>     >>
>>>     >>
>>>     >> Should I file a redmine issue?
>>>     >>
>>>     >> Carsten
>>>     >>
>>>     >>
>>>     >> --
>>>     >> gmx-developers mailing list
>>>     >> gmx-developers at gromacs.org
>>>     >> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>>     >> Please don't post (un)subscribe requests to the list. Use the
>>>     www interface or send it to gmx-developers-request at gromacs.org.
