[gmx-developers] Re: gmx-developers Digest, Vol 87, Issue 3
hess at cbr.su.se
Thu Jul 14 11:59:08 CEST 2011
On a single core the particles (or, more accurately, charge groups) are not
ordered according to the ns grid. The ordering is only done with domain
decomposition. This results in a lot of cache misses during single-core
neighbor search (note that Gromacs now runs multi-core by default,
so this is not really an issue worth improving).
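To illustrate why the ordering matters: below is a minimal sketch (illustrative
C, not Gromacs code) of sorting coordinates by grid cell, so that each cell's
members become contiguous in memory. After such a permutation, the inner
neighbor-search loop streams through the coordinate array instead of jumping
around in it.

#include <stdlib.h>

typedef struct { float x, y, z; } rvec3;

/* Counting sort of coordinates by cell index: histogram, prefix sum,
 * scatter. cell_of[i] is the grid cell of particle i. */
void reorder_by_cell(int n, const int *cell_of, int ncells,
                     const rvec3 *x, rvec3 *x_sorted)
{
    int *offset = calloc(ncells + 1, sizeof(int));
    for (int i = 0; i < n; i++) {
        offset[cell_of[i] + 1]++;              /* histogram            */
    }
    for (int c = 0; c < ncells; c++) {
        offset[c + 1] += offset[c];            /* exclusive prefix sum */
    }
    for (int i = 0; i < n; i++) {
        x_sorted[offset[cell_of[i]]++] = x[i]; /* scatter              */
    }
    free(offset);
}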
I think the condition is almost never triggered on a single core, as we make
sure we only check cell pairs where cg pairs are nearly always in range.
With domain decomposition this is no longer the case, since DD zones
will not necessarily overlap with grid cells, especially with dynamic load
balancing or with a triclinic unit cell. This might be improved in the future.
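Reading the boolean structure of the test you quote below, together with the
explanation from my earlier mail (also quoted below), the intent can be
annotated roughly as follows. This assumes, as the test itself implies, that
the charge-group indices in grida[] are ascending within each cell:

/* grida[cgj0 .. cgj0+nrj-1] are the charge groups of the j-cell,
 * [jcg0, jcg1) is the j-range this process is responsible for, and
 * charge groups below max_jcg are exempt from the range test. */
if (nrj == 0 ||                      /* the j-cell is empty, or        */
    (grida[cgj0] >= max_jcg &&       /* no cg falls below max_jcg, and */
     (grida[cgj0] >= jcg1 ||         /* all cgs are above the range,   */
      grida[cgj0+nrj-1] < jcg0)))    /* or all are below it,           */
{
    continue;                        /* so the whole cell is skipped   */
}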
On 07/11/2011 10:34 PM, Pedro Gonnet wrote:
> Hi Berk,
> Thanks for the reply!
> I still don't really understand what's going on though... My problem is
> the following: on a single CPU, the nsgrid_core function requires
> roughly 40% more time than on two CPUs.
> Using a profiler, I tracked down this difference to the condition
> /* Check if all j's are out of range so we
>  * can skip the whole cell.
>  * Should save some time, especially with DD.
>  */
> if (nrj == 0 ||
>     (grida[cgj0] >= max_jcg &&
>      (grida[cgj0] >= jcg1 || grida[cgj0+nrj-1] < jcg0)))
> being triggered substantially more often in the two-CPU case than in the
> single-CPU case. In my understanding, in both cases (two or single CPU),
> the same number of cell pairs needs to be inspected, and hence roughly
> the same computational cost should be incurred.
> How, in this case, do the single-CPU and two-CPU cases differ? In the
> single-CPU case, are the particles in cells i and j traversed twice, i.e.
> as both (i,j) and (j,i)?
> Many thanks,
> On Mon, 2011-07-11 at 12:13 +0200, gmx-developers-request at gromacs.org wrote:
>> Date: Mon, 11 Jul 2011 11:26:00 +0200
>> From: Berk Hess <hess at cbr.su.se>
>> Subject: Re: [gmx-developers] Re: Fairly detailed question regarding
>> cell lists in Gromacs in general and nsgrid_core specifically
>> To: Discussion list for GROMACS development
>> <gmx-developers at gromacs.org>
>> Message-ID: <4E1AC1A8.1050402 at cbr.su.se>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>> This code is for parallel neighbor searching.
>> We have to ensure that pairs are not assigned to multiple processes.
>> In addition, with particle decomposition we want to ensure load balancing.
>> With particle decomposition jcg0=icg and jcg1=icg+0.5*#icg; this ensures
>> the two conditions above.
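>>
>> In code form, the half-range assignment looks roughly like this (an
>> illustrative sketch, not the actual Gromacs loop; intra-cg pairs are left
>> aside): pairing each icg with the ncg/2 charge groups that follow it
>> cyclically visits every pair exactly once and gives every charge group
>> the same amount of work.
>>
>> for (int icg = 0; icg < ncg; icg++) {
>>     int half = ncg/2;
>>     for (int d = 1; d <= half; d++) {
>>         /* for even ncg the antipodal pair would otherwise be generated
>>          * twice, so only the lower half of the i's handles it */
>>         if (d == half && ncg % 2 == 0 && icg >= half) {
>>             break;
>>         }
>>         int jcg = (icg + d) % ncg;
>>         /* ... cut-off test and put_in_list(icg, jcg) ... */
>>     }
>> }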
>> For domain decomposition we use the eighth-shell method, which uses
>> up to 8 zones. Only half of the 8x8 zone pairs should interact.
>> For domain decomposition jcg0 and jcg1 are set such that only the wanted
>> zone pairs interact (zones are ordered such that only consecutive j-zones
>> interact, so a simple check suffices).
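>>
>> Schematically (a sketch; zone_cg_start, jz0 and jz1 are hypothetical
>> names, not Gromacs symbols): because the j-zones allowed to interact with
>> a given i-zone form one consecutive block jz0..jz1-1, two indices suffice:
>>
>> jcg0 = zone_cg_start[jz0];  /* first cg of the first allowed j-zone   */
>> jcg1 = zone_cg_start[jz1];  /* one past the last allowed j-zone's cgs */
>>
>> /* a j charge group may then interact with this i-zone iff */
>> if (jcg >= jcg0 && jcg < jcg1) {
>>     /* ... candidate pair, still subject to the cut-off test ... */
>> }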
>> On 07/06/2011 10:52 AM, Pedro Gonnet wrote:
>>> Hello again,
>>> I had another long look at the code and at the older Gromacs papers and
>>> realized that the main loop over charge groups starts on line 2058 of
>>> ns.c and that the loops in lines 2135, 2151 and 2173 are for the
>>> periodic images.
>>> I still, however, have no idea what the second condition in lines
>>> 2232--2241 of ns.c means:
>>> /* Check if all j's are out of range so we
>>>  * can skip the whole cell.
>>>  * Should save some time, especially with DD.
>>>  */
>>> if (nrj == 0 ||
>>>     (grida[cgj0] >= max_jcg &&
>>>      (grida[cgj0] >= jcg1 || grida[cgj0+nrj-1] < jcg0)))
>>> Does anybody know what max_jcg, jcg1 and jcg0 are? Or does anybody know
>>> where this is documented in detail?
>>> Cheers, Pedro
>>> On Tue, 2011-07-05 at 16:07 +0100, Pedro Gonnet wrote:
>>>> I'm trying to understand how Gromacs builds its neighbor lists and have
>>>> been looking, more specifically, at the function nsgrid_core in ns.c.
>>>> If I understand the underlying data organization correctly, the grid
>>>> (t_grid) contains an array of cells in which the indices of charge
>>>> groups are stored. Pairs of such charge groups are identified and stored
>>>> in the neighbor list (put_in_list).
>>>> What I don't really understand is how these pairs are identified.
>>>> Usually one would loop over all cells, loop over each charge group
>>>> therein, loop over all neighboring cells and store the charge groups
>>>> therein which are within the cutoff distance.
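>>>> In pseudo-C, the traversal I would have expected looks roughly like this
>>>> (the names are mine, not from ns.c; dist2() and put_in_list() stand in
>>>> for the real distance test and list storage):
>>>>
>>>> for (int ci = 0; ci < ncells; ci++) {
>>>>     for (int ii = cell_start[ci]; ii < cell_start[ci+1]; ii++) {
>>>>         int i = cell_index[ii];            /* charge group in cell ci */
>>>>         for (int k = 0; k < nneigh; k++) {
>>>>             int cj = neighbor_cell[ci][k]; /* ci and its neighbors    */
>>>>             for (int jj = cell_start[cj]; jj < cell_start[cj+1]; jj++) {
>>>>                 int j = cell_index[jj];
>>>>                 if (j > i && dist2(i, j) < rcut2) {
>>>>                     put_in_list(i, j);     /* store each pair once    */
>>>>                 }
>>>>             }
>>>>         }
>>>>     }
>>>> }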
>>>> I assume that the first loop, over all cells, is somehow computed with
>>>> the for-loops starting at lines 2135, 2151 and 2173 of ns.c. However, I
>>>> don't really understand how this is done: what do these loops loop over?
>>>> In any case, the coordinates of the particle in the outer loop seem to
>>>> land in the variables XI, YI and ZI. The inner loop (for-loops starting
>>>> in lines 2213, 2216 and 2221 of ns.c) then runs through the neighboring
>>>> cells. If I understand correctly, cj is the id of the neighboring cell,
>>>> nrj the number of charge groups in that cell and cgj0 the offset of the
>>>> charge groups in the data.
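>>>> That is, if my reading is right, the members of cell cj would be
>>>> reached as:
>>>>
>>>> for (int jj = 0; jj < nrj; jj++) {
>>>>     int jcg = grida[cgj0 + jj];  /* charge-group index from grid->a */
>>>>     /* ... */
>>>> }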
>>>> What I don't really understand here are the lines 2232--2241:
>>>> /* Check if all j's are out of range so we
>>>>  * can skip the whole cell.
>>>>  * Should save some time, especially with DD.
>>>>  */
>>>> if (nrj == 0 ||
>>>>     (grida[cgj0] >= max_jcg &&
>>>>      (grida[cgj0] >= jcg1 || grida[cgj0+nrj-1] < jcg0)))
>>>> Apparently, some cells can be excluded, but what are the exact criteria?
>>>> The test on nrj is somewhat obvious, but what is stored in grid->a?
>>>> There is probably no short answer to my questions, but if anybody could
>>>> at least point me to any documentation or description of how the
>>>> neighbors are collected in this routine, I would be extremely thankful!
>>>> Cheers, Pedro