[gmx-users] How to assign PME ranks to particular nodes?

Marcin Mielniczuk marcin at golem.network
Thu Dec 19 17:08:35 CET 2019


Hi,

Thanks for your reply.

On 13.12.2019 00:07, Mark Abraham wrote:
> Hi,
>
> On Thu., 12 Dec. 2019, 20:27 Marcin Mielniczuk, <marcin at golem.network>
> wrote:
>
>> Hi,
>>
>> I'm running Gromacs on a heterogeneous cluster, with one node
>> significantly faster than the other. Therefore, I'd like to achieve the
>> following setup:
>> * run 2 or 3 PP processes and 1 PME process on the faster node (with a
>> lower number of OpenMP threads)
>> * run 2 PP processes on the slower node (with a lower number of OpenMP
>> threads)
>>
> The special case of n PP ranks and 1 PME rank is relatively easy. Just
> arrange for the node that should get the PME rank to be the last node to
> be allocated ranks, and then both of the relevant -ddorder settings
> produce the same mapping of duty to rank. As always, if the nodes have
> multiple sockets, then you want to avoid splitting any single rank over
> a socket boundary; sometimes that can severely restrict the solution space.
>
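
To spell that out for the archive: as I understand it, listing the slower
node first in the hostfile makes the faster node the last one to be filled
with ranks, so with -ddorder pp_pme the single PME rank (the highest rank)
ends up there. The hostnames, slot counts and rank count below are only an
example, not my exact setup:

    # hosts.txt -- slower node first, so the faster node is filled last
    node-slow slots=2
    node-fast slots=3

    # 5 ranks total (4 PP + 1 PME); with -ddorder pp_pme the last rank,
    # which lands on node-fast, does the PME work
    mpirun --hostfile hosts.txt -np 5 gmx_mpi mdrun -npme 1 -ddorder pp_pme
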
>> Setting the number of threads is easy: one may just create a wrapper
>> script on every node setting OMP_NUM_THREADS and GMX_PME_NUM_THREADS and
>> use it instead of gmx_mpi. The rank assignment is more difficult.
>>
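
Such a wrapper can be as simple as the following (the thread counts are
placeholders; each node gets a copy with values matching its core count):

    #!/bin/bash
    # per-node launch wrapper, used in place of gmx_mpi
    export OMP_NUM_THREADS=4       # OpenMP threads per PP rank on this node
    export GMX_PME_NUM_THREADS=4   # OpenMP threads for a PME rank on this node
    exec gmx_mpi "$@"
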
>> It's possible to use the OpenMPI --map-by option to control how ranks
>> are mapped to nodes and use -ddorder pp_pme in Gromacs, so that the last
>> rank is the PME rank, but this may negatively affect the topology of the
>> PP ranks.
> I don't think there's a reason why that might be expected, do you know of
> one?

This is an experimental result: when I ran my workload with `mpirun
--bynode ...`, all of the cores ran at 100% CPU. When using a rankfile, on
the other hand, only one thread per process ran at 100% CPU, and the rest
sat at around 70%. I see similar behavior when using only one process per
node (no PME ranks, just 2 PP ranks with 8 threads each).
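
For context, the two launch modes I'm comparing look roughly like this
(hostnames, rank counts and slot ranges are illustrative, not my exact
configuration; gmx_wrapper is the per-node wrapper script mentioned above):

    # round-robin placement over nodes
    mpirun --bynode -np 4 ./gmx_wrapper mdrun ...

    # explicit placement via a rankfile, e.g. ranks.txt containing:
    #   rank 0=node-slow slot=0:0-7
    #   rank 1=node-slow slot=1:0-7
    #   rank 2=node-fast slot=0:0-7
    #   rank 3=node-fast slot=1:0-7
    mpirun --rankfile ranks.txt -np 4 ./gmx_wrapper mdrun ...
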

This looks to me as though the blocking network operations, which are
carried out on the main thread, are taking too much time. That seems
plausible, because I'm running over 10GigE.

Thanks,
Marcin

>> An alternative is to use an OpenMPI rankfile,
>> which requires me to specify precisely how all the ranks are mapped, not
>> just the rank counts.
>>
>> Is there any better way to set the number of PP/PME ranks on particular
>> nodes?
>>
> No, heterogeneous cases are up to you to solve in the kind of way you are
> doing!
>
> Mark
>
>> Thanks,
>> Marcin
>>



