[gmx-users] Can we set the number of pure PME nodes when using GPU&CPU?
Mark Abraham
mark.j.abraham at gmail.com
Fri Aug 22 13:20:15 CEST 2014
Hi,
Because no work will be sent to them. The GPU implementation can only
accelerate domains belonging to PP ranks on the same node, and with an MPMD
setup that uses dedicated PME nodes, there are no PP ranks on the nodes that
have been set up with only PME ranks. The two offload models (PP work -> GPU;
PME work -> CPU subset) do not work well together, as I said.
One can devise various schemes in 4.6/5.0 that could use those GPUs, but they
require either
* that each node does both PME and PP work (which limits scaling because of
the all-to-all communication for PME, and perhaps makes poor use of locality
on multi-socket nodes), or
* that all nodes have PP ranks but only some have PME ranks, with the GPUs
mapped to PP ranks differently depending on whether PME ranks are present on
the node. This could work well, but it relies on the DD load balancer
recognizing and exploiting the faster progress of the PP ranks with better
GPU support, requires getting your hands very dirty laying out PP and PME
ranks onto hardware so that the layout later matches what the DD load
balancer wants, and probably requires balancing the PP-PME load manually.
I do not recommend the last approach, because of its complexity.
Clearly there are design decisions to improve. Work is underway.
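To make the second scheme's simpler cousin concrete (all PP ranks on GPU
nodes, all PME ranks on a CPU-only node), here is a rough sketch. The host
names gpu01/gpu02/cpu01, the rank counts, and the rankfile are hypothetical
and use Open MPI syntax; adapt them to your own MPI library and queueing
system, and benchmark before trusting any of it:

```shell
# Hypothetical layout: 2 GPU nodes (gpu01, gpu02) running 2 PP ranks each,
# plus 1 CPU-only node (cpu01) running 2 PME-only ranks.
# Open MPI rankfile syntax; host names and slots are placeholders.
cat > rankfile <<'EOF'
rank 0=gpu01 slot=0
rank 1=gpu01 slot=1
rank 2=gpu02 slot=0
rank 3=gpu02 slot=1
rank 4=cpu01 slot=0
rank 5=cpu01 slot=1
EOF

# -ddorder pp_pme orders all PP ranks before all PME ranks, matching the
# rankfile; -npme 2 makes the last two ranks PME-only; -gpu_id 01 maps the
# two PP ranks on each GPU node to that node's GPUs 0 and 1.
mpirun -np 6 --rankfile rankfile mdrun_mpi -ntomp 8 -npme 2 \
       -ddorder pp_pme -gpu_id 01 -deffnm topol
```

Note that -gpu_id is interpreted per node, so it only affects the PP ranks on
the GPU nodes; the PME-only ranks on cpu01 ignore it. Whether this layout
beats simply putting PME ranks on the GPU nodes is something only
benchmarking on your own hardware can answer.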
Cheers,
Mark
On Fri, Aug 22, 2014 at 10:11 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
> Hi Mark,
>
> Could you tell me why, when we use GPU-CPU nodes as PME-dedicated
> nodes, the GPUs on those nodes will be idle?
>
>
> Theo
>
> On 8/11/2014 9:36 PM, Mark Abraham wrote:
>
>> Hi,
>>
>> What Carsten said, if running on nodes that have GPUs.
>>
>> If running on a mixed setup (some nodes with GPUs, some without), then
>> arranging your MPI environment to place PME ranks on the CPU-only nodes
>> is probably worthwhile. For example, lay out all your PP ranks first,
>> mapped to the GPU nodes, then all your PME ranks, mapped to the CPU-only
>> nodes, and then use mdrun -ddorder pp_pme.
>>
>> Mark
>>
>>
>> On Mon, Aug 11, 2014 at 2:45 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
>>
>>> Hi Mark,
>>>
>>> This is some information about our cluster. Could you give us advice
>>> regarding it, so that we can make GMX run faster on our system?
>>>
>>> Each CPU node has 2 CPUs; each GPU node has 2 CPUs and 2 NVIDIA Tesla
>>> K20m GPUs.
>>>
>>> CPU Node, Intel H2216JFFKR (332 units):
>>>   CPU: 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s)
>>>   Mem: 64 GB (8×8 GB) ECC Registered DDR3-1600 Samsung
>>> Fat Node, Intel H2216WPFKR (20 units):
>>>   CPU: 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s)
>>>   Mem: 256 GB (16×16 GB) ECC Registered DDR3-1600 Samsung
>>> GPU Node, Intel R2208GZ4GC (50 units):
>>>   CPU: 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s)
>>>   Mem: 64 GB (8×8 GB) ECC Registered DDR3-1600 Samsung
>>> MIC Node, Intel R2208GZ4GC (5 units):
>>>   CPU: 2× Intel Xeon E5-2670 (8 cores, 2.6 GHz, 20 MB cache, 8.0 GT/s)
>>>   Mem: 64 GB (8×8 GB) ECC Registered DDR3-1600 Samsung
>>> Computing network: Mellanox InfiniBand FDR core switch, 648-port
>>>   MSX6536-10R with Mellanox Unified Fabric Manager (1 unit)
>>> Computing network: Mellanox SX1036 switch, 36× 40 GbE QSFP ports (1 unit)
>>> Management network: Extreme Summit X440-48t-10G layer-2 switch,
>>>   48× 1 GbE, ExtremeXOS licensed (9 units)
>>> Management network: Extreme Summit X650-24X layer-3 switch, 24× 10 GbE,
>>>   ExtremeXOS licensed (1 unit)
>>> Parallel storage: DDN SFA12K storage system (1 unit)
>>> GPU accelerator: NVIDIA Tesla K20m, Kepler (70 units)
>>> MIC accelerator: Intel Xeon Phi 5110P, Knights Corner (10 units)
>>> 40 GbE card: Mellanox MCX314A-BCBT, ConnectX-3, 2× 40 GbE QSFP ports,
>>>   with QSFP cables (16 units)
>>> SSD: Intel SSD 910, 400 GB, PCIe (80 units)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 8/10/2014 5:50 AM, Mark Abraham wrote:
>>>
>>> That's not what I said.... "You can set..."
>>>>
>>>> -npme behaves the same whether or not GPUs are in use. Using separate
>>>> ranks
>>>> for PME caters to trying to minimize the cost of the all-to-all
>>>> communication of the 3DFFT. That's still relevant when using GPUs, but
>>>> if
>>>> separate PME ranks are used, any GPUs on nodes that only have PME ranks
>>>> are
>>>> left idle. The most effective approach depends critically on the
>>>> hardware
>>>> and simulation setup, and whether you pay money for your hardware.
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Sat, Aug 9, 2014 at 2:56 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> You mean that whether we use GPU acceleration or not, -npme is just a
>>>>> suggestion? Why can't we set it to an exact value?
>>>>>
>>>>>
>>>>> On 8/9/2014 5:14 AM, Mark Abraham wrote:
>>>>>
>>>>>> You can set the number of PME-only ranks with -npme. Whether it's
>>>>>> useful is another matter :-) The CPU-based PME offload and the
>>>>>> GPU-based PP offload do not combine very well.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 8, 2014 at 7:24 AM, Theodore Si <sjyzhxw at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Can we set the number manually with -npme when using GPU
>>>>>>> acceleration?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Gromacs Users mailing list
>>>>>>>
>>>>>>> * Please search the archive at http://www.gromacs.org/
>>>>>>> Support/Mailing_Lists/GMX-Users_List before posting!
>>>>>>>
>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>
>>>>>>> * For (un)subscribe requests visit
>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>> or
>>>>>>> send a mail to gmx-users-request at gromacs.org.
>>>>>>>
>>>>>>>