[gmx-developers] Gromacs with GPU

Szilárd Páll pall.szilard at gmail.com
Fri Sep 22 17:04:18 CEST 2017


On Fri, Sep 22, 2017 at 4:38 PM, Åke Sandgren <ake.sandgren at hpc2n.umu.se> wrote:
> Ok, so there has been some development on that in later versions then?

Are you referring to tune_pme? If so, the answer is (AFAIK) no.

> Anyway, I use gmx tune_pme as an easy way to get it to test lots of npme
> values quickly.
> And then I submit LOTS of jobs with different numbers of MPI tasks,
> threads, GPUs, KNL settings, etc.

Sounds very reasonable. Note that the various load balancers can
interact unfavorably; this is something we tried to address in the
2016 release by observing the effect of DLB and turning it off when
it _seems_ to be hurting performance.
For the next release, we've made significant improvements in the way
the load balancing measures load, and this should improve its
robustness in a wide range of cases. The run will also report much
more clearly to the user what the load balancing did (and why)!
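
In practice, a quick way to check whether DLB is part of the problem
on a given node type is an A/B comparison with DLB left on auto versus
forced off, comparing the performance reported at the end of the log.
A minimal sketch, reusing the benchmark input from further down in
this thread (launch wrapper and the rest of the options as you would
normally use them):

  # baseline: mdrun decides, and (since 2016) turns DLB off if it seems to hurt
  gmx_mpi mdrun -s ion_channel_bench00.tpr -ntomp 7 -pin on \
      -dlb auto -g dlb_auto.log
  # comparison: dynamic load balancing forced off
  gmx_mpi mdrun -s ion_channel_bench00.tpr -ntomp 7 -pin on \
      -dlb no -g dlb_off.log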

The robustness of the PP-PME balancer, especially with GPUs, also
needs improvement (and I have plans to work on that), but the problem
is not straightforward to address. If anybody is interested in
contributing, do get in touch!
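
Until that lands, if the PP-PME (cut-off/grid) balancer misbehaves in
a GPU run, it can simply be switched off so that the cut-off and PME
grid from the .tpr are used unchanged, which also makes timings easier
to compare. A minimal sketch, same assumptions as above:

  # disable runtime PP-PME load balancing; rcoulomb and the Fourier grid
  # from the .tpr are then used as-is
  gmx_mpi mdrun -s ion_channel_bench00.tpr -ntomp 7 -pin on \
      -notunepme -g notunepme.log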

> We need to get a better understanding of what to recommend to users on
> the different node types.

Good to hear that you're taking that path!

If you haven't yet, I'd recommend checking out the benchmarking
methodology, analysis and conclusions of a paper we wrote on this
topic (http://onlinelibrary.wiley.com/doi/10.1002/jcc.24030/full).
Some of the specifics (hardware generation, compiler versions, former
limitations) have certainly become dated since, but overall the mdrun
offload-based acceleration behaves the same way, so most of the
information is transferable (with appropriate scaling of core counts,
per-socket and per-GPU speeds, etc.)!
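
One more note on the device-assignment issue raised further down in
the thread: until mdrun's automatic GPU assignment takes locality into
account, the mapping can be specified by hand with mdrun's -gpu_id
option, which lists the device ID to use for each PP rank on a node.
A minimal sketch for the dual-K80 case, assuming two PP ranks per node
(as in the -npme 4, -ntomp 7 run below) and otherwise the same options
as in your benchmark command:

  # first PP rank on each node -> CUDA device 0, second -> device 2,
  # i.e. one engine per K80 card instead of both engines of the first card
  gmx_mpi mdrun -s ion_channel_bench00.tpr -ntomp 7 -pin on -gpu_id 02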

> (And I like to play around a bit.)
>
> On 09/22/2017 04:14 PM, Szilárd Páll wrote:
>> Oh, tune_pme is quite unaware of GPU acceleration, and I would
>> actually recommend using it only to tune the PME rank count -- for
>> which there is often not much to tune, because the PME load on the
>> CPUs typically requires around half the cores.
>>
>> You also have the number of threads per rank that you'd often want
>> to vary, but you'd have to do that manually.
>>
>> For the cutoff tuning, I believe the built-in PP-PME balancer is
>> generally better (AFAIK tune_pme doesn't even use the same grid-size
>> selection).
>>
>> --
>> Szilárd
>>
>>
>> On Fri, Sep 22, 2017 at 4:02 PM, Åke Sandgren <ake.sandgren at hpc2n.umu.se> wrote:
>>> Btw, I mostly see this problem when running gmx tune_pme when the
>>> npme value under test results in there being too few PP processes to
>>> utilize all GPU engines.
>>>
>>> On 09/22/2017 01:54 PM, Åke Sandgren wrote:
>>>> Ok, then I'll have to write better instructions for our users.
>>>>
>>>> On 09/22/2017 01:19 PM, Mark Abraham wrote:
>>>>> Hi,
>>>>>
>>>>> Currently the hwloc support doesn't do much except inspect the hardware
>>>>> and help produce the report in the log file. Obviously it should be used
>>>>> for helping with such placement tasks, and that's why we're adopting it,
>>>>> but nobody has prioritized specializing the task-assignment code yet.
>>>>> For the moment, such things are still up to the user.
>>>>>
>>>>> Mark
>>>>>
>>>>> On Fri, Sep 22, 2017 at 1:10 PM Åke Sandgren <ake.sandgren at hpc2n.umu.se
>>>>> <mailto:ake.sandgren at hpc2n.umu.se>> wrote:
>>>>>
>>>>>     Hi!
>>>>>
>>>>>     I am seeing a possible performance improvement opportunity when
>>>>>     running GROMACS on nodes with multiple GPU cards.
>>>>>     (And yes, I know this is perhaps a moot point since current GPU
>>>>>     cards don't have dual engines per card.)
>>>>>
>>>>>     System:
>>>>>     dual-socket 14-core Broadwell CPUs
>>>>>     2 K80 cards, one on each socket.
>>>>>
>>>>>     GROMACS built with hwloc support.
>>>>>
>>>>>     When running a dual-node (56-core)
>>>>>
>>>>>     gmx_mpi mdrun -npme 4 -s ion_channel_bench00.tpr -resetstep 20000 -o
>>>>>     bench.trr -x bench.xtc -cpo bench.cpt -c bench.gro -e bench.edr -g
>>>>>     bench.log -ntomp 7 -pin on -dlb yes
>>>>>
>>>>>     job (Slurm + cgroups), GROMACS doesn't fully take the hwloc info
>>>>>     into account. The job gets correctly allocated on cores, but
>>>>>     looking at nvidia-smi and hwloc-ps I can see that the PP processes
>>>>>     are using a suboptimal selection of GPU engines.
>>>>>
>>>>>     The PP processes are placed one on each CPU socket (according to
>>>>>     which process IDs are using the GPUs and the position of those
>>>>>     PIDs according to hwloc-ps), but they both use GPU engines from
>>>>>     the same (first) K80 card.
>>>>>
>>>>>     It would be better to look at the hwloc info and select CUDA
>>>>>     devices 0,2 (or 1,3) instead of 0,1.
>>>>>
>>>>>
>>>>>     Any comments on that?
>>>>>
>>>>>     Attached nvidia-smi + hwloc-ps output
>>>>>
>>>>>     --
>>>>>     Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
>>>>>     Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134
>>>>>     Fax: +46 90-580 14   Mobile: +46 70 7716134
>>>>>     WWW: http://www.hpc2n.umu.se

