[gmx-users] Gromacs 2018 and GPU PME

Gmx QA gmxquestions at gmail.com
Tue Feb 13 11:41:37 CET 2018


Hi Szilard

Thank you for answering. It did indeed show a significant improvement,
in particular with

$ gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1
-gputasks 00000001

I also now understand better how to control each individual
simulation. Your point on maximizing aggregate performance is well
taken :-)

Thanks again
/PK

2018-02-09 16:25 GMT+01:00, Szilárd Páll <pall.szilard at gmail.com>:
> Hi,
>
> First of all, have you read the docs (admittedly somewhat brief):
> http://manual.gromacs.org/documentation/2018/user-guide/mdrun-performance.html#types-of-gpu-tasks
>
> The current GPU PME implementation was optimized for single-GPU runs. Using
> multiple GPUs with PME offloaded works, but this mode hasn't been an
> optimization target and it will often not give very good performance. Using
> multiple GPUs requires a separate PME rank (as you have realized), only one
> PME rank can be used (as we don't support PME decomposition on GPUs), and it
> comes with some inherent scaling drawbacks. For this reason, unless you
> _need_ your single run to be as fast as possible, you'll be better off
> running multiple simulations side by side.
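>
> For instance (a rough sketch, not a tuned recipe: the run names are
> placeholders and the right -pinoffset/-pinstride values depend on the
> hardware topology reported in the log), two independent runs, one per GPU,
> could be started like this:
>
> # run 1: GPU 0, first half of the cores
> $ gmx mdrun -deffnm run1 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 0 -pin on -pinoffset 0 -pinstride 1 &
> # run 2: GPU 1, second half of the cores
> $ gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 1 -pin on -pinoffset 24 -pinstride 1 &
>
> Each run then gets half the cores and its own GPU with both the nonbonded
> and PME tasks offloaded, i.e. the single-GPU case the current code is
> optimized for.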
>
> A few tips for tuning the performance of a multi-GPU run with PME offload:
> * expect at best ~1.5x scaling when going to 2 GPUs (rarely to 3, if the
> tasks allow)
> * generally it's best to use about the same decomposition that you'd use
> with nonbonded-only offload, e.g. in your case 6-8 ranks (a baseline sketch
> follows after the examples below)
> * map the PME GPU task alone, or at most together with 1 PP rank, to a GPU,
> i.e. use the new -gputasks option
> e.g. for your case I'd expect the following to work ~best:
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1
> -gputasks 00000001
> or
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1
> -gputasks 00000011
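>
> For comparison, the nonbonded-only offload baseline that the decomposition
> tip above refers to would be roughly the following (again only a sketch,
> not a measured recommendation):
> $ gmx mdrun -v -deffnm md -nb gpu -pme cpu -ntmpi 8 -ntomp 6 -gpu_id 01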
>
>
> Let me know if that gave some improvement.
>
> Cheers,
>
> --
> Szilárd
>
> On Fri, Feb 9, 2018 at 8:51 AM, Gmx QA <gmxquestions at gmail.com> wrote:
>
>> Hi list,
>>
>> I am trying out the new GROMACS 2018 (really nice so far), but have a few
>> questions about what command-line options I should specify, specifically
>> with the new GPU PME implementation.
>>
>> My computer has two CPUs (12 cores each, 24 with hyperthreading) and two
>> GPUs, and I currently (with 2018) start simulations like this:
>>
>> $ gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 2 -npme 1 -ntomp 24
>> -gpu_id 01
>>
>> This works, but GROMACS prints a message that 24 OpenMP threads per MPI
>> rank is likely inefficient. However, when I reduce the number of OpenMP
>> threads I see a reduction in performance. Is this message no longer
>> relevant with GPU PME, or am I overlooking something?
>>
>> Thanks
>> /PK