[gmx-users] [Performance] poor performance with NV V100

Szilárd Páll pall.szilard at gmail.com
Tue Oct 8 14:33:53 CEST 2019


Hi,

Can you please share your log files? We may be able to help with spotting
performance issues or bottlenecks.
However, note that NVIDIA is the best source to aid you with
reproducing their benchmark numbers.
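
For a quick side-by-side of runs, the headline figure is on the line
starting with "Performance:" near the end of md.log. A minimal sketch for
extracting it; the function name ns_per_day is hypothetical, and it assumes
the usual "Performance: <ns/day> <hour/ns>" layout. The full log is still
what is needed to diagnose bottlenecks.

```shell
# ns_per_day LOGFILE: print the ns/day figure from a GROMACS md.log.
# Assumes the log ends with a line of the form
#   Performance:   <ns/day>   <hour/ns>
ns_per_day() {
    awk '/^Performance:/ { print $2 }' "$1"
}
```

For example, ns_per_day md_1gpu.log and ns_per_day md_2gpu.log, where the
file names are placeholders (mdrun's -g option sets the log file name).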

Scaling across multiple GPUs requires some tuning of command line options;
please see the related discussion on the list (briefly: use multiple ranks
per GPU, and one separate PME rank with GPU offload).
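
As a rough illustration of that advice, the sketch below builds an mdrun
command line with several ranks per GPU and one separate PME rank offloaded
to a GPU. The rank and thread counts and the GPU task mapping are
assumptions for a node with 2 GPUs (IDs 0 and 1) and 16 cores, not tuned
values; the remaining options are taken from the command in the original
message.

```shell
#!/bin/sh
# Sketch only: the values below are assumptions for 2 GPUs and 16 cores.
NTMPI=4        # thread-MPI ranks: 3 PP ranks + 1 separate PME rank
NTOMP=4        # OpenMP threads per rank (4 ranks x 4 threads = 16 cores)
GPUTASKS=0011  # one digit per GPU task: two PP tasks on GPU 0,
               # the third PP task and the PME task on GPU 1

# -pme gpu offloads PME; -npme 1 requests a single separate PME rank.
CMD="gmx mdrun -ntmpi $NTMPI -ntomp $NTOMP -nb gpu -pme gpu -npme 1 -gputasks $GPUTASKS -pin on -v -noconfout -nsteps 5000 -s topol.tpr"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```

It is worth trying a few rank counts and mappings (for example -gputasks
0001, which puts all PP tasks on GPU 0 and PME on GPU 1) and comparing the
resulting ns/day.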

Also note that intra-node strong scaling has not been an optimization target
of recent releases (there are no p2p optimizations either); however, new
features going into the 2020 release will improve things significantly. Keep
an eye out for the beta2/3 releases if you are interested in checking out the
new features.

Cheers,
--
Szilárd


On Mon, Oct 7, 2019 at 7:48 AM Jimmy Chen <catjmc at gmail.com> wrote:

> Hi,
>
> I'm evaluating the NVIDIA V100 to decide whether it is suitable to purchase,
> but I can't reproduce the reference performance data
> published online:
> https://developer.nvidia.com/hpc-application-performance
>
> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
>
>
> This happens whether I use docker tag 18.02 from
> https://ngc.nvidia.com/catalog/containers/hpc:gromacs/tags
>
> or GROMACS built from the source code at
> ftp://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
>
> The test data sets are ADH dodec and water 1.5M, run with:
> gmx grompp -f pme_verlet.mdp
> gmx mdrun -ntmpi 1 -nb gpu -pin on -v -noconfout -nsteps 5000 -s topol.tpr -ntomp 4
> and
> gmx mdrun -ntmpi 2 -nb gpu -pin on -v -noconfout -nsteps 5000 -s topol.tpr -ntomp 4
>
> My CPU is Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
> and GPU is NV V100 16GB PCIE.
>
> For ADH dodec,
> the performance of 2xV100 16GB PCIE reported at
> https://developer.nvidia.com/hpc-application-performance is 176 ns/day,
> but I only get 28 ns/day. With 1xV100 I actually get 67 ns/day.
> I don't know why the result is poorer with 2xV100.
>
> For water 1.5M,
> the performance of 1xV100 16GB PCIE reported at
> https://www.hpc.co.jp/images/pdf/benchmark/Molecular-Dynamics-March-2018.pdf
> is 9.83 ns/day, and 2xV100 is 10.41 ns/day.
> But what I get is 6.5 ns/day with 1xV100 and 2 ns/day with 2xV100.
>
> Could anyone suggest how to find out what is causing these performance
> numbers in my environment? Is the command I use for testing wrong?
> Is there a recommended command for this test,
> or a recommended source code version to use now?
>
> By the way, after checking the code, it seems MPI doesn't go through PCIe P2P
> or RDMA. Is that correct? Is there any plan to implement this in MPI?
>
> Best regards,
> Jimmy

