[gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

Szilárd Páll pall.szilard at gmail.com
Wed Dec 4 08:53:09 CET 2013


Hi Henk,

Thanks for the useful comments!

When you run on a single GPU, you do get full timing details for both
the CPU and the GPU: just have a look at the performance tables at the
end of the log file. Alternatively, you can simply run nvprof mdrun ....
which will by default give you a nice profiling overview of CUDA device
activity and API calls.

Regarding the performance improvement, I suspect that you are seeing
the full speed improvement that comes from 5GT/s->8GT/s because of the
CPU-GPU load imbalance in your run: the CPU is probably waiting >20% of
the runtime for the GPU to finish. Hence, in such imbalanced cases any
improvement on the GPU side, whether in transfer or in kernel time, will
translate straight into a decrease in wall-time.
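
As a rough sanity check on the numbers, here is a minimal Python sketch
that inverts a simple Amdahl-style model: if a fraction f of the original
run time is non-overlapped PCIe transfer and the link gets faster by a
factor r, the run time drops by f * (1 - 1/r). The assumptions are mine,
not measurements: the 10% figure is treated as a run time reduction,
transfer time is assumed to scale inversely with link bandwidth (latency
is ignored), and the function name is made up for illustration. The
second ratio accounts for line encoding, since 5GT/s links use 8b/10b and
8GT/s links use 128b/130b.

```python
def non_overlapped_fraction(runtime_reduction, bandwidth_ratio):
    """Estimate the fraction of the original run time spent in
    non-overlapped transfers (hypothetical helper, simple model).

    Model, with the original run time normalised to 1:
        t_new = (1 - f) + f / bandwidth_ratio
    so  runtime_reduction = f * (1 - 1 / bandwidth_ratio).
    """
    if bandwidth_ratio <= 1.0:
        raise ValueError("bandwidth_ratio must be greater than 1")
    return runtime_reduction / (1.0 - 1.0 / bandwidth_ratio)


# Ratio of raw signalling rates: 8 GT/s over 5 GT/s.
raw_ratio = 8.0 / 5.0

# Ratio of effective per-lane bandwidth after encoding overhead:
# 8 GT/s with 128b/130b versus 5 GT/s with 8b/10b.
effective_ratio = (8.0 * 128 / 130) / (5.0 * 8 / 10)

# 0.10 is the run time reduction reported in the quoted message.
f_raw = non_overlapped_fraction(0.10, raw_ratio)
f_eff = non_overlapped_fraction(0.10, effective_ratio)

print(f"raw rate ratio {raw_ratio:.2f}: f = {f_raw:.1%}")
print(f"effective ratio {effective_ratio:.2f}: f = {f_eff:.1%}")
```

Under these assumptions the estimate comes out at roughly 27% with the
raw rate ratio and roughly 20% with the effective bandwidth ratio, both
somewhat higher than the 10/60 estimate in the quoted message.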

We are working on a few things that should improve performance in this
scenario, like using multiple weakly dependent non-bonded tasks to get
some transfer/kernel overlap, and non-bonded task splitting for a better
load balance.

Cheers,
--
Szilárd


On Wed, Dec 4, 2013 at 8:28 AM, Henk Neefs <henk.neefs at gmail.com> wrote:
> The information below might be of interest to the Gromacs
> development/optimization team.
>
> What can we derive from the 10% md_run speedup when PCIE3.0 speed increases
> from 5GT/s->8GT/s?
>
> A 60% PCIE speed increase results in a 10% run time reduction.
> Hence about 10/60=17% of the run time gets spent in (non-overlapping) PCIE
> bus communication for this particular configuration and for this particular
> simulated molecular system.
> I'm referring to the "non-overlapping" part as this is the part that is not
> hidden by (not overlapped with) calculations.
>
> So changing the PCIE speed provides a (non-user-friendly) knob to the
> gromacs developers to estimate the part of the run time that is determined
> by the (non-overlapping) PCIE bus communication.
>
> Not sure whether the Nvidia CUDA profiling environment provides a better way
> to quantify this. In case there isn't a better way, the above method is a poor
> man's flow (for which you likely need root access) to provide this
> quantification.
> --
> Henk Neefs
> Gromacs user

