[gmx-users] GPU and aux power supply
Mark Abraham
mark.j.abraham at gmail.com
Fri Jul 3 00:25:52 CEST 2015
Hi,
Sharing log files on a pastebin-like service is more effective :-) Just
observing a performance number doesn't help much without the full context.
Mark
On Thu, Jul 2, 2015 at 11:56 PM Alex <nedomacho at gmail.com> wrote:
> Here are the stats for the CPU-only run:
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 4 OpenMP threads
>
>  Computing:          Num   Num      Call    Wall time     Giga-Cycles
>                     Ranks Threads  Count      (s)        total sum    %
> -----------------------------------------------------------------------------
>  Neighbor search       1    4      25001      17.536        168.352   0.6
>  Force                 1    4    1000001    1047.980      10061.133  37.5
>  PME mesh              1    4    1000001    1661.611      15952.292  59.4
>  NB X/F buffer ops.    1    4    1975001      30.176        289.700   1.1
>  COM pull force        1    4    1000001       7.282         69.909   0.3
>  Write traj.           1    4         64       0.402          3.860   0.0
>  Update                1    4    1000001      19.559        187.772   0.7
>  Rest                                          13.141        126.156   0.5
> -----------------------------------------------------------------------------
>  Total                                       2797.685      26859.173 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME spread/gather     1    4    2000002     318.488       3057.640  11.4
>  PME 3D-FFT            1    4    2000002    1091.863      10482.433  39.0
>  PME solve Elec        1    4    1000001     247.867       2379.646   8.9
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:    11193.860     2797.685      400.1
>                     46:37
>                  (ns/day)    (hour/ns)
> Performance:        30.883        0.777
> Finished mdrun on rank 0 Thu Jul 2 17:06:08 2015
>
>
> On Thu, Jul 2, 2015 at 2:47 PM, Alex <nedomacho at gmail.com> wrote:
>
> > Szilárd,
> >
> > I was wrong. When I run with GPU and use -ntomp 4, I have 400% CPU
> > utilization and that yields about 83 ns/day. When I do -ntomp 4 -nb cpu,
> > I get 1600% CPU utilization and get similar results. However, when I run
> > -nt 4 -nb cpu, I get 400% CPU utilization, and then it is slower. I am
> > doing a short test, will send the stats later on.
> >
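> > For reference, the three invocations I'm comparing look roughly like this
> > (a sketch; the -deffnm name is just a placeholder):
> >
> >   mdrun -deffnm test -ntomp 4           # GPU non-bondeds, 4 OpenMP threads
> >   mdrun -deffnm test -ntomp 4 -nb cpu   # CPU non-bondeds; without -nt, thread-MPI
> >                                         # presumably adds ranks, hence the 1600%
> >   mdrun -deffnm test -nt 4 -nb cpu      # CPU non-bondeds, capped at 4 threads total
> >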
> > The stats from the GPU-accelerated run (-ntomp 4) are below. Pretty poor
> > CPU-GPU sync here, actually. Will post the log for the CPU-only run once
> > it finishes.
> >
> > R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >
> > On 1 MPI rank, each using 4 OpenMP threads
> >
> >  Computing:            Num   Num      Call      Wall time      Giga-Cycles
> >                       Ranks Threads   Count       (s)         total sum    %
> > -----------------------------------------------------------------------------
> >  Neighbor search          1    4     2500001     1881.311      18061.525   1.8
> >  Launch GPU ops.          1    4   100000001     4713.584      45252.759   4.5
> >  Force                    1    4   100000001    66892.607     642202.401  63.5
> >  PME mesh                 1    4   100000001    25192.879     241864.204  23.9
> >  Wait GPU local           1    4   100000001      869.481       8347.456   0.8
> >  NB X/F buffer ops.       1    4   197500001     2014.227      19337.585   1.9
> >  COM pull force           1    4   100000001      704.950       6767.871   0.7
> >  Write traj.              1    4        6118       15.348        147.345   0.0
> >  Update                   1    4   100000001     1747.965      16781.332   1.7
> >  Rest                                            1364.705      13101.849   1.3
> > -----------------------------------------------------------------------------
> >  Total                                         105397.057    1011864.328 100.0
> > -----------------------------------------------------------------------------
> >  Breakdown of PME mesh computation
> > -----------------------------------------------------------------------------
> >  PME spread/gather        1    4   200000002    12874.626     123602.829  12.2
> >  PME 3D-FFT               1    4   200000002     9285.345      89143.948   8.8
> >  PME solve Elec           1    4   100000001     2746.973      26372.313   2.6
> > -----------------------------------------------------------------------------
> >
> >  GPU timings
> > -----------------------------------------------------------------------------
> >  Computing:                   Count    Wall t (s)    ms/step      %
> > -----------------------------------------------------------------------------
> >  Pair list H2D              2500001       124.145      0.050    0.4
> >  X / q H2D                100000001      2089.623      0.021    6.0
> >  Nonbonded F kernel        97000000     30164.146      0.311   86.2
> >  Nonbonded F+ene k.          500000       227.896      0.456    0.7
> >  Nonbonded F+prune k.       2000000       708.250      0.354    2.0
> >  Nonbonded F+ene+prune k.    500001       223.082      0.446    0.6
> >  F D2H                    100000001      1465.277      0.015    4.2
> > -----------------------------------------------------------------------------
> >  Total                                  35002.419      0.350  100.0
> > -----------------------------------------------------------------------------
> >
> > Force evaluation time GPU/CPU: 0.350 ms/0.921 ms = 0.380
> > For optimal performance this ratio should be close to 1!
> >
> >
> > NOTE: The GPU has >25% less load than the CPU. This imbalance causes
> > performance loss.
> >
> >                Core t (s)   Wall t (s)        (%)
> >        Time:   421720.882   105397.057      400.1
> >                  1d05h16:37
> >                  (ns/day)    (hour/ns)
> > Performance:        81.976        0.293
> > Finished mdrun on rank 0 Thu Jul 2 02:29:57 2015
> >
> >
> > On Thu, Jul 2, 2015 at 7:57 AM, Szilárd Páll <pall.szilard at gmail.com>
> > wrote:
> >
> >> I'm curious what the conditions are under which you get such an
> >> exceptional speedup. Can you share your input files and/or log files?
> >>
> >> --
> >> Szilárd
> >>
> >> On Thu, Jul 2, 2015 at 2:18 AM, Alex <nedomacho at gmail.com> wrote:
> >>
> >>> Yup, about a 7-8x difference between runs with and without GPU
> >>> acceleration, not making this up: I had 11 ns/day and now ~80-87 ns/day
> >>> (the numbers vary a bit). I've been getting a similar boost on our
> >>> GPU-accelerated cluster node (dual Core i7, 8 cores each) with two Tesla
> >>> C2075 cards (I am directing my simulations to one of them via -gpu_id).
> >>> All runs are -ntomp 4, with or without GPU. The physics in all cases is
> >>> perfectly acceptable. So far I have only tested my new box on vacuum
> >>> simulations; I'm about to run the solvated version (~30K particles).
> >>>
> >>> Alex
> >>>
> >>>
> >>> On Wed, Jul 1, 2015 at 6:09 PM, Szilárd Páll <pall.szilard at gmail.com>
> >>> wrote:
> >>>
> >>>> Hmmm, 8x sounds rather high; are you sure you are comparing to CPU-only
> >>>> runs that use proper SIMD-optimized kernels?
> >>>>
> >>>> Because of the way offload-based acceleration works, the CPU and GPU
> >>>> will inherently be executing concurrently only part of the runtime, and
> >>>> as a consequence the GPU is idle for part of the run time (during
> >>>> integration+constraints). You can make use of this idle time by running
> >>>> multiple independent simulations concurrently. This can yield serious
> >>>> improvements in terms of _aggregate_ simulation performance, especially
> >>>> with small inputs and many cores (see slide 51: https://goo.gl/7DnSri).
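> >>>>
> >>>> As a rough sketch (standard mdrun flags; the -deffnm names and core
> >>>> counts are placeholders to adapt to your machine), two runs sharing one
> >>>> GPU could be launched like this:
> >>>>
> >>>>   mdrun -deffnm run1 -ntomp 4 -pin on -pinoffset 0 -gpu_id 0 &
> >>>>   mdrun -deffnm run2 -ntomp 4 -pin on -pinoffset 4 -gpu_id 0 &
> >>>>
> >>>> Pinning each run to its own cores (-pin on -pinoffset) keeps the two
> >>>> jobs from fighting over the same cores while they share GPU 0.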
> >>>>
> >>>> --
> >>>> Szilárd
> >>>>
> >>>> On Wed, Jul 1, 2015 at 4:16 AM, Alex <nedomacho at gmail.com> wrote:
> >>>>
> >>>>> I am happy to say that I am getting an 8-fold increase in simulation
> >>>>> speeds for $200.
> >>>>>
> >>>>>
> >>>>> An additional question: normally, how many simulations (separate
> >>>>> mdruns on separate CPU cores) can be performed simultaneously on a
> >>>>> single GPU? Say, for 20-40K particle sized simulations.
> >>>>>
> >>>>> The coolers are not even spinning during a single test (mdrun -ntomp 4),
> >>>>> and I get massive acceleration. They aren't broken, the card is just
> >>>>> cool (small system, ~3K particles).
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>
> >>>>>
> >>>>> Ah, ok, so you can get a 6-pin from the PSU and another from a
> >>>>> converted molex connector. That should be just fine, especially as the
> >>>>> card should not pull more than ~155W (under heavy graphics load) based
> >>>>> on the Tom's Hardware review* and you are providing 225W max.
> >>>>>
> >>>>>
> >>>>>
> >>>>> *
> >>>>> http://www.tomshardware.com/reviews/evga-super-super-clocked-gtx-960,4063-3.html
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Szilárd
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 30, 2015 at 7:31 PM, Alex <nedomacho at gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> Well, I don't have one like this. What I have instead is this:
> >>>>>
> >>>>>
> >>>>> 1. A single 6-pin directly from the PSU.
> >>>>>
> >>>>> 2. A single molex to 6-pin (my PSU does provide one molex).
> >>>>>
> >>>>> 3. Two 6-pins to a single 8-pin converter going to the card.
> >>>>>
> >>>>>
> >>>>> In other words, I can populate both 6-pin inputs on the 6-to-8-pin
> >>>>> converter; I'm just not sure about the pinouts in this case.
> >>>>>
> >>>>>
> >>>>> Not good?
> >>>>>
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> What I meant is this: http://goo.gl/8o1B5P
> >>>>>
> >>>>>
> >>>>> That is 2x molex -> 8-pin PCI-E. A single molex may not be enough.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Szilárd
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 30, 2015 at 7:10 PM, Alex <nedomacho at gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> It is a 4-core CPU, single GPU box, so I doubt I will be running more
> >>>>> than one at a time. We will very likely get a different PSU, unless...
> >>>>> I do have a molex to 6-pin converter sitting on this very desk. Do you
> >>>>> think it will satisfy the card? I just don't know how much a single
> >>>>> molex line delivers. If you feel this should work, off to installing
> >>>>> everything I go.
> >>>>>
> >>>>> Thanks a bunch,
> >>>>> Alex
> >>>>>
> >>>>>
> >>>>> SP> First of all, unless you run multiple independent simulations on
> >>>>> SP> the same GPU, GROMACS runs alone will never get anywhere near the
> >>>>> SP> peak power consumption of the GPU.
> >>>>>
> >>>>> SP> The good news is that NVIDIA has gained some sanity and stopped
> >>>>> SP> blocking GeForce GPU info in nvidia-smi - although only for newer
> >>>>> SP> cards, it does work with the 960 if you use a 352.xx driver:
> >>>>>
> >>>>> SP> +------------------------------------------------------+
> >>>>> SP> | NVIDIA-SMI 352.21     Driver Version: 352.21         |
> >>>>> SP> |-------------------------------+----------------------+----------------------+
> >>>>> SP> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> >>>>> SP> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> >>>>> SP> |===============================+======================+======================|
> >>>>> SP> |   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
> >>>>> SP> |  8%   45C    P5    15W / 130W |   1168MiB /  2044MiB |     31%      Default |
> >>>>> SP> +-------------------------------+----------------------+----------------------+
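> >>>>>
> >>>>> SP> If you want to watch the power draw during a run, a query along
> >>>>> SP> these lines should work with that driver generation (check the
> >>>>> SP> exact field names with nvidia-smi --help-query-gpu):
> >>>>>
> >>>>> SP>   nvidia-smi --query-gpu=power.draw,utilization.gpu --format=csv -l 1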
> >>>>>
> >>>>>
> >>>>>
> >>>>> SP> A single 6-pin can deliver 75W, an 8-pin 150W, so in your case the
> >>>>> SP> hard limit of what your card can pull is 75W from the PCI-E slot +
> >>>>> SP> 150W from the cable = 225W. With a single 6-pin cable you'll only
> >>>>> SP> get ~150W max. That can be OK if your card does not pull more power
> >>>>> SP> (e.g. the above non-overclocked card would be just fine), but as
> >>>>> SP> your card is overclocked, I'm not sure it won't peak above 150W.
> >>>>>
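> >>>>> SP> Spelled out, the budget is roughly:
> >>>>>
> >>>>> SP>   75 W (PCI-E slot) + 2 x 75 W (both 6-pin inputs of the adapter) = 225 W
> >>>>> SP>   75 W (PCI-E slot) + 1 x 75 W (only one 6-pin populated)         = 150 W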
> >>>>>
> >>>>> SP> You can try to get a molex -> PCI-E power cable converter.
> >>>>>
> >>>>>
> >>>>>
> >>>>> SP> --
> >>>>>
> >>>>> SP> Szilárd
> >>>>>
> >>>>>
> >>>>>
> >>>>> SP> On Mon, Jun 29, 2015 at 9:56 PM, Alex <nedomacho at gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> >> Hi all,
> >>>>> >>
> >>>>> >> I have a bit of a gromacs-unrelated question here, but I think this
> >>>>> >> is a better place to ask it than, say, a gaming forum. The Nvidia
> >>>>> >> GTX 960 card we got here came with an 8-pin AUX connector on the
> >>>>> >> card side, which interfaces _two_ 6-pin connectors to the PSU. It is
> >>>>> >> a factory superclocked card. My 525W PSU can only populate _one_ of
> >>>>> >> those 6-pin connectors. The EVGA website states that I need at least
> >>>>> >> a 400W PSU, while I have 525.
> >>>>> >>
> >>>>> >> At the same time, I have a dedicated high-power PCI-e slot, which on
> >>>>> >> the motherboard says "75W PCI-e". Do I need a different PSU to
> >>>>> >> populate the AUX power connector completely? Are these runs
> >>>>> >> equivalent to drawing max power during gaming?
> >>>>> >>
> >>>>> >> Thanks!
> >>>>> >>
> >>>>> >> Alex
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Alex                           mailto:nedomacho at gmail.com
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.