[gmx-users] GPU and aux power supply

Alex nedomacho at gmail.com
Thu Jul 2 22:47:16 CEST 2015


Szilárd,

I was wrong. When I run with the GPU and -ntomp 4, I get 400% CPU
utilization, which yields about 83 ns/day. When I do -ntomp 4 -nb cpu, I
get 1600% CPU utilization and similar performance. However, when I run
-nt 4 -nb cpu, I get 400% CPU utilization and it is slower. I am running a
short test and will send the stats later on.
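
For reference, the three invocations compared above look something like
this (topol.tpr is a placeholder name; with a GPU detected, mdrun offloads
the non-bonded work automatically unless -nb cpu is given):

  # GPU run: 1 rank, 4 OpenMP threads, non-bondeds offloaded to the GPU
  gmx mdrun -s topol.tpr -ntomp 4
  # Same OpenMP thread count, non-bondeds forced back onto the CPU
  gmx mdrun -s topol.tpr -ntomp 4 -nb cpu
  # 4 threads total (split over thread-MPI ranks), non-bondeds on the CPU
  gmx mdrun -s topol.tpr -nt 4 -nb cpu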

The stats from the GPU-accelerated run (-ntomp 4) are below. Pretty poor
CPU-GPU balance here, actually. I will post the log for the CPU-only run
once it finishes.

R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 4 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1    4    2500001    1881.311      18061.525   1.8
 Launch GPU ops.        1    4  100000001    4713.584      45252.759   4.5
 Force                  1    4  100000001   66892.607     642202.401  63.5
 PME mesh               1    4  100000001   25192.879     241864.204  23.9
 Wait GPU local         1    4  100000001     869.481       8347.456   0.8
 NB X/F buffer ops.     1    4  197500001    2014.227      19337.585   1.9
 COM pull force         1    4  100000001     704.950       6767.871   0.7
 Write traj.            1    4       6118      15.348        147.345   0.0
 Update                 1    4  100000001    1747.965      16781.332   1.7
 Rest                                        1364.705      13101.849   1.3
-----------------------------------------------------------------------------
 Total                                     105397.057    1011864.328 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1    4  200000002   12874.626     123602.829  12.2
 PME 3D-FFT             1    4  200000002    9285.345      89143.948   8.8
 PME solve Elec         1    4  100000001    2746.973      26372.313   2.6
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                    2500001     124.145        0.050     0.4
 X / q H2D                      100000001    2089.623        0.021     6.0
 Nonbonded F kernel              97000000   30164.146        0.311    86.2
 Nonbonded F+ene k.                500000     227.896        0.456     0.7
 Nonbonded F+prune k.             2000000     708.250        0.354     2.0
 Nonbonded F+ene+prune k.          500001     223.082        0.446     0.6
 F D2H                          100000001    1465.277        0.015     4.2
-----------------------------------------------------------------------------
 Total                                      35002.419        0.350   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 0.350 ms/0.921 ms = 0.380
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >25% less load than the CPU. This imbalance causes
      performance loss.

               Core t (s)   Wall t (s)        (%)
       Time:   421720.882   105397.057      400.1
                         1d05h16:37
                 (ns/day)    (hour/ns)
Performance:       81.976        0.293
Finished mdrun on rank 0 Thu Jul  2 02:29:57 2015


On Thu, Jul 2, 2015 at 7:57 AM, Szilárd Páll <pall.szilard at gmail.com> wrote:

> I'm curious what the conditions are under which you get such an
> exceptional speedup. Can you share your input files and/or log files?
>
> --
> Szilárd
>
> On Thu, Jul 2, 2015 at 2:18 AM, Alex <nedomacho at gmail.com> wrote:
>
>> Yup, about 7-8x between runs with and without GPU acceleration, not
>> making this up: I had 11 ns/day and now get ~80-87 ns/day (the numbers
>> vary a bit). I've been getting a similar boost on our GPU-accelerated
>> cluster node (dual core i7, 8 cores each) with two Tesla C2075 cards (I
>> am directing my simulations to one of them via -gpu_id).
>> All runs are -ntomp 4, with or without GPU. The physics in all cases is
>> perfectly acceptable. So far I have only tested my new box on vacuum
>> simulations; I am about to run the solvated version (~30K particles).
>>
>> Alex
>>
>>
>> On Wed, Jul 1, 2015 at 6:09 PM, Szilárd Páll <pall.szilard at gmail.com>
>> wrote:
>>
>>> Hmmm, 8x sounds rather high; are you sure you are comparing to CPU-only
>>> runs that use proper SIMD-optimized kernels?
>>>
>>> Because of the way offload-based acceleration works, the CPU and GPU
>>> inherently execute concurrently for only part of the runtime, and as a
>>> consequence the GPU is idle for part of the run (during
>>> integration+constraints). You can make use of this idle time by running
>>> multiple independent simulations concurrently. This can yield serious
>>> improvements in _aggregate_ simulation performance, especially with
>>> small inputs and many cores (see slide 51: https://goo.gl/7DnSri).
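>>>
>>> For instance, a minimal sketch of two independent runs sharing GPU 0 on
>>> a 4-core machine (the directory names and -pinoffset values are
>>> illustrative; adjust them to your core layout):
>>>
>>>   (cd sim1 && gmx mdrun -ntomp 2 -gpu_id 0 -pin on -pinoffset 0) &
>>>   (cd sim2 && gmx mdrun -ntomp 2 -gpu_id 0 -pin on -pinoffset 2) &
>>>   wait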
>>>
>>> --
>>> Szilárd
>>>
>>> On Wed, Jul 1, 2015 at 4:16 AM, Alex <nedomacho at gmail.com> wrote:
>>>
>>>> I am happy to say that I am getting an 8-fold increase in simulation
>>>> speed for $200.
>>>>
>>>> An additional question: normally, how many simulations (separate mdruns
>>>> on separate CPU cores) can be performed simultaneously on a single GPU?
>>>> Say, for simulations of 20-40K particles.
>>>>
>>>> The coolers are not even spinning during a single test (mdrun -ntomp 4),
>>>> and I get massive acceleration. They aren't broken; the card is just
>>>> cool (small system, ~3K particles).
>>>>
>>>> Thanks,
>>>>
>>>> Alex
>>>>
>>>> Ah, ok, so you can get one 6-pin from the PSU and another from a
>>>> converted molex connector. That should be just fine, especially as the
>>>> card should not pull more than ~155W (under heavy graphics load) based
>>>> on the Tomshardware review*, and you are providing 225W max.
>>>>
>>>> *
>>>> http://www.tomshardware.com/reviews/evga-super-super-clocked-gtx-960,4063-3.html
>>>>
>>>> --
>>>> Szilárd
>>>>
>>>> On Tue, Jun 30, 2015 at 7:31 PM, Alex <nedomacho at gmail.com> wrote:
>>>>
>>>>
>>>> Well, I don't have one like this. What I have instead is this:
>>>>
>>>> 1. A single 6-pin directly from the PSU.
>>>> 2. A single molex-to-6-pin (my PSU does provide one molex).
>>>> 3. A two-6-pin-to-one-8-pin converter going to the card.
>>>>
>>>> In other words, I can populate both 6-pins on the 6-to-8 converter, just
>>>> not sure about the pinouts in this case.
>>>>
>>>> Not good?
>>>>
>>>> Alex
>>>>
>>>> What I meant is this: http://goo.gl/8o1B5P
>>>>
>>>> That is 2x molex -> 8-pin PCI-E. A single molex may not be enough.
>>>>
>>>> --
>>>> Szilárd
>>>>
>>>> On Tue, Jun 30, 2015 at 7:10 PM, Alex <nedomacho at gmail.com> wrote:
>>>>
>>>>
>>>> It is a 4-core CPU, single-GPU box, so I doubt I will be running more
>>>> than one at a time. We will very likely get a different PSU, unless...
>>>> I do have a molex-to-6-pin converter sitting on this very desk. Do you
>>>> think it will satisfy the card? I just don't know how much a single
>>>> molex line delivers. If you feel this should work, off to installing
>>>> everything I go.
>>>>
>>>> Thanks a bunch,
>>>> Alex
>>>>
>>>>
>>>> SP> First of all, unless you run multiple independent simulations on
>>>> SP> the same GPU, GROMACS runs alone will never get anywhere near the
>>>> SP> peak power consumption of the GPU.
>>>>
>>>> SP> The good news is that NVIDIA has gained some sanity and stopped
>>>> SP> blocking GeForce GPU info in nvidia-smi - although only for newer
>>>> SP> cards - but it does work with the 960 if you use a 352.xx driver:
>>>>
>>>> SP> +------------------------------------------------------+
>>>> SP> | NVIDIA-SMI 352.21     Driver Version: 352.21         |
>>>> SP> |-------------------------------+----------------------+----------------------+
>>>> SP> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>>> SP> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>>> SP> |===============================+======================+======================|
>>>> SP> |   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
>>>> SP> |  8%   45C    P5    15W / 130W |   1168MiB /  2044MiB |     31%      Default |
>>>> SP> +-------------------------------+----------------------+----------------------+
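>>>>
>>>> SP> (If you want to watch the card's power draw while mdrun is running,
>>>> SP> a simple, non-GROMACS-specific sketch is:
>>>> SP>
>>>> SP>     watch -n 1 nvidia-smi
>>>> SP>
>>>> SP> which redraws the table above once per second.)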
>>>>
>>>>
>>>>
>>>> SP> A single 6-pin can deliver 75W, an 8-pin 150W, so in your case the
>>>> SP> hard limit on what your card can pull is 75W from the PCI-E slot +
>>>> SP> 150W from the cable = 225W. With a single 6-pin cable you'll only
>>>> SP> get ~150W max. That can be OK if your card does not pull more power
>>>> SP> (e.g. the above non-overclocked card would be just fine), but as
>>>> SP> your card is overclocked, I'm not sure it won't peak above 150W.
>>>>
>>>> SP> You can try to get a molex -> PCI-E power cable converter.
>>>>
>>>> SP> --
>>>> SP> Szilárd
>>>>
>>>> SP> On Mon, Jun 29, 2015 at 9:56 PM, Alex <nedomacho at gmail.com> wrote:
>>>>
>>>>
>>>> >> Hi all,
>>>> >>
>>>> >> I have a bit of a gromacs-unrelated question here, but I think this
>>>> >> is a better place to ask it than, say, a gaming forum. The Nvidia
>>>> >> GTX 960 card we got here came with an 8-pin AUX connector on the card
>>>> >> side, which interfaces _two_ 6-pin connectors to the PSU. It is a
>>>> >> factory-superclocked card. My 525W PSU can only populate _one_ of
>>>> >> those 6-pin connectors. The EVGA website states that I need at least
>>>> >> a 400W PSU, while I have 525W.
>>>> >>
>>>> >> At the same time, I have a dedicated high-power PCI-e slot, which the
>>>> >> motherboard labels "75W PCI-e". Do I need a different PSU to populate
>>>> >> the AUX power connector completely? Are these runs equivalent to
>>>> >> drawing max power during gaming?
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> Alex
>>>>
>>>> --
>>>> Best regards,
>>>>  Alex                            mailto:nedomacho at gmail.com
>>>>


More information about the gromacs.org_gmx-users mailing list