[gmx-users] GPU slower than I7

Roland Schulz roland at utk.edu
Sat Oct 23 00:05:53 CEST 2010


Hi,

On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas <renatoffs at gmail.com> wrote:
>
>
> Do you think that the "NODE" and "Real" time difference could be
> attributed to some compilation problem in mdrun-gpu? I'm asking even
> though I didn't get any errors during compilation.
>

It is very odd that these are different for your system. What operating
system and compiler do you use?

Is HAVE_GETTIMEOFDAY set in src/config.h?

I attached a small test program which uses the same two timers GROMACS
uses for the NODE and Real times. You can compile it with cc time.c -o time
and run it with ./time. Do you get roughly the same time twice with the
test program, or do you see the same discrepancy as with GROMACS?
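
In case the attachment does not come through: below is a minimal sketch of
what such a timer comparison could look like. It assumes the two timers in
question are gettimeofday() for the Real time and clock() for the NODE
time, so it is only a rough stand-in, not the original time.c.

    /* Rough stand-in for time.c: compares wall-clock time from
     * gettimeofday() with CPU time from clock(). For a single-threaded
     * busy loop the two values should agree closely on a healthy system. */
    #include <stdio.h>
    #include <time.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval  tv0, tv1;
        clock_t         c0, c1;
        volatile double x = 0.0;
        long            i;

        gettimeofday(&tv0, NULL);
        c0 = clock();

        /* burn a few seconds of CPU time */
        for (i = 0; i < 500000000L; i++)
        {
            x += 1e-9;
        }

        c1 = clock();
        gettimeofday(&tv1, NULL);

        printf("gettimeofday (Real-like): %.3f s\n",
               (tv1.tv_sec - tv0.tv_sec) + 1e-6*(tv1.tv_usec - tv0.tv_usec));
        printf("clock        (NODE-like): %.3f s\n",
               (double)(c1 - c0)/CLOCKS_PER_SEC);
        return 0;
    }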

Roland

> Thanks,
>
> Renato
>
> 2010/10/22 Szilárd Páll <szilard.pall at cbr.su.se>:
> > Hi Renato,
> >
> > First of all, what you're seeing is pretty normal, especially since
> > you have a CPU that is crossing the border of insane :) Why is it
> > normal? The PME algorithms are simply not well suited for current GPU
> > architectures. With an ill-suited algorithm you won't see the speedups
> > you can often see in other application areas - even more so when you
> > compare against Gromacs on an i7 980X. For more info + benchmarks see
> > the Gromacs-GPU page:
> > http://www.gromacs.org/gpu
> >
> > However, there is one strange thing you also pointed out. The fact
> > that the "NODE" and "Real" times in your mdrun-gpu timing summary are
> > not the same, but differ by a factor of 3, is _very_ unusual. I've run
> > mdrun-gpu on quite a wide variety of hardware but I've never seen
> > those two counters deviate. It might be an artifact of the cycle
> > counters used internally behaving in an unusual way on your CPU.
> >
> > One other thing I should point out is that you would be better off
> > using the standard mdrun, which in 4.5 has thread support by default
> > and therefore runs in parallel on a single CPU/node without MPI (see
> > the example below)!
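> >
> > For example, something along these lines should work (the -nt flag
> > sets the number of threads; adjust the count to your hardware, or use
> > 0 to let mdrun guess):
> >
> >   # threaded single-node run, no MPI launcher needed
> >   mdrun -nt 12 -s topol.tpr -v >& out &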
> >
> > Cheers,
> > --
> > Szilárd
> >
> >
> >
> > On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas <renatoffs at gmail.com> wrote:
> >> Hi gromacs users,
> >>
> >> I have installed the latest version of GROMACS (4.5.1) on an i7 980X
> >> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled
> >> its MPI version. I also compiled the GPU-accelerated version of
> >> GROMACS. Then I ran a 2 ns simulation of a small system (11042 atoms)
> >> to compare the performance of mdrun-gpu vs. mdrun_mpi. The results I
> >> got are below:
> >>
> >> ############################################
> >> My *.mdp is:
> >>
> >> constraints      =  all-bonds
> >> integrator       =  md
> >> dt               =  0.002    ; ps !
> >> nsteps           =  1000000  ; total 2000 ps.
> >> nstlist          =  10
> >> ns_type          =  grid
> >> coulombtype      =  PME
> >> rvdw             =  0.9
> >> rlist            =  0.9
> >> rcoulomb         =  0.9
> >> fourierspacing   =  0.10
> >> pme_order        =  4
> >> ewald_rtol       =  1e-5
> >> vdwtype          =  cut-off
> >> pbc              =  xyz
> >> epsilon_rf       =  0
> >> comm_mode        =  linear
> >> nstxout          =  1000
> >> nstvout          =  0
> >> nstfout          =  0
> >> nstxtcout        =  1000
> >> nstlog           =  1000
> >> nstenergy        =  1000
> >> ; Berendsen temperature coupling is on
> >> tcoupl           =  berendsen
> >> tc-grps          =  system
> >> tau-t            =  0.1
> >> ref-t            =  298
> >> ; Berendsen pressure coupling is on
> >> pcoupl           =  berendsen
> >> pcoupltype       =  isotropic
> >> tau_p            =  0.5
> >> compressibility  =  4.5e-5
> >> ref_p            =  1.0
> >> ; Velocity generation is off
> >> gen_vel          =  no
> >>
> >> ########################
> >> RUNNING GROMACS ON GPU
> >>
> >> mdrun-gpu -s topol.tpr -v >& out &
> >>
> >> Here is a part of the md.log:
> >>
> >> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
> >> .
> >> .
> >> .
> >>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >>
> >>  Computing:      Nodes     Number      G-Cycles     Seconds       %
> >> -------------------------------------------------------------------
> >>  Write traj.         1       1021       106.075        31.7     0.2
> >>  Rest                1               64125.577     19178.6    99.8
> >> -------------------------------------------------------------------
> >>  Total               1               64231.652     19210.3   100.0
> >> -------------------------------------------------------------------
> >>
> >>                 NODE (s)     Real (s)       (%)
> >>       Time:     6381.840    19210.349      33.2
> >>                         1h46:21
> >>                 (Mnbf/s)     (MFlops)    (ns/day)    (hour/ns)
> >> Performance:       0.000        0.001      27.077        0.886
> >>
> >> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
> >>
> >> ########################
> >> RUNNING GROMACS ON MPI
> >>
> >> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
> >>
> >> Here is a part of the md.log:
> >>
> >> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
> >>
> >>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >>
> >>  Computing:          Nodes    Number       G-Cycles     Seconds       %
> >> ------------------------------------------------------------------------
> >>  Domain decomp.          3    100001       1452.166       434.7     0.6
> >>  DD comm. load           3     10001          0.745         0.2     0.0
> >>  Send X to PME           3   1000001        249.003        74.5     0.1
> >>  Comm. coord.            3   1000001        637.329       190.8     0.3
> >>  Neighbor search         3    100001       8738.669      2616.0     3.5
> >>  Force                   3   1000001      99210.202     29699.2    39.2
> >>  Wait + Comm. F          3   1000001       3361.591      1006.3     1.3
> >>  PME mesh                3   1000001      66189.554     19814.2    26.2
> >>  Wait + Comm. X/F        3                60294.513      8049.5    23.8
> >>  Wait + Recv. PME F      3   1000001        801.897       240.1     0.3
> >>  Write traj.             3      1015         33.464        10.0     0.0
> >>  Update                  3   1000001       3295.820       986.6     1.3
> >>  Constraints             3   1000001       6317.568      1891.2     2.5
> >>  Comm. energies          3    100002         70.784        21.2     0.0
> >>  Rest                    3                 2314.844       693.0     0.9
> >> ------------------------------------------------------------------------
> >>  Total                   6               252968.148     75727.5   100.0
> >> ------------------------------------------------------------------------
> >> ------------------------------------------------------------------------
> >>  PME redist. X/F         3   2000002       1945.551       582.4     0.8
> >>  PME spread/gather       3   2000002      37219.607     11141.9    14.7
> >>  PME 3D-FFT              3   2000002      21453.362      6422.2     8.5
> >>  PME solve               3   1000001       5551.056      1661.7     2.2
> >> ------------------------------------------------------------------------
> >>
> >> Parallel run - timing based on wallclock.
> >>
> >>                 NODE (s)      Real (s)       (%)
> >>       Time:    12621.257     12621.257     100.0
> >>                         3h30:21
> >>                 (Mnbf/s)      (GFlops)    (ns/day)    (hour/ns)
> >> Performance:     388.633        28.773      13.691        1.753
> >> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
> >>
> >> ######################################
> >> Comparing the performance values for the two simulations, I saw that
> >> in "numeric terms" the GPU simulation gave (for example) ~27 ns/day,
> >> while with MPI this value is approximately half (13.7 ns/day).
> >> However, when I compared the times at which each simulation started
> >> and finished, the MPI simulation took 211 minutes while the GPU
> >> simulation took 320 minutes to finish.
> >>
> >> My questions are:
> >>
> >> 1. Why did I get better performance values with the GPU?
> >>
> >> 2. Why was the GPU simulation 109 min slower than the run on 6 cores,
> >> given that my video card is a GTX 480 with 480 GPU cores? I was
> >> expecting the GPU to greatly accelerate the simulations.
> >>
> >>
> >> Does anyone have some idea?
> >>
> >> Thanks,
> >>
> >> Renato


-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time.c
Type: application/octet-stream
Size: 669 bytes
Desc: not available
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-users/attachments/20101022/c0640198/attachment.obj>

