[gmx-users] GPU slower than I7

Renato Freitas renatoffs at gmail.com
Mon Oct 25 20:45:40 CEST 2010


Hi,

My OS is Fedora 13 (64 bits) and I used gcc 4.4.4. I ran the program
you sent me. Below are the results of 5 runs. As you can see, the
results are roughly the same:

[renato at scrat ~]$ ./time
2.090000 2.102991
[renato at scrat ~]$ ./time
2.090000 2.102808
[renato at scrat ~]$ ./time
2.090000 2.104577
[renato at scrat ~]$ ./time
2.090000 2.103943
[renato at scrat ~]$ ./time
2.090000 2.104471
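
(Roland's attached time.c is not in the archive; the following is only a
sketch of a comparable test, under the assumption that the two timers being
compared are the clock() CPU-time counter behind the NODE column and the
gettimeofday() wall clock behind the Real column.)

/* Sketch of a timer-comparison test (assumption: the original time.c
 * compared a CPU-time counter with the gettimeofday() wall clock).
 * Compile with: cc timer_test.c -o timer_test
 */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end;
    clock_t c0, c1;
    volatile double x = 0.0;
    long i;

    gettimeofday(&start, NULL);   /* wall-clock start ("Real" time) */
    c0 = clock();                 /* CPU-time start ("NODE" time)   */

    /* Burn roughly a couple of seconds of CPU time. */
    for (i = 0; i < 400000000L; i++)
        x += 1e-9 * i;

    c1 = clock();
    gettimeofday(&end, NULL);

    /* On a healthy single-threaded run the two numbers should agree
     * closely, as in the output above. */
    printf("%f %f\n",
           (double)(c1 - c0) / CLOCKS_PER_SEC,
           (end.tv_sec - start.tv_sec) + 1e-6 * (end.tv_usec - start.tv_usec));
    return 0;
}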

Below is part of src/config.h:
.
.
.

/* Define to 1 if you have the MSVC _aligned_malloc() function. */
/* #undef HAVE__ALIGNED_MALLOC */

/* Define to 1 if you have the gettimeofday() function. */
#define HAVE_GETTIMEOFDAY

/* Define to 1 if you have the cbrt() function. */
#define HAVE_CBRT
.
.
.

 Is this OK?
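
(As a generic illustration only, not the actual GROMACS source: a
configure-generated define like HAVE_GETTIMEOFDAY is typically consumed as
below, with a coarser fallback timer when it is absent, which is why its
presence matters for the Real-time measurement.)

/* Generic illustration (not GROMACS code): using a configure-time define
 * such as HAVE_GETTIMEOFDAY, with a 1-second-resolution fallback when
 * gettimeofday() is unavailable. */
#include <stdio.h>
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

static double wallclock_seconds(void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
#else
    return (double)time(NULL);
#endif
}

int main(void)
{
    printf("%.6f\n", wallclock_seconds());
    return 0;
}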

Renato




2010/10/22 Roland Schulz <roland at utk.edu>:
> Hi,
>
> On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas <renatoffs at gmail.com> wrote:
>>
>> Do you think that the "NODE" and "Real" time difference could be
>> attributed to some compilation problem in mdrun-gpu? I'm asking even
>> though I didn't get any errors during the compilation.
>
> It is very odd that these are different on your system. What operating
> system and compiler do you use?
> Is HAVE_GETTIMEOFDAY set in src/config.h?
> I attached a small test program which uses the two different timers used for
> NODE and Real time. You can compile it with cc time.c -o time and run it
> with ./time. Do you get roughly the same time twice with the test program or
> do you see the same discrepancy as with GROMACS?
> Roland
>>
>> Thanks,
>>
>> Renato
>>
>> 2010/10/22 Szilárd Páll <szilard.pall at cbr.su.se>:
>> > Hi Renato,
>> >
>> > First of all, what you're seeing is pretty normal, especially since you
>> > have a CPU that is crossing the border of insane :) Why is it normal?
>> > The PME algorithms are simply not well suited for current GPU
>> > architectures. With an ill-suited algorithm you won't be able to see the
>> > speedups you can often see in other application areas - even more so
>> > since you're comparing to Gromacs on an i7 980X. For more info and
>> > benchmarks see the Gromacs-GPU page:
>> > http://www.gromacs.org/gpu
>> >
>> > However, there is one strange thing you also pointed out. The fact
>> > that the "NODE" and "Real" times in your mdrun-gpu timing summary are
>> > not the same, but differ by 3x, is _very_ unusual. I've run
>> > mdrun-gpu on quite a wide variety of hardware, but I've never seen
>> > those two counters deviate. It might be an artifact of the cycle
>> > counters used internally behaving in an unusual way on your CPU.
>> >
>> > One other thing I should point out is that you would be better off
>> > using the standard mdrun, which in 4.5 has thread support by default
>> > and therefore will run on a single CPU/node without MPI!
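
(For reference, a threaded run of that kind might look like the following,
assuming a csh-style shell as in the run commands quoted below and that all
12 hardware threads of the i7 980X are to be used; the -nt flag in mdrun 4.5
sets the thread count, and left at its default mdrun should pick it
automatically.)

mdrun -nt 12 -s topol.tpr -v >& out &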
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> >
>> >
>> >
>> > On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas <renatoffs at gmail.com>
>> > wrote:
>> >> Hi gromacs users,
>> >>
>> >> I have installed the latest version of GROMACS (4.5.1) on an i7 980X
>> >> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
>> >> MPI version. I also compiled the GPU-accelerated version of GROMACS.
>> >> Then I ran a 2 ns simulation of a small system (11042 atoms) to compare
>> >> the performance of mdrun-gpu vs. mdrun_mpi. The results I got are
>> >> below:
>> >>
>> >> ############################################
>> >> My *.mdp is:
>> >>
>> >> constraints         =  all-bonds
>> >> integrator          =  md
>> >> dt                  =  0.002    ; ps !
>> >> nsteps              =  1000000  ; total 2000 ps.
>> >> nstlist             =  10
>> >> ns_type             =  grid
>> >> coulombtype    = PME
>> >> rvdw                = 0.9
>> >> rlist               = 0.9
>> >> rcoulomb            = 0.9
>> >> fourierspacing      = 0.10
>> >> pme_order           = 4
>> >> ewald_rtol          = 1e-5
>> >> vdwtype             =  cut-off
>> >> pbc                 =  xyz
>> >> epsilon_rf    =  0
>> >> comm_mode           =  linear
>> >> nstxout             =  1000
>> >> nstvout             =  0
>> >> nstfout             =  0
>> >> nstxtcout           =  1000
>> >> nstlog              =  1000
>> >> nstenergy           =  1000
>> >> ; Berendsen temperature coupling is on in four groups
>> >> tcoupl              = berendsen
>> >> tc-grps             = system
>> >> tau-t               = 0.1
>> >> ref-t               = 298
>> >> ; Pressure coupling is on
>> >> Pcoupl = berendsen
>> >> pcoupltype = isotropic
>> >> tau_p = 0.5
>> >> compressibility = 4.5e-5
>> >> ref_p = 1.0
>> >> ; Generate velocities is on at 298 K.
>> >> gen_vel = no
>> >>
>> >> ########################
>> >> RUNNING GROMACS ON GPU
>> >>
>> >> mdrun-gpu -s topol.tpr -v >& out &
>> >>
>> >> Here is a part of the md.log:
>> >>
>> >> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
>> >> .
>> >> .
>> >> .
>> >>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >>
>> >>  Computing:       Nodes   Number     G-Cycles     Seconds      %
>> >> ----------------------------------------------------------------------
>> >>  Write traj.          1     1021      106.075        31.7     0.2
>> >>  Rest                 1             64125.577     19178.6    99.8
>> >> ----------------------------------------------------------------------
>> >>  Total                1             64231.652     19210.3   100.0
>> >> ----------------------------------------------------------------------
>> >>
>> >>                NODE (s)     Real (s)      (%)
>> >>        Time:   6381.840    19210.349     33.2
>> >>                        1h46:21
>> >>                (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
>> >> Performance:      0.000        0.001     27.077      0.886
>> >>
>> >> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
>> >>
>> >> ########################
>> >> RUNNING GROMACS ON MPI
>> >>
>> >> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
>> >>
>> >> Here is a part of the md.log:
>> >>
>> >> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
>> >>
>> >>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >>
>> >>  Computing:            Nodes   Number      G-Cycles      Seconds      %
>> >> --------------------------------------------------------------------------
>> >>  Domain decomp.            3   100001      1452.166        434.7     0.6
>> >>  DD comm. load             3    10001         0.745          0.2     0.0
>> >>  Send X to PME             3  1000001       249.003         74.5     0.1
>> >>  Comm. coord.              3  1000001       637.329        190.8     0.3
>> >>  Neighbor search           3   100001      8738.669       2616.0     3.5
>> >>  Force                     3  1000001     99210.202      29699.2    39.2
>> >>  Wait + Comm. F            3  1000001      3361.591       1006.3     1.3
>> >>  PME mesh                  3  1000001     66189.554      19814.2    26.2
>> >>  Wait + Comm. X/F          3              60294.513       8049.5    23.8
>> >>  Wait + Recv. PME F        3  1000001       801.897        240.1     0.3
>> >>  Write traj.               3     1015        33.464         10.0     0.0
>> >>  Update                    3  1000001      3295.820        986.6     1.3
>> >>  Constraints               3  1000001      6317.568       1891.2     2.5
>> >>  Comm. energies            3   100002        70.784         21.2     0.0
>> >>  Rest                      3               2314.844        693.0     0.9
>> >> --------------------------------------------------------------------------
>> >>  Total                     6             252968.148      75727.5   100.0
>> >> --------------------------------------------------------------------------
>> >>
>> >> --------------------------------------------------------------------------
>> >>  PME redist. X/F           3  2000002      1945.551        582.4     0.8
>> >>  PME spread/gather         3  2000002     37219.607      11141.9    14.7
>> >>  PME 3D-FFT                3  2000002     21453.362       6422.2     8.5
>> >>  PME solve                 3  1000001      5551.056       1661.7     2.2
>> >> --------------------------------------------------------------------------
>> >>
>> >> Parallel run - timing based on wallclock.
>> >>
>> >>                NODE (s)     Real (s)      (%)
>> >>        Time:  12621.257    12621.257    100.0
>> >>                        3h30:21
>> >>                (Mnbf/s)     (GFlops)   (ns/day)   (hour/ns)
>> >> Performance:    388.633       28.773     13.691      1.753
>> >> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
>> >>
>> >> ######################################
>> >> Comparing the performance values for the two simulations, I saw that in
>> >> numeric terms the GPU simulation gave (for example) ~27 ns/day, while
>> >> with MPI this value is approximately half (13.7 ns/day). However, when I
>> >> compared the times each simulation started and finished, the MPI
>> >> simulation took 211 minutes while the GPU simulation took 320 minutes to
>> >> finish.
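
(A quick check of the numbers suggests where the ~27 ns/day figure comes
from: the GPU log's rate appears to be derived from the NODE time rather
than the wall clock, since 2 ns * 86400 s/day / 6381.8 s ≈ 27.1 ns/day,
whereas using the Real time gives 2 * 86400 / 19210.3 ≈ 9.0 ns/day. For the
MPI run, NODE and Real agree, and 2 * 86400 / 12621.3 ≈ 13.7 ns/day matches
the log.)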
>> >>
>> >> My questions are:
>> >>
>> >> 1. Why do the performance values show better results for the GPU?
>> >>
>> >> 2. Why was the simulation running on the GPU 109 minutes slower than on
>> >> 6 cores, given that my video card is a GTX 480 with 480 GPU cores? I was
>> >> expecting the GPU to greatly accelerate the simulations.
>> >>
>> >>
>> >> Does anyone have some idea?
>> >>
>> >> Thanks,
>> >>
>> >> Renato
>
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
>


