[gmx-users] GPU slower than I7
Renato Freitas
renatoffs at gmail.com
Mon Oct 25 20:45:40 CEST 2010
Hi,
My OS is Fedora 13 (64-bit) and I used gcc 4.4.4. I ran the program
you sent me. Below are the results of 5 runs. As you can see, the
results are roughly the same:
[renato at scrat ~]$ ./time
2.090000 2.102991
[renato at scrat ~]$ ./time
2.090000 2.102808
[renato at scrat ~]$ ./time
2.090000 2.104577
[renato at scrat ~]$ ./time
2.090000 2.103943
[renato at scrat ~]$ ./time
2.090000 2.104471
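
For reference, here is a minimal sketch of a timer-comparison program along
these lines, assuming it contrasts CPU time from clock() against wall-clock
time from gettimeofday(), the two kinds of timers behind the NODE and Real
columns. This is only a reconstruction (the attached time.c may differ), and
the file name and busy loop below are purely illustrative:

/* timer_check.c - rough sketch, not the actual time.c attached by Roland:
 * prints CPU time from clock() and wall-clock time from gettimeofday()
 * for the same busy loop, so the two values can be compared the way the
 * NODE and Real times are compared in md.log.
 */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    clock_t         c0;
    double          w0, cpu, wall;
    volatile double x = 0.0;
    long            i;

    c0 = clock();
    w0 = wall_seconds();

    /* burn roughly a couple of seconds of CPU */
    for (i = 0; i < 400000000L; i++)
    {
        x += 1.0e-9 * i;
    }

    cpu  = (double)(clock() - c0) / CLOCKS_PER_SEC;
    wall = wall_seconds() - w0;

    /* on a healthy single-threaded run the two numbers should agree closely */
    printf("%f %f\n", cpu, wall);
    return 0;
}

It can be compiled and run the same way as time.c:
cc timer_check.c -o timer_check && ./timer_check
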
Below is part of src/config.h:
.
.
.
/* Define to 1 if you have the MSVC _aligned_malloc() function. */
/* #undef HAVE__ALIGNED_MALLOC */
/* Define to 1 if you have the gettimeofday() function. */
#define HAVE_GETTIMEOFDAY
/* Define to 1 if you have the cbrt() function. */
#define HAVE_CBRT
.
.
.
Is this OK?
Renato
2010/10/22 Roland Schulz <roland at utk.edu>:
> Hi,
>
> On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas <renatoffs at gmail.com> wrote:
>>
>> Do you think that the "NODE" and "Real" time difference could be
>> attributed to some compilation problem in mdrun-gpu? I'm asking even
>> though I didn't get any errors during compilation.
>
> It is very odd that these are different on your system. What operating
> system and compiler do you use?
> Is HAVE_GETTIMEOFDAY set in src/config.h?
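> (A quick way to check is, for example: grep HAVE_GETTIMEOFDAY src/config.h)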
> I attached a small test program which uses the two different timers used for
> NODE and Real time. You can compile it with cc time.c -o time and run it
> with ./time. Do you get roughly the same time twice with the test program or
> do you see the same discrepancy as with GROMACS?
> Roland
>>
>> Thanks,
>>
>> Renato
>>
>> 2010/10/22 Szilárd Páll <szilard.pall at cbr.su.se>:
>> > Hi Renato,
>> >
>> > First of all, what you're seeing is pretty normal, especially given
>> > that you have a CPU that is crossing the border of insane :) Why is it
>> > normal? The PME algorithms are simply not well suited for current GPU
>> > architectures. With an ill-suited algorithm you won't be able to see
>> > the speedups you can often see in other application areas, even more
>> > so when you're comparing to Gromacs on an i7 980X. For more info +
>> > benchmarks see the Gromacs-GPU page:
>> > http://www.gromacs.org/gpu
>> >
>> > However, there is one strange thing you also pointed out. The fact
>> > that the "NODE" and "Real" times in your mdrun-gpu timing summary are
>> > not the same, but differ by a factor of 3, is _very_ unusual. I've run
>> > mdrun-gpu on quite a wide variety of hardware, but I've never seen
>> > those two counters deviate. It might be an artifact of the cycle
>> > counters used internally behaving in an unusual way on your CPU.
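>> > (As a rough sanity check, assuming the reported ns/day is derived from
>> > the NODE time: 2 ns / 6381.8 s * 86400 s/day gives about 27.1 ns/day,
>> > matching the reported 27.077, whereas the same calculation with the
>> > Real time, 2 ns / 19210.3 s * 86400 s/day, gives only about 9 ns/day.)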
>> >
>> > One other thing I should point out is that you would be better off
>> > using the standard mdrun, which in 4.5 has thread support by default
>> > and will therefore run on a single CPU/node without MPI!
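>> > For example, something along these lines should use all cores of the
>> > node without MPI (the thread count here is only illustrative; by
>> > default mdrun picks it automatically):
>> >
>> >   mdrun -nt 12 -s topol.tpr -v >& out &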
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> >
>> >
>> >
>> > On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas <renatoffs at gmail.com>
>> > wrote:
>> >> Hi gromacs users,
>> >>
>> >> I have installed the latest version of gromacs (4.5.1) on an i7 980X
>> >> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled
>> >> its MPI version. I also compiled the GPU-accelerated version of
>> >> gromacs. Then I did a 2 ns simulation of a small system (11042 atoms)
>> >> to compare the performance of mdrun-gpu vs mdrun_mpi.
>> >> The results that I got are below:
>> >>
>> >> ############################################
>> >> My *.mdp is:
>> >>
>> >> constraints = all-bonds
>> >> integrator = md
>> >> dt = 0.002 ; ps !
>> >> nsteps = 1000000 ; total 2000 ps.
>> >> nstlist = 10
>> >> ns_type = grid
>> >> coulombtype = PME
>> >> rvdw = 0.9
>> >> rlist = 0.9
>> >> rcoulomb = 0.9
>> >> fourierspacing = 0.10
>> >> pme_order = 4
>> >> ewald_rtol = 1e-5
>> >> vdwtype = cut-off
>> >> pbc = xyz
>> >> epsilon_rf = 0
>> >> comm_mode = linear
>> >> nstxout = 1000
>> >> nstvout = 0
>> >> nstfout = 0
>> >> nstxtcout = 1000
>> >> nstlog = 1000
>> >> nstenergy = 1000
>> >> ; Berendsen temperature coupling is on
>> >> tcoupl = berendsen
>> >> tc-grps = system
>> >> tau-t = 0.1
>> >> ref-t = 298
>> >> ; Pressure coupling is on
>> >> Pcoupl = berendsen
>> >> pcoupltype = isotropic
>> >> tau_p = 0.5
>> >> compressibility = 4.5e-5
>> >> ref_p = 1.0
>> >> ; Velocity generation is off
>> >> gen_vel = no
>> >>
>> >> ########################
>> >> RUNNING GROMACS ON GPU
>> >>
>> >> mdrun-gpu -s topol.tpr -v >& out &
>> >>
>> >> Here is a part of the md.log:
>> >>
>> >> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
>> >> .
>> >> .
>> >> .
>> >>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >>
>> >>  Computing:        Nodes   Number     G-Cycles     Seconds       %
>> >> -------------------------------------------------------------------
>> >>  Write traj.           1     1021      106.075        31.7     0.2
>> >>  Rest                  1              64125.577     19178.6    99.8
>> >> -------------------------------------------------------------------
>> >>  Total                 1              64231.652     19210.3   100.0
>> >> -------------------------------------------------------------------
>> >>
>> >>                 NODE (s)   Real (s)      (%)
>> >>        Time:    6381.840  19210.349     33.2
>> >>                         1h46:21
>> >>                 (Mnbf/s)   (MFlops)   (ns/day)   (hour/ns)
>> >> Performance:       0.000      0.001     27.077       0.886
>> >>
>> >> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
>> >>
>> >> ########################
>> >> RUNNING GROMACS ON MPI
>> >>
>> >> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
>> >>
>> >> Here is a part of the md.log:
>> >>
>> >> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
>> >>
>> >>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> >>
>> >>  Computing:            Nodes   Number     G-Cycles     Seconds       %
>> >> -----------------------------------------------------------------------
>> >>  Domain decomp.            3   100001     1452.166       434.7     0.6
>> >>  DD comm. load             3    10001        0.745         0.2     0.0
>> >>  Send X to PME             3  1000001      249.003        74.5     0.1
>> >>  Comm. coord.              3  1000001      637.329       190.8     0.3
>> >>  Neighbor search           3   100001     8738.669      2616.0     3.5
>> >>  Force                     3  1000001    99210.202     29699.2    39.2
>> >>  Wait + Comm. F            3  1000001     3361.591      1006.3     1.3
>> >>  PME mesh                  3  1000001    66189.554     19814.2    26.2
>> >>  Wait + Comm. X/F          3             60294.513      8049.5    23.8
>> >>  Wait + Recv. PME F        3  1000001      801.897       240.1     0.3
>> >>  Write traj.               3     1015       33.464        10.0     0.0
>> >>  Update                    3  1000001     3295.820       986.6     1.3
>> >>  Constraints               3  1000001     6317.568      1891.2     2.5
>> >>  Comm. energies            3   100002       70.784        21.2     0.0
>> >>  Rest                      3              2314.844       693.0     0.9
>> >> -----------------------------------------------------------------------
>> >>  Total                     6            252968.148     75727.5   100.0
>> >> -----------------------------------------------------------------------
>> >>
>> >> -----------------------------------------------------------------------
>> >>  PME redist. X/F           3  2000002     1945.551       582.4     0.8
>> >>  PME spread/gather         3  2000002    37219.607     11141.9    14.7
>> >>  PME 3D-FFT                3  2000002    21453.362      6422.2     8.5
>> >>  PME solve                 3  1000001     5551.056      1661.7     2.2
>> >> -----------------------------------------------------------------------
>> >>
>> >> Parallel run - timing based on wallclock.
>> >>
>> >>                 NODE (s)   Real (s)      (%)
>> >>        Time:   12621.257  12621.257    100.0
>> >>                         3h30:21
>> >>                 (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
>> >> Performance:     388.633     28.773     13.691       1.753
>> >> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
>> >>
>> >> ######################################
>> >> Comparing the performance values for the two simulations, I saw that
>> >> in "numeric terms" the simulation using the GPU gave (for example)
>> >> ~27 ns/day, while with MPI this value is approximately half
>> >> (13.7 ns/day).
>> >> However, when I compared the times at which each simulation started
>> >> and finished, the MPI simulation took 211 minutes while the GPU
>> >> simulation took 320 minutes to finish.
>> >>
>> >> My questions are:
>> >>
>> >> 1. Why do the performance values show better results for the GPU?
>> >>
>> >> 2. Why was the simulation running on the GPU 109 minutes slower than
>> >> on 6 cores, given that my video card is a GTX 480 with 480 GPU cores?
>> >> I was expecting the GPU to accelerate the simulations greatly.
>> >>
>> >>
>> >> Does anyone have some idea?
>> >>
>> >> Thanks,
>> >>
>> >> Renato
>
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
>