[gmx-users] GPU slower than I7
Roland Schulz
roland at utk.edu
Sat Oct 23 00:05:53 CEST 2010
Hi,
On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas <renatoffs at gmail.com> wrote:
>
>
> Do you think that the "NODE" and "Real" time difference could be
> attributed to some compilation problem with mdrun-gpu? Although I'm
> asking this, I didn't get any errors during the compilation.
>
It is very odd that these are different for your system. What operating
system and compiler do you use?
Is HAVE_GETTIMEOFDAY set in src/config.h?
I have attached a small test program which uses the two different timers
that are used for the NODE and Real times. You can compile it with "cc
time.c -o time" and run it with "./time". Do you get roughly the same time
twice with the test program, or do you see the same discrepancy as with
GROMACS?
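A minimal sketch of what such a timer comparison can look like is below
(this is only an illustration, not the attached time.c itself; it assumes
clock() as the CPU-time source corresponding to NODE and gettimeofday() as
the wall-clock source corresponding to Real):

/* Illustrative timer comparison (sketch, not the original time.c).
 * Prints CPU time from clock() next to wall time from gettimeofday()
 * for the same busy loop; for a purely CPU-bound, single-threaded loop
 * the two should normally agree to within a few percent.
 */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval  tv_start, tv_end;
    clock_t         cpu_start, cpu_end;
    volatile double x = 0.0;
    long            i;
    double          wall, cpu;

    gettimeofday(&tv_start, NULL);
    cpu_start = clock();

    /* Busy loop so that both timers have something to measure. */
    for (i = 0; i < 500000000L; i++)
    {
        x += 1e-9;
    }

    cpu_end = clock();
    gettimeofday(&tv_end, NULL);

    wall = (tv_end.tv_sec - tv_start.tv_sec)
         + (tv_end.tv_usec - tv_start.tv_usec)*1e-6;
    cpu  = (double)(cpu_end - cpu_start)/CLOCKS_PER_SEC;

    printf("wall clock (gettimeofday): %.3f s\n", wall);
    printf("CPU time   (clock):        %.3f s\n", cpu);

    return 0;
}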
Roland
> Thanks,
>
> Renato
>
> 2010/10/22 Szilárd Páll <szilard.pall at cbr.su.se>:
> > Hi Renato,
> >
> > First of all, what you're seeing is pretty normal, especially given that
> > you have a CPU that is crossing the border of insane :) Why is it normal?
> > The PME algorithms are simply not very well suited for current GPU
> > architectures. With an ill-suited algorithm you won't be able to see the
> > speedups you can often see in other application areas - even more so
> > since you're comparing against Gromacs on an i7 980X. For
> > more info + benchmarks see the Gromacs-GPU page:
> > http://www.gromacs.org/gpu
> >
> > However, there is one strange thing you also pointed out. The fact
> > that the "NODE" and "Real" times in your mdrun-gpu timing summary are
> > not the same, but differ by a factor of 3, is _very_ unusual. I've run
> > mdrun-gpu on quite a wide variety of hardware, but I've never seen
> > those two counters deviate. It might be an artifact of the cycle
> > counters used internally behaving in an unusual way on your CPU.
> >
> > One other thing I should point out is that you would be better off
> > using the standard mdrun, which in 4.5 has thread support by default
> > and will therefore run in parallel on a single CPU/node without MPI!
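> > (As an illustration only -- the exact options depend on your build -- a
> > threaded run on this machine could look something like:
> > mdrun -nt 6 -s topol.tpr -v
> > where -nt sets the number of threads to start.)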
> >
> > Cheers,
> > --
> > Szilárd
> >
> >
> >
> > On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas <renatoffs at gmail.com>
> wrote:
> >> Hi gromacs users,
> >>
> >> I have installed the latest version of gromacs (4.5.1) on an i7 980X
> >> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
> >> MPI version. I also compiled the GPU-accelerated
> >> version of gromacs. Then I ran a 2 ns simulation of a small system
> >> (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
> >> The results that I got are below:
> >>
> >> ############################################
> >> My *.mdp is:
> >>
> >> constraints = all-bonds
> >> integrator = md
> >> dt = 0.002 ; ps !
> >> nsteps = 1000000 ; total 2000 ps.
> >> nstlist = 10
> >> ns_type = grid
> >> coulombtype = PME
> >> rvdw = 0.9
> >> rlist = 0.9
> >> rcoulomb = 0.9
> >> fourierspacing = 0.10
> >> pme_order = 4
> >> ewald_rtol = 1e-5
> >> vdwtype = cut-off
> >> pbc = xyz
> >> epsilon_rf = 0
> >> comm_mode = linear
> >> nstxout = 1000
> >> nstvout = 0
> >> nstfout = 0
> >> nstxtcout = 1000
> >> nstlog = 1000
> >> nstenergy = 1000
> >> ; Berendsen temperature coupling is on in four groups
> >> tcoupl = berendsen
> >> tc-grps = system
> >> tau-t = 0.1
> >> ref-t = 298
> >> ; Pressure coupling is on
> >> Pcoupl = berendsen
> >> pcoupltype = isotropic
> >> tau_p = 0.5
> >> compressibility = 4.5e-5
> >> ref_p = 1.0
> >> ; Generate velocities is on at 298 K.
> >> gen_vel = no
> >>
> >> ########################
> >> RUNNING GROMACS ON GPU
> >>
> >> mdrun-gpu -s topol.tpr -v > & out &
> >>
> >> Here is a part of the md.log:
> >>
> >> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
> >> .
> >> .
> >> .
> >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >>
> >> Computing:         Nodes     Number     G-Cycles    Seconds     %
> >> -----------------------------------------------------------------------
> >> Write traj.            1       1021       106.075       31.7     0.2
> >> Rest                   1                64125.577    19178.6    99.8
> >> -----------------------------------------------------------------------
> >> Total                  1                64231.652    19210.3   100.0
> >> -----------------------------------------------------------------------
> >>
> >>                NODE (s)   Real (s)      (%)
> >> Time:          6381.840  19210.349     33.2
> >>                        1h46:21
> >>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> >> Performance:      0.000      0.001     27.077      0.886
> >>
> >> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
> >>
> >> ########################
> >> RUNNING GROMACS ON MPI
> >>
> >> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v > & out &
> >>
> >> Here is a part of the md.log:
> >>
> >> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
> >>
> >> R E A L C Y C L E A N D T I M E A C C O U N T I N G
> >>
> >> Computing:         Nodes     Number     G-Cycles    Seconds     %
> >> -----------------------------------------------------------------------
> >> Domain decomp.         3     100001     1452.166      434.7     0.6
> >> DD comm. load          3      10001        0.745        0.2     0.0
> >> Send X to PME          3    1000001      249.003       74.5     0.1
> >> Comm. coord.           3    1000001      637.329      190.8     0.3
> >> Neighbor search        3     100001     8738.669     2616.0     3.5
> >> Force                  3    1000001    99210.202    29699.2    39.2
> >> Wait + Comm. F         3    1000001     3361.591     1006.3     1.3
> >> PME mesh               3    1000001    66189.554    19814.2    26.2
> >> Wait + Comm. X/F       3               60294.513     8049.5    23.8
> >> Wait + Recv. PME F     3    1000001      801.897      240.1     0.3
> >> Write traj.            3       1015       33.464       10.0     0.0
> >> Update                 3    1000001     3295.820      986.6     1.3
> >> Constraints            3    1000001     6317.568     1891.2     2.5
> >> Comm. energies         3     100002       70.784       21.2     0.0
> >> Rest                   3                 2314.844      693.0     0.9
> >> -----------------------------------------------------------------------
> >> Total                  6               252968.148    75727.5   100.0
> >> -----------------------------------------------------------------------
> >> -----------------------------------------------------------------------
> >> PME redist. X/F        3    2000002     1945.551      582.4     0.8
> >> PME spread/gather      3    2000002    37219.607    11141.9    14.7
> >> PME 3D-FFT             3    2000002    21453.362     6422.2     8.5
> >> PME solve              3    1000001     5551.056     1661.7     2.2
> >> -----------------------------------------------------------------------
> >>
> >> Parallel run - timing based on wallclock.
> >>
> >>                NODE (s)   Real (s)      (%)
> >> Time:         12621.257  12621.257    100.0
> >>                        3h30:21
> >>                (Mnbf/s)    (GFlops)   (ns/day)  (hour/ns)
> >> Performance:    388.633     28.773     13.691     1.753
> >> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
> >>
> >> ######################################
> >> Comparing the performance values for the two simulations, I saw that in
> >> "numeric terms" the simulation using the GPU gave (for example) ~27
> >> ns/day, while with MPI this value is approximately half (13.7
> >> ns/day).
> >> However, when I compared the times at which each simulation
> >> started/finished, the simulation using MPI took 211 minutes while the
> >> GPU simulation took 320 minutes to finish.
> >>
> >> My questions are:
> >>
> >> 1. Why do the performance values show better results for the GPU?
> >>
> >> 2. Why was the simulation running on the GPU 109 min slower than on 6
> >> cores, given that my video card is a GTX 480 with 480 GPU cores? I was
> >> expecting that the GPU would greatly accelerate the simulations.
> >>
> >>
> >> Does anyone have some idea?
> >>
> >> Thanks,
> >>
> >> Renato
> >>
> >
>
>
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time.c
Type: application/octet-stream
Size: 669 bytes
Desc: not available
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-users/attachments/20101022/c0640198/attachment.obj>