[gmx-users] GPU slower than I7
Renato Freitas
renatoffs at gmail.com
Thu Oct 21 21:18:01 CEST 2010
Hi GROMACS users,
I have installed the latest version of GROMACS (4.5.1) on an i7 980X
(6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
MPI version. I also compiled the GPU-accelerated version of GROMACS.
Then I ran a 2 ns simulation of a small system (11042 atoms) to compare
the performance of mdrun-gpu vs. mdrun_mpi. The results I got are below:
############################################
My *.mdp is:
constraints = all-bonds
integrator = md
dt = 0.002 ; ps !
nsteps = 1000000 ; total 2000 ps.
nstlist = 10
ns_type = grid
coulombtype = PME
rvdw = 0.9
rlist = 0.9
rcoulomb = 0.9
fourierspacing = 0.10
pme_order = 4
ewald_rtol = 1e-5
vdwtype = cut-off
pbc = xyz
epsilon_rf = 0
comm_mode = linear
nstxout = 1000
nstvout = 0
nstfout = 0
nstxtcout = 1000
nstlog = 1000
nstenergy = 1000
; Berendsen temperature coupling is on in four groups
tcoupl = berendsen
tc-grps = system
tau-t = 0.1
ref-t = 298
; Pressure coupling is on
Pcoupl = berendsen
pcoupltype = isotropic
tau_p = 0.5
compressibility = 4.5e-5
ref_p = 1.0
; Generate velocities is on at 298 K.
gen_vel = no
########################
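For reference, the topol.tpr used below would be built from this .mdp
with grompp, roughly along these lines (the file names md.mdp, conf.gro
and topol.top are just placeholders, not the actual ones):

# Preprocessing sketch (placeholder file names):
# build the run input used by both mdrun-gpu and mdrun_mpi below.
grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr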
RUNNING GROMACS ON GPU
mdrun-gpu -s topol.tpr -v >& out &
Here is a part of the md.log:
Started mdrun on node 0 Wed Oct 20 09:52:09 2010
.
.
.
        R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes     Number     G-Cycles    Seconds     %
---------------------------------------------------------------------
 Write traj.             1       1021       106.075       31.7    0.2
 Rest                    1                64125.577    19178.6   99.8
---------------------------------------------------------------------
 Total                   1                64231.652    19210.3  100.0
---------------------------------------------------------------------

                NODE (s)   Real (s)      (%)
       Time:    6381.840  19210.349     33.2
                        1h46:21
                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
 Performance:      0.000      0.001     27.077      0.886
Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
########################
RUNNING GROMACS ON MPI
mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
Here is a part of the md.log:
Started mdrun on node 0 Wed Oct 20 18:30:52 2010
        R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:          Nodes     Number     G-Cycles    Seconds     %
---------------------------------------------------------------------
 Domain decomp.          3     100001      1452.166      434.7    0.6
 DD comm. load           3      10001         0.745        0.2    0.0
 Send X to PME           3    1000001       249.003       74.5    0.1
 Comm. coord.            3    1000001       637.329      190.8    0.3
 Neighbor search         3     100001      8738.669     2616.0    3.5
 Force                   3    1000001     99210.202    29699.2   39.2
 Wait + Comm. F          3    1000001      3361.591     1006.3    1.3
 PME mesh                3    1000001     66189.554    19814.2   26.2
 Wait + Comm. X/F        3                60294.513     8049.5   23.8
 Wait + Recv. PME F      3    1000001       801.897      240.1    0.3
 Write traj.             3       1015        33.464       10.0    0.0
 Update                  3    1000001      3295.820      986.6    1.3
 Constraints             3    1000001      6317.568     1891.2    2.5
 Comm. energies          3     100002        70.784       21.2    0.0
 Rest                    3                 2314.844      693.0    0.9
---------------------------------------------------------------------
 Total                   6               252968.148    75727.5  100.0
---------------------------------------------------------------------
---------------------------------------------------------------------
 PME redist. X/F         3    2000002      1945.551      582.4    0.8
 PME spread/gather       3    2000002     37219.607    11141.9   14.7
 PME 3D-FFT              3    2000002     21453.362     6422.2    8.5
 PME solve               3    1000001      5551.056     1661.7    2.2
---------------------------------------------------------------------

        Parallel run - timing based on wallclock.

                NODE (s)   Real (s)      (%)
       Time:   12621.257  12621.257    100.0
                        3h30:21
                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
 Performance:    388.633     28.773     13.691      1.753
Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
######################################
Comparing the performance values for the two simulations, I saw that in
"numeric terms" the GPU run gave ~27 ns/day, while with MPI this value
was approximately half (13.7 ns/day).
However, when I compared the wall-clock start/finish times, the MPI
simulation took about 211 minutes while the GPU simulation took about
320 minutes to finish.
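As a quick sanity check on these numbers, recomputing ns/day from the
wall-clock "Real (s)" values in each log for the 2 ns run (the awk
one-liner is just for illustration):

# Recompute ns/day from the logged wall-clock ("Real") seconds of the 2 ns run:
awk 'BEGIN { printf "GPU: %.2f ns/day\n", 2 * 86400 / 19210.349;
             printf "MPI: %.2f ns/day\n", 2 * 86400 / 12621.257 }'

This gives roughly 9.0 ns/day for the GPU run and 13.7 ns/day for the
MPI run; the 27.077 ns/day reported for the GPU run instead seems to
correspond to its NODE time (2 ns over 6381.840 s ~ 27.1 ns/day).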
My questions are:
1. Why do the performance values report better results for the GPU?
2. Why was the GPU simulation 109 minutes slower than the run on 6
cores, given that my video card is a GTX 480 with 480 GPU cores? I was
expecting the GPU to speed up the simulations considerably.
Does anyone have any idea?
Thanks,
Renato