[gmx-users] GPU low performance
Carmen Di Giovanni
cdigiova at unina.it
Wed Feb 18 17:57:33 CET 2015
Dear all, the full log file is too big. However, the middle part of it
contains only information about the energies at each step. The first part is
already posted, so here is the final part:
-------------------------------------------------------------
Step Time Lambda
10000000 20000.00000 0.00000
Writing checkpoint, step 10000000 at Mon Dec 29 13:16:22 2014
Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    9.34206e+03    4.14342e+03    2.79172e+03   -1.75465e+02    7.99811e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.01135e+06   -7.13064e+06    2.01349e+04   -6.00306e+06    1.08201e+06
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -4.92106e+06   -5.86747e+06    2.99426e+02    1.29480e+02    2.16280e-05
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 10000001 steps using 10000001 frames
Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    9.45818e+03    4.30665e+03    2.92407e+03   -1.75556e+02    8.02473e+04
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.01284e+06   -7.13138e+06    2.01510e+04   -6.00163e+06    1.08407e+06
   Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
   -4.91756e+06   -5.38519e+06    2.99998e+02    1.37549e+02    0.00000e+00

Total Virial (kJ/mol)
    3.42887e+05    1.63625e+01    1.23658e+02
    1.67406e+01    3.42916e+05   -4.27834e+01
    1.23997e+02   -4.29636e+01    3.42881e+05

Pressure (bar)
    1.37573e+02    7.50214e-02   -1.03916e-01
    7.22048e-02    1.37623e+02   -1.66417e-02
   -1.06444e-01   -1.52990e-02    1.37453e+02
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
 Computing:                          M-Number           M-Flops   % Flops
-----------------------------------------------------------------------------
 Pair Search distance check     16343508.605344     147091577.448      0.0
 NxN Ewald Elec. + LJ [V&F]   5072118956.506304  542716728346.174     98.1
 1,4 nonbonded interactions        95860.009586       8627400.863      0.0
 Calc Weights                   13039741.303974     469430686.943      0.1
 Spread Q Bspline              278181147.818112     556362295.636      0.1
 Gather F Bspline              278181147.818112    1669086886.909      0.3
 3D-FFT                        880787450.909824    7046299607.279      1.3
 Solve PME                        163837.909504      10485626.208      0.0
 Shift-X                          108664.934658        651989.608      0.0
 Angles                            86090.008609      14463121.446      0.0
 Propers                           31380.003138       7186020.719      0.0
 Impropers                         28790.002879       5988320.599      0.0
 Virial                          4347030.434703      78246547.825      0.0
 Stop-CM                         4346580.869316      43465808.693      0.0
 Calc-Ekin                       4346580.869316     117357683.472      0.0
 Lincs                             59130.017739       3547801.064      0.0
 Lincs-Mat                       1033080.309924       4132321.240      0.0
 Constraint-V                    4406580.881316      35252647.051      0.0
 Constraint-Vir                  4347450.434745     104338810.434      0.0
 Settle                          1429440.428832     461709258.513      0.1
-----------------------------------------------------------------------------
 Total                                            553500452758.122    100.0
-----------------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 32 OpenMP threads

 Computing:            Num     Num      Call     Wall time     Giga-Cycles
                      Ranks  Threads   Count        (s)        total sum    %
-----------------------------------------------------------------------------
 Neighbor search         1     32     250001     6231.657     518475.694   1.1
 Launch GPU ops.         1     32   10000001     1825.689     151897.833   0.3
 Force                   1     32   10000001    49568.959    4124152.027   8.4
 PME mesh                1     32   10000001   194798.850   16207321.863  32.8
 Wait GPU local          1     32   10000001   170272.438   14166717.115  28.7
 NB X/F buffer ops.      1     32   19750001    29175.632    2427421.177   4.9
 Write traj.             1     32      20635     1567.928     130452.056   0.3
 Update                  1     32   10000001    13312.819    1107630.452   2.2
 Constraints             1     32   10000001    34210.142    2846293.908   5.8
 Rest                                           92338.781    7682613.897  15.6
-----------------------------------------------------------------------------
 Total                                         593302.894   49362976.023 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather       1     32   20000002   144767.207   12044674.424  24.4
 PME 3D-FFT              1     32   20000002    39499.157    3286341.501   6.7
 PME solve Elec          1     32   10000001     9947.340     827621.589   1.7
-----------------------------------------------------------------------------
 GPU timings
-----------------------------------------------------------------------------
 Computing:                      Count   Wall t (s)   ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                  250001      935.751     3.743      0.2
 X / q H2D                    10000001    11509.209     1.151      2.8
 Nonbonded F+ene k.            9750000   377111.949    38.678     92.0
 Nonbonded F+ene+prune k.       250001    12049.010    48.196      2.9
 F D2H                        10000001     8129.292     0.813      2.0
-----------------------------------------------------------------------------
 Total                                   409735.211    40.974    100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid.
               Core t (s)   Wall t (s)        (%)
       Time: 18713831.228   593302.894     3154.2
                            6d20h48:22
                 (ns/day)    (hour/ns)
Performance:        2.913        8.240
Finished mdrun on rank 0 Mon Dec 29 13:16:24 2014
-------------------------------------------------------
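
If I read the NOTE above correctly, "a shorter cut-off and a finer PME grid"
means shifting work from the GPU (which runs the short-range non-bonded
kernels) to the CPU (which runs the PME mesh). A minimal sketch of such a
change in the .mdp file (the values below are only an illustration, not
tested):

   rcoulomb       = 1.0     ; was 1.2; less short-range work for the GPU
   rvdw           = 1.0     ; was 1.2; kept equal to rcoulomb (Verlet scheme)
   fourierspacing = 0.12    ; was 0.135; finer PME grid, more work for the CPU

Is this the right direction, or does changing the cut-offs alter the force
field too much?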
Thank you in advance,
Carmen
--
Carmen Di Giovanni, PhD
Dept. of Pharmaceutical and Toxicological Chemistry
"Drug Discovery Lab"
University of Naples "Federico II"
Via D. Montesano, 49
80131 Naples
Tel.: ++39 081 678623
Fax: ++39 081 678100
Email: cdigiova at unina.it
Quoting Szilárd Páll <pall.szilard at gmail.com>:
> We need a *full* log file, not parts of it!
>
> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
> to not use HyperThreading.
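>
> For example, with the file names from your log (just a sketch):
>
>    gmx_mpi mdrun -deffnm prod_20ns -ntomp 16 -pin on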
> --
> Szilárd
>
>
> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni
> <cdigiova at unina.it> wrote:
>> Justin,
>> the problem is evident for all calculations.
>> This is the log file of a recent run:
>>
>> --------------------------------------------------------------------------------
>>
>> Log file opened on Mon Dec 22 16:28:00 2014
>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks: 1
>> GROMACS: gmx mdrun, VERSION 5.0
>>
>> GROMACS is written by:
>> Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
>> Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
>> Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
>> Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
>> Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
>> Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
>> Peter Tieleman Christian Wennberg Maarten Wolf
>> and the project leaders:
>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>
>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>> Copyright (c) 2001-2014, The GROMACS development team at
>> Uppsala University, Stockholm University and
>> the Royal Institute of Technology, Sweden.
>> check out http://www.gromacs.org for more information.
>>
>> GROMACS is free software; you can redistribute it and/or modify it
>> under the terms of the GNU Lesser General Public License
>> as published by the Free Software Foundation; either version 2.1
>> of the License, or (at your option) any later version.
>>
>> GROMACS: gmx mdrun, VERSION 5.0
>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>> Library dir: /opt/SW/gromacs-5.0/share/top
>> Command line:
>> gmx_mpi mdrun -deffnm prod_20ns
>>
>> Gromacs version: VERSION 5.0
>> Precision: single
>> Memory model: 64 bit
>> MPI library: MPI
>> OpenMP support: enabled
>> GPU support: enabled
>> invsqrt routine: gmx_software_invsqrt(x)
>> SIMD instructions: AVX_256
>> FFT library: fftw-3.3.3-sse2
>> RDTSCP usage: enabled
>> C++11 compilation: disabled
>> TNG support: enabled
>> Tracing support: disabled
>> Built on: Thu Jul 31 18:30:37 CEST 2014
>> Built by: root at localhost.localdomain [CMAKE]
>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>> Build CPU vendor: GenuineIntel
>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> Build CPU family: 6 Model: 62 Stepping: 4
>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
>> sse4.1 sse4.2 ssse3 tdt x2apic
>> C compiler: /usr/bin/cc GNU 4.4.7
>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>> Boost version: 1.55.0 (internal)
>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
>> CUDA compiler flags: -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>> CUDA driver: 6.50
>> CUDA runtime: 6.0
>>
>>
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
>> molecular simulation
>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>> -------- -------- --- Thank You --- -------- --------
>>
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
>> Berendsen
>> GROMACS: Fast, Flexible and Free
>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>> -------- -------- --- Thank You --- -------- --------
>>
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> E. Lindahl and B. Hess and D. van der Spoel
>> GROMACS 3.0: A package for molecular simulation and trajectory analysis
>> J. Mol. Mod. 7 (2001) pp. 306-317
>> -------- -------- --- Thank You --- -------- --------
>>
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>> GROMACS: A message-passing parallel molecular dynamics implementation
>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>> -------- -------- --- Thank You --- -------- --------
>>
>>
>> For optimal performance with a GPU nstlist (now 10) should be larger.
>> The optimum depends on your CPU and GPU resources.
>> You might want to try several nstlist values.
>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
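>>
>> (For a quick test, nstlist can also be overridden at run time without
>> regenerating the .tpr, e.g. with a hypothetical value:
>>    gmx_mpi mdrun -deffnm prod_20ns -nstlist 20
>> )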
>>
>> Input Parameters:
>> integrator = md
>> tinit = 0
>> dt = 0.002
>> nsteps = 10000000
>> init-step = 0
>> simulation-part = 1
>> comm-mode = Linear
>> nstcomm = 1
>> bd-fric = 0
>> ld-seed = 1993
>> emtol = 10
>> emstep = 0.01
>> niter = 20
>> fcstep = 0
>> nstcgsteep = 1000
>> nbfgscorr = 10
>> rtpi = 0.05
>> nstxout = 2500
>> nstvout = 2500
>> nstfout = 0
>> nstlog = 2500
>> nstcalcenergy = 1
>> nstenergy = 2500
>> nstxout-compressed = 500
>> compressed-x-precision = 1000
>> cutoff-scheme = Verlet
>> nstlist = 40
>> ns-type = Grid
>> pbc = xyz
>> periodic-molecules = FALSE
>> verlet-buffer-tolerance = 0.005
>> rlist = 1.285
>> rlistlong = 1.285
>> nstcalclr = 10
>> coulombtype = PME
>> coulomb-modifier = Potential-shift
>> rcoulomb-switch = 0
>> rcoulomb = 1.2
>> epsilon-r = 1
>> epsilon-rf = 1
>> vdw-type = Cut-off
>> vdw-modifier = Potential-shift
>> rvdw-switch = 0
>> rvdw = 1.2
>> DispCorr = No
>> table-extension = 1
>> fourierspacing = 0.135
>> fourier-nx = 128
>> fourier-ny = 128
>> fourier-nz = 128
>> pme-order = 4
>> ewald-rtol = 1e-05
>> ewald-rtol-lj = 0.001
>> lj-pme-comb-rule = Geometric
>> ewald-geometry = 0
>> epsilon-surface = 0
>> implicit-solvent = No
>> gb-algorithm = Still
>> nstgbradii = 1
>> rgbradii = 2
>> gb-epsilon-solvent = 80
>> gb-saltconc = 0
>> gb-obc-alpha = 1
>> gb-obc-beta = 0.8
>> gb-obc-gamma = 4.85
>> gb-dielectric-offset = 0.009
>> sa-algorithm = Ace-approximation
>> sa-surface-tension = 2.092
>> tcoupl = V-rescale
>> nsttcouple = 10
>> nh-chain-length = 0
>> print-nose-hoover-chain-variables = FALSE
>> pcoupl = No
>> pcoupltype = Semiisotropic
>> nstpcouple = -1
>> tau-p = 0.5
>> compressibility (3x3):
>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> ref-p (3x3):
>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> refcoord-scaling = No
>> posres-com (3):
>> posres-com[0]= 0.00000e+00
>> posres-com[1]= 0.00000e+00
>> posres-com[2]= 0.00000e+00
>> posres-comB (3):
>> posres-comB[0]= 0.00000e+00
>> posres-comB[1]= 0.00000e+00
>> posres-comB[2]= 0.00000e+00
>> QMMM = FALSE
>> QMconstraints = 0
>> QMMMscheme = 0
>> MMChargeScaleFactor = 1
>> qm-opts:
>> ngQM = 0
>> constraint-algorithm = Lincs
>> continuation = FALSE
>> Shake-SOR = FALSE
>> shake-tol = 0.0001
>> lincs-order = 4
>> lincs-iter = 1
>> lincs-warnangle = 30
>> nwall = 0
>> wall-type = 9-3
>> wall-r-linpot = -1
>> wall-atomtype[0] = -1
>> wall-atomtype[1] = -1
>> wall-density[0] = 0
>> wall-density[1] = 0
>> wall-ewald-zfac = 3
>> pull = no
>> rotation = FALSE
>> interactiveMD = FALSE
>> disre = No
>> disre-weighting = Conservative
>> disre-mixed = FALSE
>> dr-fc = 1000
>> dr-tau = 0
>> nstdisreout = 100
>> orire-fc = 0
>> orire-tau = 0
>> nstorireout = 100
>> free-energy = no
>> cos-acceleration = 0
>> deform (3x3):
>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>> simulated-tempering = FALSE
>> E-x:
>> n = 0
>> E-xt:
>> n = 0
>> E-y:
>> n = 0
>> E-yt:
>> n = 0
>> E-z:
>> n = 0
>> E-zt:
>> n = 0
>> swapcoords = no
>> adress = FALSE
>> userint1 = 0
>> userint2 = 0
>> userint3 = 0
>> userint4 = 0
>> userreal1 = 0
>> userreal2 = 0
>> userreal3 = 0
>> userreal4 = 0
>> grpopts:
>> nrdf: 869226
>> ref-t: 300
>> tau-t: 0.1
>> annealing: No
>> annealing-npoints: 0
>> acc: 0 0 0
>> nfreeze: N N N
>> energygrp-flags[ 0]: 0
>> Using 1 MPI process
>> Using 32 OpenMP threads
>>
>> Detecting CPU SIMD instructions.
>> Present hardware specification:
>> Vendor: GenuineIntel
>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>> Family: 6 Model: 62 Stepping: 4
>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
>> sse4.1 sse4.2 ssse3 tdt x2apic
>> SIMD instructions most likely to fit this hardware: AVX_256
>> SIMD instructions selected at GROMACS compile time: AVX_256
>>
>>
>> 2 GPUs detected on host localhost.localdomain:
>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat: compatible
>>
>> 1 GPU auto-selected for this run.
>> Mapping of GPU to the 1 PP rank in this node: #0
>>
>>
>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started with
>>       fewer PP MPI processes per node than GPUs available.
>>       Each PP MPI process can use only one GPU; 1 GPU per node will be used.
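>>
>> (Each GPU needs its own PP rank, so using both detected GPUs would mean
>> starting two ranks with an explicit mapping, e.g., as a sketch with the
>> same file names:
>>    mpirun -np 2 gmx_mpi mdrun -deffnm prod_20ns -gpu_id 01
>> though pairing the K20c with the much slower GTX 650 may hurt rather
>> than help.)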
>>
>> Will do PME sum in reciprocal space for electrostatic interactions.
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
>> A smooth particle mesh Ewald method
>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>> -------- -------- --- Thank You --- -------- --------
>>
>> Will do ordinary reciprocal space Ewald sum.
>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>> System total charge: -0.012
>> Generated table with 1142 data points for Ewald.
>> Tabscale = 500 points/nm
>> Generated table with 1142 data points for LJ6.
>> Tabscale = 500 points/nm
>> Generated table with 1142 data points for LJ12.
>> Tabscale = 500 points/nm
>> Generated table with 1142 data points for 1-4 COUL.
>> Tabscale = 500 points/nm
>> Generated table with 1142 data points for 1-4 LJ6.
>> Tabscale = 500 points/nm
>> Generated table with 1142 data points for 1-4 LJ12.
>> Tabscale = 500 points/nm
>>
>> Using CUDA 8x8 non-bonded kernels
>>
>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
>>
>> Removing pbc first time
>> Pinning threads with an auto-selected logical core stride of 1
>>
>> Initializing LINear Constraint Solver
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>> LINCS: A Linear Constraint Solver for molecular simulations
>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>> -------- -------- --- Thank You --- -------- --------
>>
>> The number of constraints is 5913
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> S. Miyamoto and P. A. Kollman
>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
>> Water Models
>> J. Comp. Chem. 13 (1992) pp. 952-962
>> -------- -------- --- Thank You --- -------- --------
>>
>> Center of mass motion removal mode is Linear
>> We have the following groups for center of mass motion removal:
>> 0: rest
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> G. Bussi, D. Donadio and M. Parrinello
>> Canonical sampling through velocity rescaling
>> J. Chem. Phys. 126 (2007) pp. 014101
>> -------- -------- --- Thank You --- -------- --------
>>
>> There are: 434658 Atoms
>>
>> Constraining the starting coordinates (step 0)
>>
>> Constraining the coordinates at t0-dt (step 0)
>> RMS relative constraint deviation after constraining: 3.67e-05
>> Initial temperature: 300.5 K
>>
>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>> Step Time Lambda
>> 0 0.00000 0.00000
>>
>> Energies (kJ/mol)
>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>     9.74139e+03    4.34956e+03    2.97359e+03   -1.93107e+02    8.05534e+04
>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>     1.01340e+06   -7.13271e+06    2.01361e+04   -6.00175e+06    1.09887e+06
>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>    -4.90288e+06   -4.90288e+06    3.04092e+02    1.70897e+02    2.16683e-05
>>
>> step  80: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6279.0 M-cycles
>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306: 6962.2 M-cycles
>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463: 8406.5 M-cycles
>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6424.0 M-cycles
>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219: 6369.1 M-cycles
>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306: 7309.0 M-cycles
>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355: 7521.2 M-cycles
>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407: 8369.8 M-cycles
>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>> Step Time Lambda
>> 2500 5.00000 0.00000
>>
>> Energies (kJ/mol)
>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>     9.72545e+03    4.33046e+03    2.98087e+03   -1.95794e+02    8.05967e+04
>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>     1.01293e+06   -7.13110e+06    2.01689e+04   -6.00057e+06    1.08489e+06
>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>    -4.91567e+06   -4.90300e+06    3.00225e+02    1.36173e+02    2.25998e-05
>>
>> Step Time Lambda
>> 5000 10.00000 0.00000
>>
>> ............
>>
>> -------------------------------------------------------------------------------
>>
>>
>> Thank you in advance
>>
>> --
>> Carmen Di Giovanni, PhD
>> Dept. of Pharmaceutical and Toxicological Chemistry
>> "Drug Discovery Lab"
>> University of Naples "Federico II"
>> Via D. Montesano, 49
>> 80131 Naples
>> Tel.: ++39 081 678623
>> Fax: ++39 081 678100
>> Email: cdigiova at unina.it
>>
>>
>>
>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>
>>>
>>>
>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>
>>>> What's your exact command?
>>>>
>>>
>>> A full .log file would be even better; it would tell us everything we need
>>> to know :)
>>>
>>> -Justin
>>>
>>>> Have you reviewed this page:
>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>
>>>> James "Wes" Barnett
>>>> Ph.D. Candidate
>>>> Chemical and Biomolecular Engineering
>>>>
>>>> Tulane University
>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>
>>>> ________________________________________
>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Carmen Di
>>>> Giovanni <cdigiova at unina.it>
>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>> Subject: Re: [gmx-users] GPU low performance
>>>>
>>>> I post the message of a md run :
>>>>
>>>>
>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>> For optimal performance this ratio should be close to 1!
>>>>
>>>>
>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>>>> performance loss, consider using a shorter cut-off and a finer PME
>>>> grid.
>>>>
>>>> How can I solve this problem?
>>>> Thank you in advance
>>>>
>>>>
>>>> --
>>>> Carmen Di Giovanni, PhD
>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>> "Drug Discovery Lab"
>>>> University of Naples "Federico II"
>>>> Via D. Montesano, 49
>>>> 80131 Naples
>>>> Tel.: ++39 081 678623
>>>> Fax: ++39 081 678100
>>>> Email: cdigiova at unina.it
>>>>
>>>>
>>>>
>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>
>>>>>
>>>>>
>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>
>>>>>> Dear all,
>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>> After a minimization on a protein of 1925 atoms this is the message:
>>>>>>
>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>
>>>>>
>>>>> Minimization is a poor indicator of performance. Do a real MD run.
>>>>>
>>>>>>
>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance causes
>>>>>> performance loss.
>>>>>>
>>>>>> Core t (s) Wall t (s) (%)
>>>>>> Time: 3289.010 205.891 1597.4
>>>>>> (steps/hour)
>>>>>> Performance: 8480.2
>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>
>>>>>>
>>>>>> Can I improve the performance?
>>>>>> So far I haven't found enough information in the forum to solve this
>>>>>> problem.
>>>>>> The .log file is attached.
>>>>>>
>>>>>
>>>>> The list does not accept attachments. If you wish to share a file,
>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>> .log is quite important for understanding your hardware,
>>>>> optimizations, and seeing full details of the performance breakdown.
>>>>> But again, base your assessment on MD, not EM.
>>>>>
>>>>> -Justin
>>>>>
>>>>> --
>>>>> ==================================================
>>>>>
>>>>> Justin A. Lemkul, Ph.D.
>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>
>>>>> Department of Pharmaceutical Sciences
>>>>> School of Pharmacy
>>>>> Health Sciences Facility II, Room 629
>>>>> University of Maryland, Baltimore
>>>>> 20 Penn St.
>>>>> Baltimore, MD 21201
>>>>>
>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>
>>>>> ==================================================
>>>
>>> --
>>> ==================================================
>>>
>>> Justin A. Lemkul, Ph.D.
>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>
>>> Department of Pharmaceutical Sciences
>>> School of Pharmacy
>>> Health Sciences Facility II, Room 629
>>> University of Maryland, Baltimore
>>> 20 Penn St.
>>> Baltimore, MD 21201
>>>
>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>> http://mackerell.umaryland.edu/~jalemkul
>>>
>>> ==================================================