[gmx-users] GPU low performance
Szilárd Páll
pall.szilard at gmail.com
Wed Feb 18 18:14:29 CET 2015
On Wed, Feb 18, 2015 at 5:57 PM, Carmen Di Giovanni <cdigiova at unina.it> wrote:
> Dear all, the full log file is too big.
Use pastebin or similar services.
> However, in the middle part of it there is only information about the
> energies at each step. The first part is already posted.
OK, so first of all, this looks nothing like the alarmingly low
CPU-GPU overlap you posted about initially. Here, the GPU you are
using simply can't keep up with the 2x8 Xeon E5-2650 v2 cores. You can
see this from the fraction of runtime the CPU spends waiting for the
GPU, shown in the performance table's "Wait GPU local" row: 28.7% of
the run is spent idling.
At the moment the non-bonded computation, which is done entirely on
the GPU, can't be split between CPU and GPU, so your options are
limited and most of them will have only a minor effect:
i) indirectly shift work back to the CPU and/or improve the overlap efficiency:
a) try decreasing nstlist to 10, 20, or 25;
b) run on fewer threads (as suggested before), which will likely
improve performance in some non-overlapping code parts;
c) run with DD, e.g. -ntmpi 4 -ntomp 4/8 -gpu_id 0011 or -ntmpi 8
-gpu_id 00001111 (see the command sketch after this list);
ii) reduce the "Rest" time. Not sure what's causing it, but your
simulation spends a substantial amount (15.6%) of the runtime in
unaccounted-for, likely serial, computation; i-b and i-c will likely
reduce this somewhat too;
iii) get more and/or faster GPUs.
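
A minimal sketch of what i-a, i-b, and i-c could look like on the
command line (illustrative only: keep your own -deffnm and input,
match the rank/thread counts to your hardware, and note that -ntmpi
only exists in the thread-MPI build of mdrun; with your MPI build you
would instead launch i-c as "mpirun -np 4 gmx_mpi mdrun -ntomp 8 ..."):

# i-a: smaller pair-list update interval
gmx mdrun -deffnm prod_20ns -nstlist 20

# i-b: one rank with fewer, pinned threads (no HyperThreading)
gmx mdrun -deffnm prod_20ns -ntomp 16 -pin on

# i-c: domain decomposition with 4 thread-MPI ranks; -gpu_id 0011
#      maps two ranks onto each of the two detected GPUs
gmx mdrun -deffnm prod_20ns -ntmpi 4 -ntomp 8 -gpu_id 0011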
> So I post the final part of it:
> -------------------------------------------------------------
> Step Time Lambda
> 10000000 20000.00000 0.00000
>
> Writing checkpoint, step 10000000 at Mon Dec 29 13:16:22 2014
>
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 9.34206e+03 4.14342e+03 2.79172e+03 -1.75465e+02 7.99811e+04
> LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
> 1.01135e+06 -7.13064e+06 2.01349e+04 -6.00306e+06 1.08201e+06
> Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd
> -4.92106e+06 -5.86747e+06 2.99426e+02 1.29480e+02 2.16280e-05
>
> <====== ############### ==>
> <==== A V E R A G E S ====>
> <== ############### ======>
>
> Statistics over 10000001 steps using 10000001 frames
>
> Energies (kJ/mol)
> G96Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 9.45818e+03 4.30665e+03 2.92407e+03 -1.75556e+02 8.02473e+04
> LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic En.
> 1.01284e+06 -7.13138e+06 2.01510e+04 -6.00163e+06 1.08407e+06
> Total Energy Conserved En. Temperature Pressure (bar) Constr. rmsd
> -4.91756e+06 -5.38519e+06 2.99998e+02 1.37549e+02 0.00000e+00
>
> Total Virial (kJ/mol)
> 3.42887e+05 1.63625e+01 1.23658e+02
> 1.67406e+01 3.42916e+05 -4.27834e+01
> 1.23997e+02 -4.29636e+01 3.42881e+05
>
> Pressure (bar)
> 1.37573e+02 7.50214e-02 -1.03916e-01
> 7.22048e-02 1.37623e+02 -1.66417e-02
> -1.06444e-01 -1.52990e-02 1.37453e+02
>
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
> -----------------------------------------------------------------------------
> Pair Search distance check 16343508.605344 147091577.448 0.0
> NxN Ewald Elec. + LJ [V&F] 5072118956.506304 542716728346.174 98.1
> 1,4 nonbonded interactions 95860.009586 8627400.863 0.0
> Calc Weights 13039741.303974 469430686.943 0.1
> Spread Q Bspline 278181147.818112 556362295.636 0.1
> Gather F Bspline 278181147.818112 1669086886.909 0.3
> 3D-FFT 880787450.909824 7046299607.279 1.3
> Solve PME 163837.909504 10485626.208 0.0
> Shift-X 108664.934658 651989.608 0.0
> Angles 86090.008609 14463121.446 0.0
> Propers 31380.003138 7186020.719 0.0
> Impropers 28790.002879 5988320.599 0.0
> Virial 4347030.434703 78246547.825 0.0
> Stop-CM 4346580.869316 43465808.693 0.0
> Calc-Ekin 4346580.869316 117357683.472 0.0
> Lincs 59130.017739 3547801.064 0.0
> Lincs-Mat 1033080.309924 4132321.240 0.0
> Constraint-V 4406580.881316 35252647.051 0.0
> Constraint-Vir 4347450.434745 104338810.434 0.0
> Settle 1429440.428832 461709258.513 0.1
> -----------------------------------------------------------------------------
> Total 553500452758.122 100.0
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 32 OpenMP threads
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
> -----------------------------------------------------------------------------
> Neighbor search 1 32 250001 6231.657 518475.694 1.1
> Launch GPU ops. 1 32 10000001 1825.689 151897.833 0.3
> Force 1 32 10000001 49568.959 4124152.027 8.4
> PME mesh 1 32 10000001 194798.850 16207321.863 32.8
> Wait GPU local 1 32 10000001 170272.438 14166717.115 28.7
> NB X/F buffer ops. 1 32 19750001 29175.632 2427421.177 4.9
> Write traj. 1 32 20635 1567.928 130452.056 0.3
> Update 1 32 10000001 13312.819 1107630.452 2.2
> Constraints 1 32 10000001 34210.142 2846293.908 5.8
> Rest 92338.781 7682613.897 15.6
> -----------------------------------------------------------------------------
> Total 593302.894 49362976.023 100.0
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
> PME spread/gather 1 32 20000002 144767.207 12044674.424 24.4
> PME 3D-FFT 1 32 20000002 39499.157 3286341.501 6.7
> PME solve Elec 1 32 10000001 9947.340 827621.589 1.7
> -----------------------------------------------------------------------------
>
> GPU timings
> -----------------------------------------------------------------------------
> Computing: Count Wall t (s) ms/step %
> -----------------------------------------------------------------------------
> Pair list H2D 250001 935.751 3.743 0.2
> X / q H2D 10000001 11509.209 1.151 2.8
> Nonbonded F+ene k. 9750000 377111.949 38.678 92.0
> Nonbonded F+ene+prune k. 250001 12049.010 48.196 2.9
> F D2H 10000001 8129.292 0.813 2.0
> -----------------------------------------------------------------------------
> Total 409735.211 40.974 100.0
> -----------------------------------------------------------------------------
>
> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
> For optimal performance this ratio should be close to 1!
>
>
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
> performance loss, consider using a shorter cut-off and a finer PME
> grid.
>
> Core t (s) Wall t (s) (%)
> Time: 18713831.228 593302.894 3154.2
> 6d20h48:22
> (ns/day) (hour/ns)
> Performance: 2.913 8.240
> Finished mdrun on rank 0 Mon Dec 29 13:16:24 2014
>
>
> -------------------------------------------------------
> thank you in advance
> Carmen
>
>
>
> --
> Carmen Di Giovanni, PhD
> Dept. of Pharmaceutical and Toxicological Chemistry
> "Drug Discovery Lab"
> University of Naples "Federico II"
> Via D. Montesano, 49
> 80131 Naples
> Tel.: ++39 081 678623
> Fax: ++39 081 678100
> Email: cdigiova at unina.it
>
>
>
> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>
>> We need a *full* log file, not parts of it!
>>
>> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
>> to not use HyperThreading.
>> --
>> Szilárd
>>
>>
>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni <cdigiova at unina.it>
>> wrote:
>>>
>>> Justin,
>>> the problem is evident for all calculations.
>>> This is the log file of a recent run:
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> Log file opened on Mon Dec 22 16:28:00 2014
>>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks: 1
>>> GROMACS: gmx mdrun, VERSION 5.0
>>>
>>> GROMACS is written by:
>>> Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
>>> Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian
>>> Fritsch
>>> Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
>>> Per Larsson Justin A. Lemkul Magnus Lundborg Pieter
>>> Meulenhoff
>>> Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
>>> Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
>>> Peter Tieleman Christian Wennberg Maarten Wolf
>>> and the project leaders:
>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>
>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>> Copyright (c) 2001-2014, The GROMACS development team at
>>> Uppsala University, Stockholm University and
>>> the Royal Institute of Technology, Sweden.
>>> check out http://www.gromacs.org for more information.
>>>
>>> GROMACS is free software; you can redistribute it and/or modify it
>>> under the terms of the GNU Lesser General Public License
>>> as published by the Free Software Foundation; either version 2.1
>>> of the License, or (at your option) any later version.
>>>
>>> GROMACS: gmx mdrun, VERSION 5.0
>>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>> Library dir: /opt/SW/gromacs-5.0/share/top
>>> Command line:
>>> gmx_mpi mdrun -deffnm prod_20ns
>>>
>>> Gromacs version: VERSION 5.0
>>> Precision: single
>>> Memory model: 64 bit
>>> MPI library: MPI
>>> OpenMP support: enabled
>>> GPU support: enabled
>>> invsqrt routine: gmx_software_invsqrt(x)
>>> SIMD instructions: AVX_256
>>> FFT library: fftw-3.3.3-sse2
>>> RDTSCP usage: enabled
>>> C++11 compilation: disabled
>>> TNG support: enabled
>>> Tracing support: disabled
>>> Built on: Thu Jul 31 18:30:37 CEST 2014
>>> Built by: root at localhost.localdomain [CMAKE]
>>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>>> Build CPU vendor: GenuineIntel
>>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>> Build CPU family: 6 Model: 62 Stepping: 4
>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
>>> sse3
>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>> C compiler: /usr/bin/cc GNU 4.4.7
>>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
>>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>> Boost version: 1.55.0 (internal)
>>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
>>> compiler
>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
>>> CUDA compiler
>>>
>>> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
>>> ;
>>>
>>> ;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>> CUDA driver: 6.50
>>> CUDA runtime: 6.0
>>>
>>>
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
>>> molecular simulation
>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J.
>>> C.
>>> Berendsen
>>> GROMACS: Fast, Flexible and Free
>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> E. Lindahl and B. Hess and D. van der Spoel
>>> GROMACS 3.0: A package for molecular simulation and trajectory analysis
>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>> GROMACS: A message-passing parallel molecular dynamics implementation
>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>>
>>> For optimal performance with a GPU nstlist (now 10) should be larger.
>>> The optimum depends on your CPU and GPU resources.
>>> You might want to try several nstlist values.
>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>
>>> Input Parameters:
>>> integrator = md
>>> tinit = 0
>>> dt = 0.002
>>> nsteps = 10000000
>>> init-step = 0
>>> simulation-part = 1
>>> comm-mode = Linear
>>> nstcomm = 1
>>> bd-fric = 0
>>> ld-seed = 1993
>>> emtol = 10
>>> emstep = 0.01
>>> niter = 20
>>> fcstep = 0
>>> nstcgsteep = 1000
>>> nbfgscorr = 10
>>> rtpi = 0.05
>>> nstxout = 2500
>>> nstvout = 2500
>>> nstfout = 0
>>> nstlog = 2500
>>> nstcalcenergy = 1
>>> nstenergy = 2500
>>> nstxout-compressed = 500
>>> compressed-x-precision = 1000
>>> cutoff-scheme = Verlet
>>> nstlist = 40
>>> ns-type = Grid
>>> pbc = xyz
>>> periodic-molecules = FALSE
>>> verlet-buffer-tolerance = 0.005
>>> rlist = 1.285
>>> rlistlong = 1.285
>>> nstcalclr = 10
>>> coulombtype = PME
>>> coulomb-modifier = Potential-shift
>>> rcoulomb-switch = 0
>>> rcoulomb = 1.2
>>> epsilon-r = 1
>>> epsilon-rf = 1
>>> vdw-type = Cut-off
>>> vdw-modifier = Potential-shift
>>> rvdw-switch = 0
>>> rvdw = 1.2
>>> DispCorr = No
>>> table-extension = 1
>>> fourierspacing = 0.135
>>> fourier-nx = 128
>>> fourier-ny = 128
>>> fourier-nz = 128
>>> pme-order = 4
>>> ewald-rtol = 1e-05
>>> ewald-rtol-lj = 0.001
>>> lj-pme-comb-rule = Geometric
>>> ewald-geometry = 0
>>> epsilon-surface = 0
>>> implicit-solvent = No
>>> gb-algorithm = Still
>>> nstgbradii = 1
>>> rgbradii = 2
>>> gb-epsilon-solvent = 80
>>> gb-saltconc = 0
>>> gb-obc-alpha = 1
>>> gb-obc-beta = 0.8
>>> gb-obc-gamma = 4.85
>>> gb-dielectric-offset = 0.009
>>> sa-algorithm = Ace-approximation
>>> sa-surface-tension = 2.092
>>> tcoupl = V-rescale
>>> nsttcouple = 10
>>> nh-chain-length = 0
>>> print-nose-hoover-chain-variables = FALSE
>>> pcoupl = No
>>> pcoupltype = Semiisotropic
>>> nstpcouple = -1
>>> tau-p = 0.5
>>> compressibility (3x3):
>>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> ref-p (3x3):
>>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> refcoord-scaling = No
>>> posres-com (3):
>>> posres-com[0]= 0.00000e+00
>>> posres-com[1]= 0.00000e+00
>>> posres-com[2]= 0.00000e+00
>>> posres-comB (3):
>>> posres-comB[0]= 0.00000e+00
>>> posres-comB[1]= 0.00000e+00
>>> posres-comB[2]= 0.00000e+00
>>> QMMM = FALSE
>>> QMconstraints = 0
>>> QMMMscheme = 0
>>> MMChargeScaleFactor = 1
>>> qm-opts:
>>> ngQM = 0
>>> constraint-algorithm = Lincs
>>> continuation = FALSE
>>> Shake-SOR = FALSE
>>> shake-tol = 0.0001
>>> lincs-order = 4
>>> lincs-iter = 1
>>> lincs-warnangle = 30
>>> nwall = 0
>>> wall-type = 9-3
>>> wall-r-linpot = -1
>>> wall-atomtype[0] = -1
>>> wall-atomtype[1] = -1
>>> wall-density[0] = 0
>>> wall-density[1] = 0
>>> wall-ewald-zfac = 3
>>> pull = no
>>> rotation = FALSE
>>> interactiveMD = FALSE
>>> disre = No
>>> disre-weighting = Conservative
>>> disre-mixed = FALSE
>>> dr-fc = 1000
>>> dr-tau = 0
>>> nstdisreout = 100
>>> orire-fc = 0
>>> orire-tau = 0
>>> nstorireout = 100
>>> free-energy = no
>>> cos-acceleration = 0
>>> deform (3x3):
>>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>> simulated-tempering = FALSE
>>> E-x:
>>> n = 0
>>> E-xt:
>>> n = 0
>>> E-y:
>>> n = 0
>>> E-yt:
>>> n = 0
>>> E-z:
>>> n = 0
>>> E-zt:
>>> n = 0
>>> swapcoords = no
>>> adress = FALSE
>>> userint1 = 0
>>> userint2 = 0
>>> userint3 = 0
>>> userint4 = 0
>>> userreal1 = 0
>>> userreal2 = 0
>>> userreal3 = 0
>>> userreal4 = 0
>>> grpopts:
>>> nrdf: 869226
>>> ref-t: 300
>>> tau-t: 0.1
>>> annealing: No
>>> annealing-npoints: 0
>>> acc: 0 0 0
>>> nfreeze: N N N
>>> energygrp-flags[ 0]: 0
>>> Using 1 MPI process
>>> Using 32 OpenMP threads
>>>
>>> Detecting CPU SIMD instructions.
>>> Present hardware specification:
>>> Vendor: GenuineIntel
>>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>> Family: 6 Model: 62 Stepping: 4
>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>> SIMD instructions most likely to fit this hardware: AVX_256
>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>
>>>
>>> 2 GPUs detected on host localhost.localdomain:
>>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
>>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat:
>>> compatible
>>>
>>> 1 GPU auto-selected for this run.
>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>
>>>
>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started with
>>> less
>>> PP MPI process per node than GPUs available.
>>> Each PP MPI process can use only one GPU, 1 GPU per node will be
>>> used.
>>>
>>> Will do PME sum in reciprocal space for electrostatic interactions.
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
>>> Pedersen
>>> A smooth particle mesh Ewald method
>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>> Will do ordinary reciprocal space Ewald sum.
>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>>> System total charge: -0.012
>>> Generated table with 1142 data points for Ewald.
>>> Tabscale = 500 points/nm
>>> Generated table with 1142 data points for LJ6.
>>> Tabscale = 500 points/nm
>>> Generated table with 1142 data points for LJ12.
>>> Tabscale = 500 points/nm
>>> Generated table with 1142 data points for 1-4 COUL.
>>> Tabscale = 500 points/nm
>>> Generated table with 1142 data points for 1-4 LJ6.
>>> Tabscale = 500 points/nm
>>> Generated table with 1142 data points for 1-4 LJ12.
>>> Tabscale = 500 points/nm
>>>
>>> Using CUDA 8x8 non-bonded kernels
>>>
>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size:
>>> 1536
>>>
>>> Removing pbc first time
>>> Pinning threads with an auto-selected logical core stride of 1
>>>
>>> Initializing LINear Constraint Solver
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>> LINCS: A Linear Constraint Solver for molecular simulations
>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>> The number of constraints is 5913
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> S. Miyamoto and P. A. Kollman
>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
>>> Rigid
>>> Water Models
>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>> Center of mass motion removal mode is Linear
>>> We have the following groups for center of mass motion removal:
>>> 0: rest
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> G. Bussi, D. Donadio and M. Parrinello
>>> Canonical sampling through velocity rescaling
>>> J. Chem. Phys. 126 (2007) pp. 014101
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>> There are: 434658 Atoms
>>>
>>> Constraining the starting coordinates (step 0)
>>>
>>> Constraining the coordinates at t0-dt (step 0)
>>> RMS relative constraint deviation after constraining: 3.67e-05
>>> Initial temperature: 300.5 K
>>>
>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>> Step Time Lambda
>>> 0 0.00000 0.00000
>>>
>>> Energies (kJ/mol)
>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>> Coulomb-14
>>> 9.74139e+03 4.34956e+03 2.97359e+03 -1.93107e+02
>>> 8.05534e+04
>>> LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic
>>> En.
>>> 1.01340e+06 -7.13271e+06 2.01361e+04 -6.00175e+06
>>> 1.09887e+06
>>> Total Energy Conserved En. Temperature Pressure (bar) Constr.
>>> rmsd
>>> -4.90288e+06 -4.90288e+06 3.04092e+02 1.70897e+02
>>> 2.16683e-05
>>>
>>> step 80: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6279.0
>>> M-cycles
>>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306: 6962.2
>>> M-cycles
>>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463: 8406.5
>>> M-cycles
>>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6424.0
>>> M-cycles
>>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219: 6369.1
>>> M-cycles
>>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306: 7309.0
>>> M-cycles
>>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355: 7521.2
>>> M-cycles
>>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407: 8369.8
>>> M-cycles
>>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>>> Step Time Lambda
>>> 2500 5.00000 0.00000
>>>
>>> Energies (kJ/mol)
>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>> Coulomb-14
>>> 9.72545e+03 4.33046e+03 2.98087e+03 -1.95794e+02
>>> 8.05967e+04
>>> LJ (SR) Coulomb (SR) Coul. recip. Potential Kinetic
>>> En.
>>> 1.01293e+06 -7.13110e+06 2.01689e+04 -6.00057e+06
>>> 1.08489e+06
>>> Total Energy Conserved En. Temperature Pressure (bar) Constr.
>>> rmsd
>>> -4.91567e+06 -4.90300e+06 3.00225e+02 1.36173e+02
>>> 2.25998e-05
>>>
>>> Step Time Lambda
>>> 5000 10.00000 0.00000
>>>
>>> ............
>>>
>>>
>>> -------------------------------------------------------------------------------
>>>
>>>
>>> Thank you in advance
>>>
>>> --
>>> Carmen Di Giovanni, PhD
>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>> "Drug Discovery Lab"
>>> University of Naples "Federico II"
>>> Via D. Montesano, 49
>>> 80131 Naples
>>> Tel.: ++39 081 678623
>>> Fax: ++39 081 678100
>>> Email: cdigiova at unina.it
>>>
>>>
>>>
>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>
>>>>
>>>>
>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>
>>>>>
>>>>> What's your exact command?
>>>>>
>>>>
>>>> A full .log file would be even better; it would tell us everything we
>>>> need
>>>> to know :)
>>>>
>>>> -Justin
>>>>
>>>>> Have you reviewed this page:
>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>
>>>>> James "Wes" Barnett
>>>>> Ph.D. Candidate
>>>>> Chemical and Biomolecular Engineering
>>>>>
>>>>> Tulane University
>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>
>>>>> ________________________________________
>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Carmen
>>>>> Di
>>>>> Giovanni <cdigiova at unina.it>
>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>
>>>>> I post the message of a md run :
>>>>>
>>>>>
>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>> For optimal performance this ratio should be close to 1!
>>>>>
>>>>>
>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>>>>> performance loss, consider using a shorter cut-off and a finer
>>>>> PME
>>>>> grid.
>>>>>
>>>>> How can I solve this problem?
>>>>> Thank you in advance
>>>>>
>>>>>
>>>>> --
>>>>> Carmen Di Giovanni, PhD
>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>> "Drug Discovery Lab"
>>>>> University of Naples "Federico II"
>>>>> Via D. Montesano, 49
>>>>> 80131 Naples
>>>>> Tel.: ++39 081 678623
>>>>> Fax: ++39 081 678100
>>>>> Email: cdigiova at unina.it
>>>>>
>>>>>
>>>>>
>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear all,
>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>> After a minimization on a protein of 1925 atoms this is the message:
>>>>>>>
>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>
>>>>>>
>>>>>> Minimization is a poor indicator of performance. Do a real MD run.
>>>>>>
>>>>>>>
>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance causes
>>>>>>> performance loss.
>>>>>>>
>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>> (steps/hour)
>>>>>>> Performance: 8480.2
>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>
>>>>>>>
>>>>>>> Can I improve the performance?
>>>>>>> At the moment I haven't found enough information in the forum to solve
>>>>>>> this problem.
>>>>>>> The .log file is attached.
>>>>>>>
>>>>>>
>>>>>> The list does not accept attachments. If you wish to share a file,
>>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>>> .log is quite important for understanding your hardware,
>>>>>> optimizations, and seeing full details of the performance breakdown.
>>>>>> But again, base your assessment on MD, not EM.
>>>>>>
>>>>>> -Justin
>>>>>>
>>>>>> --
>>>>>> ==================================================
>>>>>>
>>>>>> Justin A. Lemkul, Ph.D.
>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>
>>>>>> Department of Pharmaceutical Sciences
>>>>>> School of Pharmacy
>>>>>> Health Sciences Facility II, Room 629
>>>>>> University of Maryland, Baltimore
>>>>>> 20 Penn St.
>>>>>> Baltimore, MD 21201
>>>>>>
>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>
>>>>>> ==================================================
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ==================================================
>>>>
>>>> Justin A. Lemkul, Ph.D.
>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>
>>>> Department of Pharmaceutical Sciences
>>>> School of Pharmacy
>>>> Health Sciences Facility II, Room 629
>>>> University of Maryland, Baltimore
>>>> 20 Penn St.
>>>> Baltimore, MD 21201
>>>>
>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>
>>>> ==================================================
>>>
>>>
>>>
>>