[gmx-users] GPU low performance
Szilárd Páll
pall.szilard at gmail.com
Thu Feb 19 15:49:01 CET 2015
On Thu, Feb 19, 2015 at 11:32 AM, Carmen Di Giovanni <cdigiova at unina.it> wrote:
> Dear Szilárd,
>
> 1) the output of the command nvidia-smi -ac 2600,758 is
>
> [root at localhost test_gpu]# nvidia-smi -ac 2600,758
> Applications clocks set to "(MEM 2600, SM 758)" for GPU 0000:03:00.0
>
> Warning: persistence mode is disabled on this device. This settings will go
> back to default as soon as driver unloads (e.g. last application like
> nvidia-smi or cuda application terminates). Run with [--help | -h] switch to
> get more information on how to enable persistence mode.
Run nvidia-smi -pm 1 if you want to avoid that.
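For example, run as root (a sketch assuming the K20c is device 0, as your nvidia-smi output indicates):

  nvidia-smi -pm 1              # enable persistence mode so the clock settings survive between runs
  nvidia-smi -i 0 -ac 2600,758  # set application clocks on the K20c only

Restricting the change with -i 0 also avoids touching the second card, which does not support application clocks anyway.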
> Setting applications clocks is not supported for GPU 0000:82:00.0.
> Treating as warning and moving on.
> All done.
> ----------------------------------------------------------------------------
> 2) I decreased nstlist to 20.
> However, when I run the command:
> gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
> it gives me a fatal error:
>
> GROMACS: gmx mdrun, VERSION 5.0
> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
> Library dir: /opt/SW/gromacs-5.0/share/top
> Command line:
> gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
>
>
> Back Off! I just backed up nvt.log to ./#nvt.log.8#
> Reading file nvt.tpr, VERSION 5.0 (single precision)
> Changing nstlist from 10 to 40, rlist from 1 to 1.097
>
>
> -------------------------------------------------------
> Program gmx_mpi, VERSION 5.0
> Source code file: /opt/SW/gromacs-5.0/src/programs/mdrun/runner.c, line: 876
>
> Fatal error:
> Setting the number of thread-MPI threads is only supported with thread-MPI
> and Gromacs was compiled without thread-MPI
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
The error explains quite clearly that you're trying to use mdrun's
built-in thread-MPI parallelization, but you have a binary that does
not support it. Use the MPI launching syntax instead.
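In your case something like this should do (a sketch assuming Open MPI's mpirun, which the MPI_ABORT note below suggests you are using):

  mpirun -np 8 gmx_mpi mdrun -deffnm nvt -gpu_id 00001111

The rank count moves from -ntmpi to the MPI launcher; -gpu_id keeps the same rank-to-GPU mapping you intended.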
> Halting program gmx_mpi
>
> gcq#223: "Jesus Not Only Saves, He Also Frequently Makes Backups." (Myron
> Bradshaw)
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode -1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> -------------------------------------------------------------------------
>
>
> 4) I don't understand how I can reduce the "Rest" time
Have you looked at the performance table at the end of the log?
You are wasting a large amount of runtime calculating energies every
step, and this overhead shows up in multiple places in the code - one of
them being the non-timed code parts ("Rest"), which typically take <3%.
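Your log shows nstcalcenergy = 1, so relaxing that in the .mdp and re-running grompp is the first thing to try. A minimal sketch (grompp may still adjust the value to keep it compatible with your other output intervals):

  nstcalcenergy = 100   ; the default; computing energies every single step is what costs you
  nstenergy     = 2500  ; energy output interval, unchanged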
Cheers,
--
Szilard
>
> Carmen
>
>
>
> --
> Carmen Di Giovanni, PhD
> Dept. of Pharmaceutical and Toxicological Chemistry
> "Drug Discovery Lab"
> University of Naples "Federico II"
> Via D. Montesano, 49
> 80131 Naples
> Tel.: ++39 081 678623
> Fax: ++39 081 678100
> Email: cdigiova at unina.it
>
>
>
> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>
>> Please keep the mails on the list.
>>
>> On Wed, Feb 18, 2015 at 6:32 PM, Carmen Di Giovanni <cdigiova at unina.it>
>> wrote:
>>>
>>> nvidia-smi -q -g 0
>>>
>>> ==============NVSMI LOG==============
>>>
>>> Timestamp : Wed Feb 18 18:30:01 2015
>>> Driver Version : 340.24
>>>
>>> Attached GPUs : 2
>>> GPU 0000:03:00.0
>>> Product Name : Tesla K20c
>>
>> [...
>>>
>>> Clocks
>>> Graphics : 705 MHz
>>> SM : 705 MHz
>>> Memory : 2600 MHz
>>> Applications Clocks
>>> Graphics : 705 MHz
>>> Memory : 2600 MHz
>>> Default Applications Clocks
>>> Graphics : 705 MHz
>>> Memory : 2600 MHz
>>> Max Clocks
>>> Graphics : 758 MHz
>>> SM : 758 MHz
>>> Memory : 2600 MHz
>>
>>
>> This is the relevant part I was looking for. The Tesla K20c supports
>> setting a so-called application clock, which essentially means that
>> you can bump its clock frequency with the NVIDIA management tool
>> nvidia-smi from the default 705 MHz to 758 MHz.
>>
>> Use the command:
>> nvidia-smi -ac 2600,758
>>
>> This should give you another 7% or so (I didn't remember the correct
>> max clock before, that's why I guessed 5%).
>>
>> Cheers,
>> Szilard
>>
>>> Clock Policy
>>> Auto Boost : N/A
>>> Auto Boost Default : N/A
>>> Compute Processes
>>> Process ID : 19441
>>> Name : gmx_mpi
>>> Used GPU Memory : 110 MiB
>>>
>>> [carmendigi at localhost test_gpu]$
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Carmen Di Giovanni, PhD
>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>> "Drug Discovery Lab"
>>> University of Naples "Federico II"
>>> Via D. Montesano, 49
>>> 80131 Naples
>>> Tel.: ++39 081 678623
>>> Fax: ++39 081 678100
>>> Email: cdigiova at unina.it
>>>
>>>
>>>
>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>
>>>> As I suggested above, please use pastebin.com or similar!
>>>> --
>>>> Szilárd
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 6:09 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>>> wrote:
>>>>>
>>>>>
>>>>> Dear Szilárd, it's not possible to attach the full log file to the forum
>>>>> mail because it is too big.
>>>>> I will send it to your private mail address.
>>>>> Thank you in advance
>>>>> Carmen
>>>>>
>>>>>
>>>>> --
>>>>> Carmen Di Giovanni, PhD
>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>> "Drug Discovery Lab"
>>>>> University of Naples "Federico II"
>>>>> Via D. Montesano, 49
>>>>> 80131 Naples
>>>>> Tel.: ++39 081 678623
>>>>> Fax: ++39 081 678100
>>>>> Email: cdigiova at unina.it
>>>>>
>>>>>
>>>>>
>>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>>
>>>>>> We need a *full* log file, not parts of it!
>>>>>>
>>>>>> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
>>>>>> to not use HyperThreading.
>>>>>> --
>>>>>> Szilárd
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni
>>>>>> <cdigiova at unina.it>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Justin,
>>>>>>> the problem is evident for all calculations.
>>>>>>> This is the log file of a recent run:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------------
>>>>>>>
>>>>>>> Log file opened on Mon Dec 22 16:28:00 2014
>>>>>>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks:
>>>>>>> 1
>>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>>>
>>>>>>> GROMACS is written by:
>>>>>>> Emile Apol Rossen Apostolov Herman J.C. Berendsen Par
>>>>>>> Bjelkmar
>>>>>>> Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian
>>>>>>> Fritsch
>>>>>>> Gerrit Groenhof Christoph Junghans Peter Kasson Carsten
>>>>>>> Kutzner
>>>>>>> Per Larsson Justin A. Lemkul Magnus Lundborg Pieter
>>>>>>> Meulenhoff
>>>>>>> Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
>>>>>>> Roland Schulz Alexey Shvetsov Michael Shirts Alfons
>>>>>>> Sijbers
>>>>>>> Peter Tieleman Christian Wennberg Maarten Wolf
>>>>>>> and the project leaders:
>>>>>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>>>>>
>>>>>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>>>>>> Copyright (c) 2001-2014, The GROMACS development team at
>>>>>>> Uppsala University, Stockholm University and
>>>>>>> the Royal Institute of Technology, Sweden.
>>>>>>> check out http://www.gromacs.org for more information.
>>>>>>>
>>>>>>> GROMACS is free software; you can redistribute it and/or modify it
>>>>>>> under the terms of the GNU Lesser General Public License
>>>>>>> as published by the Free Software Foundation; either version 2.1
>>>>>>> of the License, or (at your option) any later version.
>>>>>>>
>>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>>>>>> Library dir: /opt/SW/gromacs-5.0/share/top
>>>>>>> Command line:
>>>>>>> gmx_mpi mdrun -deffnm prod_20ns
>>>>>>>
>>>>>>> Gromacs version: VERSION 5.0
>>>>>>> Precision: single
>>>>>>> Memory model: 64 bit
>>>>>>> MPI library: MPI
>>>>>>> OpenMP support: enabled
>>>>>>> GPU support: enabled
>>>>>>> invsqrt routine: gmx_software_invsqrt(x)
>>>>>>> SIMD instructions: AVX_256
>>>>>>> FFT library: fftw-3.3.3-sse2
>>>>>>> RDTSCP usage: enabled
>>>>>>> C++11 compilation: disabled
>>>>>>> TNG support: enabled
>>>>>>> Tracing support: disabled
>>>>>>> Built on: Thu Jul 31 18:30:37 CEST 2014
>>>>>>> Built by: root at localhost.localdomain [CMAKE]
>>>>>>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>>>>>>> Build CPU vendor: GenuineIntel
>>>>>>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>> Build CPU family: 6 Model: 62 Stepping: 4
>>>>>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
>>>>>>> mmx
>>>>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
>>>>>>> sse2
>>>>>>> sse3
>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>> C compiler: /usr/bin/cc GNU 4.4.7
>>>>>>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>>>>>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith
>>>>>>> -Wall
>>>>>>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>>>>>>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>>>>>>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>>> Boost version: 1.55.0 (internal)
>>>>>>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
>>>>>>> compiler
>>>>>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>>>>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0,
>>>>>>> V6.0.1
>>>>>>> CUDA compiler
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
>>>>>>> ;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>>>>>> CUDA driver: 6.50
>>>>>>> CUDA runtime: 6.0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>>>>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and
>>>>>>> scalable
>>>>>>> molecular simulation
>>>>>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H.
>>>>>>> J.
>>>>>>> C.
>>>>>>> Berendsen
>>>>>>> GROMACS: Fast, Flexible and Free
>>>>>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> E. Lindahl and B. Hess and D. van der Spoel
>>>>>>> GROMACS 3.0: A package for molecular simulation and trajectory
>>>>>>> analysis
>>>>>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>>>>>> GROMACS: A message-passing parallel molecular dynamics implementation
>>>>>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>>
>>>>>>> For optimal performance with a GPU nstlist (now 10) should be larger.
>>>>>>> The optimum depends on your CPU and GPU resources.
>>>>>>> You might want to try several nstlist values.
>>>>>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>>>>>
>>>>>>> Input Parameters:
>>>>>>> integrator = md
>>>>>>> tinit = 0
>>>>>>> dt = 0.002
>>>>>>> nsteps = 10000000
>>>>>>> init-step = 0
>>>>>>> simulation-part = 1
>>>>>>> comm-mode = Linear
>>>>>>> nstcomm = 1
>>>>>>> bd-fric = 0
>>>>>>> ld-seed = 1993
>>>>>>> emtol = 10
>>>>>>> emstep = 0.01
>>>>>>> niter = 20
>>>>>>> fcstep = 0
>>>>>>> nstcgsteep = 1000
>>>>>>> nbfgscorr = 10
>>>>>>> rtpi = 0.05
>>>>>>> nstxout = 2500
>>>>>>> nstvout = 2500
>>>>>>> nstfout = 0
>>>>>>> nstlog = 2500
>>>>>>> nstcalcenergy = 1
>>>>>>> nstenergy = 2500
>>>>>>> nstxout-compressed = 500
>>>>>>> compressed-x-precision = 1000
>>>>>>> cutoff-scheme = Verlet
>>>>>>> nstlist = 40
>>>>>>> ns-type = Grid
>>>>>>> pbc = xyz
>>>>>>> periodic-molecules = FALSE
>>>>>>> verlet-buffer-tolerance = 0.005
>>>>>>> rlist = 1.285
>>>>>>> rlistlong = 1.285
>>>>>>> nstcalclr = 10
>>>>>>> coulombtype = PME
>>>>>>> coulomb-modifier = Potential-shift
>>>>>>> rcoulomb-switch = 0
>>>>>>> rcoulomb = 1.2
>>>>>>> epsilon-r = 1
>>>>>>> epsilon-rf = 1
>>>>>>> vdw-type = Cut-off
>>>>>>> vdw-modifier = Potential-shift
>>>>>>> rvdw-switch = 0
>>>>>>> rvdw = 1.2
>>>>>>> DispCorr = No
>>>>>>> table-extension = 1
>>>>>>> fourierspacing = 0.135
>>>>>>> fourier-nx = 128
>>>>>>> fourier-ny = 128
>>>>>>> fourier-nz = 128
>>>>>>> pme-order = 4
>>>>>>> ewald-rtol = 1e-05
>>>>>>> ewald-rtol-lj = 0.001
>>>>>>> lj-pme-comb-rule = Geometric
>>>>>>> ewald-geometry = 0
>>>>>>> epsilon-surface = 0
>>>>>>> implicit-solvent = No
>>>>>>> gb-algorithm = Still
>>>>>>> nstgbradii = 1
>>>>>>> rgbradii = 2
>>>>>>> gb-epsilon-solvent = 80
>>>>>>> gb-saltconc = 0
>>>>>>> gb-obc-alpha = 1
>>>>>>> gb-obc-beta = 0.8
>>>>>>> gb-obc-gamma = 4.85
>>>>>>> gb-dielectric-offset = 0.009
>>>>>>> sa-algorithm = Ace-approximation
>>>>>>> sa-surface-tension = 2.092
>>>>>>> tcoupl = V-rescale
>>>>>>> nsttcouple = 10
>>>>>>> nh-chain-length = 0
>>>>>>> print-nose-hoover-chain-variables = FALSE
>>>>>>> pcoupl = No
>>>>>>> pcoupltype = Semiisotropic
>>>>>>> nstpcouple = -1
>>>>>>> tau-p = 0.5
>>>>>>> compressibility (3x3):
>>>>>>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00,
>>>>>>> 0.00000e+00}
>>>>>>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00,
>>>>>>> 0.00000e+00}
>>>>>>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00,
>>>>>>> 0.00000e+00}
>>>>>>> ref-p (3x3):
>>>>>>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> refcoord-scaling = No
>>>>>>> posres-com (3):
>>>>>>> posres-com[0]= 0.00000e+00
>>>>>>> posres-com[1]= 0.00000e+00
>>>>>>> posres-com[2]= 0.00000e+00
>>>>>>> posres-comB (3):
>>>>>>> posres-comB[0]= 0.00000e+00
>>>>>>> posres-comB[1]= 0.00000e+00
>>>>>>> posres-comB[2]= 0.00000e+00
>>>>>>> QMMM = FALSE
>>>>>>> QMconstraints = 0
>>>>>>> QMMMscheme = 0
>>>>>>> MMChargeScaleFactor = 1
>>>>>>> qm-opts:
>>>>>>> ngQM = 0
>>>>>>> constraint-algorithm = Lincs
>>>>>>> continuation = FALSE
>>>>>>> Shake-SOR = FALSE
>>>>>>> shake-tol = 0.0001
>>>>>>> lincs-order = 4
>>>>>>> lincs-iter = 1
>>>>>>> lincs-warnangle = 30
>>>>>>> nwall = 0
>>>>>>> wall-type = 9-3
>>>>>>> wall-r-linpot = -1
>>>>>>> wall-atomtype[0] = -1
>>>>>>> wall-atomtype[1] = -1
>>>>>>> wall-density[0] = 0
>>>>>>> wall-density[1] = 0
>>>>>>> wall-ewald-zfac = 3
>>>>>>> pull = no
>>>>>>> rotation = FALSE
>>>>>>> interactiveMD = FALSE
>>>>>>> disre = No
>>>>>>> disre-weighting = Conservative
>>>>>>> disre-mixed = FALSE
>>>>>>> dr-fc = 1000
>>>>>>> dr-tau = 0
>>>>>>> nstdisreout = 100
>>>>>>> orire-fc = 0
>>>>>>> orire-tau = 0
>>>>>>> nstorireout = 100
>>>>>>> free-energy = no
>>>>>>> cos-acceleration = 0
>>>>>>> deform (3x3):
>>>>>>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>> simulated-tempering = FALSE
>>>>>>> E-x:
>>>>>>> n = 0
>>>>>>> E-xt:
>>>>>>> n = 0
>>>>>>> E-y:
>>>>>>> n = 0
>>>>>>> E-yt:
>>>>>>> n = 0
>>>>>>> E-z:
>>>>>>> n = 0
>>>>>>> E-zt:
>>>>>>> n = 0
>>>>>>> swapcoords = no
>>>>>>> adress = FALSE
>>>>>>> userint1 = 0
>>>>>>> userint2 = 0
>>>>>>> userint3 = 0
>>>>>>> userint4 = 0
>>>>>>> userreal1 = 0
>>>>>>> userreal2 = 0
>>>>>>> userreal3 = 0
>>>>>>> userreal4 = 0
>>>>>>> grpopts:
>>>>>>> nrdf: 869226
>>>>>>> ref-t: 300
>>>>>>> tau-t: 0.1
>>>>>>> annealing: No
>>>>>>> annealing-npoints: 0
>>>>>>> acc: 0 0 0
>>>>>>> nfreeze: N N N
>>>>>>> energygrp-flags[ 0]: 0
>>>>>>> Using 1 MPI process
>>>>>>> Using 32 OpenMP threads
>>>>>>>
>>>>>>> Detecting CPU SIMD instructions.
>>>>>>> Present hardware specification:
>>>>>>> Vendor: GenuineIntel
>>>>>>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>> Family: 6 Model: 62 Stepping: 4
>>>>>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>>>>>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
>>>>>>> sse3
>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>> SIMD instructions most likely to fit this hardware: AVX_256
>>>>>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>>>>>
>>>>>>>
>>>>>>> 2 GPUs detected on host localhost.localdomain:
>>>>>>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat:
>>>>>>> compatible
>>>>>>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat:
>>>>>>> compatible
>>>>>>>
>>>>>>> 1 GPU auto-selected for this run.
>>>>>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>>>>>
>>>>>>>
>>>>>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started
>>>>>>> with
>>>>>>> less
>>>>>>> PP MPI process per node than GPUs available.
>>>>>>> Each PP MPI process can use only one GPU, 1 GPU per node will
>>>>>>> be
>>>>>>> used.
>>>>>>>
>>>>>>> Will do PME sum in reciprocal space for electrostatic interactions.
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
>>>>>>> Pedersen
>>>>>>> A smooth particle mesh Ewald method
>>>>>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>> Will do ordinary reciprocal space Ewald sum.
>>>>>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>>>>>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>>>>>>> System total charge: -0.012
>>>>>>> Generated table with 1142 data points for Ewald.
>>>>>>> Tabscale = 500 points/nm
>>>>>>> Generated table with 1142 data points for LJ6.
>>>>>>> Tabscale = 500 points/nm
>>>>>>> Generated table with 1142 data points for LJ12.
>>>>>>> Tabscale = 500 points/nm
>>>>>>> Generated table with 1142 data points for 1-4 COUL.
>>>>>>> Tabscale = 500 points/nm
>>>>>>> Generated table with 1142 data points for 1-4 LJ6.
>>>>>>> Tabscale = 500 points/nm
>>>>>>> Generated table with 1142 data points for 1-4 LJ12.
>>>>>>> Tabscale = 500 points/nm
>>>>>>>
>>>>>>> Using CUDA 8x8 non-bonded kernels
>>>>>>>
>>>>>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald
>>>>>>> -1.000e-05
>>>>>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04
>>>>>>> size:
>>>>>>> 1536
>>>>>>>
>>>>>>> Removing pbc first time
>>>>>>> Pinning threads with an auto-selected logical core stride of 1
>>>>>>>
>>>>>>> Initializing LINear Constraint Solver
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>>>>>> LINCS: A Linear Constraint Solver for molecular simulations
>>>>>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>> The number of constraints is 5913
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> S. Miyamoto and P. A. Kollman
>>>>>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
>>>>>>> Rigid
>>>>>>> Water Models
>>>>>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>> Center of mass motion removal mode is Linear
>>>>>>> We have the following groups for center of mass motion removal:
>>>>>>> 0: rest
>>>>>>>
>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>> G. Bussi, D. Donadio and M. Parrinello
>>>>>>> Canonical sampling through velocity rescaling
>>>>>>> J. Chem. Phys. 126 (2007) pp. 014101
>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>
>>>>>>> There are: 434658 Atoms
>>>>>>>
>>>>>>> Constraining the starting coordinates (step 0)
>>>>>>>
>>>>>>> Constraining the coordinates at t0-dt (step 0)
>>>>>>> RMS relative constraint deviation after constraining: 3.67e-05
>>>>>>> Initial temperature: 300.5 K
>>>>>>>
>>>>>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>>>>>> Step Time Lambda
>>>>>>> 0 0.00000 0.00000
>>>>>>>
>>>>>>> Energies (kJ/mol)
>>>>>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>>>>>> Coulomb-14
>>>>>>> 9.74139e+03 4.34956e+03 2.97359e+03 -1.93107e+02
>>>>>>> 8.05534e+04
>>>>>>> LJ (SR) Coulomb (SR) Coul. recip. Potential
>>>>>>> Kinetic
>>>>>>> En.
>>>>>>> 1.01340e+06 -7.13271e+06 2.01361e+04 -6.00175e+06
>>>>>>> 1.09887e+06
>>>>>>> Total Energy Conserved En. Temperature Pressure (bar)
>>>>>>> Constr.
>>>>>>> rmsd
>>>>>>> -4.90288e+06 -4.90288e+06 3.04092e+02 1.70897e+02
>>>>>>> 2.16683e-05
>>>>>>>
>>>>>>> step 80: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>> 6279.0
>>>>>>> M-cycles
>>>>>>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>> 6962.2
>>>>>>> M-cycles
>>>>>>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463:
>>>>>>> 8406.5
>>>>>>> M-cycles
>>>>>>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>> 6424.0
>>>>>>> M-cycles
>>>>>>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219:
>>>>>>> 6369.1
>>>>>>> M-cycles
>>>>>>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>> 7309.0
>>>>>>> M-cycles
>>>>>>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355:
>>>>>>> 7521.2
>>>>>>> M-cycles
>>>>>>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407:
>>>>>>> 8369.8
>>>>>>> M-cycles
>>>>>>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>>>>>>> Step Time Lambda
>>>>>>> 2500 5.00000 0.00000
>>>>>>>
>>>>>>> Energies (kJ/mol)
>>>>>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>>>>>> Coulomb-14
>>>>>>> 9.72545e+03 4.33046e+03 2.98087e+03 -1.95794e+02
>>>>>>> 8.05967e+04
>>>>>>> LJ (SR) Coulomb (SR) Coul. recip. Potential
>>>>>>> Kinetic
>>>>>>> En.
>>>>>>> 1.01293e+06 -7.13110e+06 2.01689e+04 -6.00057e+06
>>>>>>> 1.08489e+06
>>>>>>> Total Energy Conserved En. Temperature Pressure (bar)
>>>>>>> Constr.
>>>>>>> rmsd
>>>>>>> -4.91567e+06 -4.90300e+06 3.00225e+02 1.36173e+02
>>>>>>> 2.25998e-05
>>>>>>>
>>>>>>> Step Time Lambda
>>>>>>> 5000 10.00000 0.00000
>>>>>>>
>>>>>>> ............
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -------------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> Thank you in advance
>>>>>>>
>>>>>>> --
>>>>>>> Carmen Di Giovanni, PhD
>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>> "Drug Discovery Lab"
>>>>>>> University of Naples "Federico II"
>>>>>>> Via D. Montesano, 49
>>>>>>> 80131 Naples
>>>>>>> Tel.: ++39 081 678623
>>>>>>> Fax: ++39 081 678100
>>>>>>> Email: cdigiova at unina.it
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What's your exact command?
>>>>>>>>>
>>>>>>>>
>>>>>>>> A full .log file would be even better; it would tell us everything
>>>>>>>> we
>>>>>>>> need
>>>>>>>> to know :)
>>>>>>>>
>>>>>>>> -Justin
>>>>>>>>
>>>>>>>>> Have you reviewed this page:
>>>>>>>>>
>>>>>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>>>>>
>>>>>>>>> James "Wes" Barnett
>>>>>>>>> Ph.D. Candidate
>>>>>>>>> Chemical and Biomolecular Engineering
>>>>>>>>>
>>>>>>>>> Tulane University
>>>>>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
>>>>>>>>> Carmen
>>>>>>>>> Di
>>>>>>>>> Giovanni <cdigiova at unina.it>
>>>>>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>>>>>
>>>>>>>>> I post the message of a md run :
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance
>>>>>>>>> causes
>>>>>>>>> performance loss, consider using a shorter cut-off and a
>>>>>>>>> finer
>>>>>>>>> PME
>>>>>>>>> grid.
>>>>>>>>>
>>>>>>>>> How can I solve this problem?
>>>>>>>>> Thank you in advance
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>>> "Drug Discovery Lab"
>>>>>>>>> University of Naples "Federico II"
>>>>>>>>> Via D. Montesano, 49
>>>>>>>>> 80131 Naples
>>>>>>>>> Tel.: ++39 081 678623
>>>>>>>>> Fax: ++39 081 678100
>>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dear all,
>>>>>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>>>>>> After a minimization on a protein of 1925 atoms this is the
>>>>>>>>>>> message:
>>>>>>>>>>>
>>>>>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Minimization is a poor indicator of performance. Do a real MD
>>>>>>>>>> run.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance
>>>>>>>>>>> causes
>>>>>>>>>>> performance loss.
>>>>>>>>>>>
>>>>>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>>>>>> (steps/hour)
>>>>>>>>>>> Performance: 8480.2
>>>>>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Can I improve the performance?
>>>>>>>>>>> So far I haven't found enough information in the forum to solve this
>>>>>>>>>>> problem.
>>>>>>>>>>> The log file is attached.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The list does not accept attachments. If you wish to share a
>>>>>>>>>> file,
>>>>>>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>>>>>>> .log is quite important for understanding your hardware,
>>>>>>>>>> optimizations, and seeing full details of the performance
>>>>>>>>>> breakdown.
>>>>>>>>>> But again, base your assessment on MD, not EM.
>>>>>>>>>>
>>>>>>>>>> -Justin
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ==================================================
>>>>>>>>>>
>>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>>
>>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>>> School of Pharmacy
>>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>>> 20 Penn St.
>>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>>
>>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>>
>>>>>>>>>> ==================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ==================================================
>>>>>>>>
>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>
>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>> School of Pharmacy
>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>> University of Maryland, Baltimore
>>>>>>>> 20 Penn St.
>>>>>>>> Baltimore, MD 21201
>>>>>>>>
>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>
>>>>>>>> ==================================================
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>