[gmx-users] GPU low performance
Carmen Di Giovanni
cdigiova at unina.it
Thu Feb 19 11:32:52 CET 2015
Dear Szilárd,
1) The output of the command nvidia-smi -ac 2600,758 is:
[root at localhost test_gpu]# nvidia-smi -ac 2600,758
Applications clocks set to "(MEM 2600, SM 758)" for GPU 0000:03:00.0
Warning: persistence mode is disabled on this device. This settings
will go back to default as soon as driver unloads (e.g. last
application like nvidia-smi or cuda application terminates). Run with
[--help | -h] switch to get more information on how to enable
persistence mode.
Setting applications clocks is not supported for GPU 0000:82:00.0.
Treating as warning and moving on.
All done.
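(To keep this setting from resetting when the driver unloads, I suppose
persistence mode has to be enabled first, e.g. as root:

nvidia-smi -pm 1              # enable persistence mode
nvidia-smi -ac 2600,758       # re-apply the application clocks (MEM,SM)
nvidia-smi -q -d CLOCK -g 0   # verify the clocks on GPU 0

assuming GPU 0 is the K20c, as in the output above.)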
----------------------------------------------------------------------------
2) I decreased nstlist to 20.
However, when I run the command:
gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
it gives me a fatal error:
GROMACS: gmx mdrun, VERSION 5.0
Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
Library dir: /opt/SW/gromacs-5.0/share/top
Command line:
gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
Back Off! I just backed up nvt.log to ./#nvt.log.8#
Reading file nvt.tpr, VERSION 5.0 (single precision)
Changing nstlist from 10 to 40, rlist from 1 to 1.097
-------------------------------------------------------
Program gmx_mpi, VERSION 5.0
Source code file: /opt/SW/gromacs-5.0/src/programs/mdrun/runner.c, line: 876
Fatal error:
Setting the number of thread-MPI threads is only supported with
thread-MPI and Gromacs was compiled without thread-MPI
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Halting program gmx_mpi
gcq#223: "Jesus Not Only Saves, He Also Frequently Makes Backups."
(Myron Bradshaw)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-------------------------------------------------------------------------
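(Presumably, since this gmx_mpi binary was built against an external MPI
library rather than thread-MPI, the number of ranks has to come from the
MPI launcher instead of -ntmpi - e.g., assuming Open MPI's mpirun is
available on this machine:

mpirun -np 8 gmx_mpi mdrun -deffnm nvt -gpu_id 00001111

where the eight digits of -gpu_id map the eight PP ranks onto GPUs 0 and 1.)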
4) I don't understand how I can reduce the "Rest" time.
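(The only hint I have found so far is that "Rest" collects the wall time
not covered by any of mdrun's cycle counters, and that pinning the threads
should reduce it, e.g.:

gmx_mpi mdrun -deffnm nvt -ntomp 16 -pin on

with -ntomp 16 -pin on being the options suggested below - is that the
right direction?)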
Carmen
--
Carmen Di Giovanni, PhD
Dept. of Pharmaceutical and Toxicological Chemistry
"Drug Discovery Lab"
University of Naples "Federico II"
Via D. Montesano, 49
80131 Naples
Tel.: ++39 081 678623
Fax: ++39 081 678100
Email: cdigiova at unina.it
Quoting Szilárd Páll <pall.szilard at gmail.com>:
> Please keep the mails on the list.
>
> On Wed, Feb 18, 2015 at 6:32 PM, Carmen Di Giovanni
> <cdigiova at unina.it> wrote:
>> nvidia-smi -q -g 0
>>
>> ==============NVSMI LOG==============
>>
>> Timestamp : Wed Feb 18 18:30:01 2015
>> Driver Version : 340.24
>>
>> Attached GPUs : 2
>> GPU 0000:03:00.0
>> Product Name : Tesla K20c
> [...
>> Clocks
>> Graphics : 705 MHz
>> SM : 705 MHz
>> Memory : 2600 MHz
>> Applications Clocks
>> Graphics : 705 MHz
>> Memory : 2600 MHz
>> Default Applications Clocks
>> Graphics : 705 MHz
>> Memory : 2600 MHz
>> Max Clocks
>> Graphics : 758 MHz
>> SM : 758 MHz
>> Memory : 2600 MHz
>
> This is the relevant part I was looking for. The Tesla K20c supports
> setting a so-called application clock, which essentially means that
> you can bump its clock frequency with the NVIDIA management tool
> nvidia-smi from the default 705 MHz to 758 MHz.
>
> Use the command:
> nvidia-smi -ac 2600,758
>
> This should give you another 7% or so (I didn't remember the correct
> max clock before, that's why I guessed 5%).
>
> Cheers,
> Szilard
>
>> Clock Policy
>> Auto Boost : N/A
>> Auto Boost Default : N/A
>> Compute Processes
>> Process ID : 19441
>> Name : gmx_mpi
>> Used GPU Memory : 110 MiB
>>
>> [carmendigi at localhost test_gpu]$
>>
>> --
>> Carmen Di Giovanni, PhD
>> Dept. of Pharmaceutical and Toxicological Chemistry
>> "Drug Discovery Lab"
>> University of Naples "Federico II"
>> Via D. Montesano, 49
>> 80131 Naples
>> Tel.: ++39 081 678623
>> Fax: ++39 081 678100
>> Email: cdigiova at unina.it
>>
>>
>>
>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>
>>> As I suggested above, please use pastebin.com or similar!
>>> --
>>> Szilárd
>>>
>>>
>>> On Wed, Feb 18, 2015 at 6:09 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>> wrote:
>>>>
>>>> Dear Szilárd, it is not possible to attach the full log file to the
>>>> list mail because it is too big.
>>>> I will send it to your private mail address.
>>>> Thank you in advance
>>>> Carmen
>>>>
>>>>
>>>> --
>>>> Carmen Di Giovanni, PhD
>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>> "Drug Discovery Lab"
>>>> University of Naples "Federico II"
>>>> Via D. Montesano, 49
>>>> 80131 Naples
>>>> Tel.: ++39 081 678623
>>>> Fax: ++39 081 678100
>>>> Email: cdigiova at unina.it
>>>>
>>>>
>>>>
>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>
>>>>> We need a *full* log file, not parts of it!
>>>>>
>>>>> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
>>>>> to not use HyperThreading.
>>>>> --
>>>>> Szilárd
>>>>>
>>>>>
>>>>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Justin,
>>>>>> the problem is evident for all calculations.
>>>>>> This is the log file of a recent run:
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------------
>>>>>>
>>>>>> Log file opened on Mon Dec 22 16:28:00 2014
>>>>>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks: 1
>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>>
>>>>>> GROMACS is written by:
>>>>>> Emile Apol, Rossen Apostolov, Herman J.C. Berendsen, Par Bjelkmar,
>>>>>> Aldert van Buuren, Rudi van Drunen, Anton Feenstra, Sebastian Fritsch,
>>>>>> Gerrit Groenhof, Christoph Junghans, Peter Kasson, Carsten Kutzner,
>>>>>> Per Larsson, Justin A. Lemkul, Magnus Lundborg, Pieter Meulenhoff,
>>>>>> Erik Marklund, Teemu Murtola, Szilard Pall, Sander Pronk,
>>>>>> Roland Schulz, Alexey Shvetsov, Michael Shirts, Alfons Sijbers,
>>>>>> Peter Tieleman, Christian Wennberg, Maarten Wolf
>>>>>> and the project leaders:
>>>>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>>>>
>>>>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>>>>> Copyright (c) 2001-2014, The GROMACS development team at
>>>>>> Uppsala University, Stockholm University and
>>>>>> the Royal Institute of Technology, Sweden.
>>>>>> check out http://www.gromacs.org for more information.
>>>>>>
>>>>>> GROMACS is free software; you can redistribute it and/or modify it
>>>>>> under the terms of the GNU Lesser General Public License
>>>>>> as published by the Free Software Foundation; either version 2.1
>>>>>> of the License, or (at your option) any later version.
>>>>>>
>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>>>>> Library dir: /opt/SW/gromacs-5.0/share/top
>>>>>> Command line:
>>>>>> gmx_mpi mdrun -deffnm prod_20ns
>>>>>>
>>>>>> Gromacs version: VERSION 5.0
>>>>>> Precision: single
>>>>>> Memory model: 64 bit
>>>>>> MPI library: MPI
>>>>>> OpenMP support: enabled
>>>>>> GPU support: enabled
>>>>>> invsqrt routine: gmx_software_invsqrt(x)
>>>>>> SIMD instructions: AVX_256
>>>>>> FFT library: fftw-3.3.3-sse2
>>>>>> RDTSCP usage: enabled
>>>>>> C++11 compilation: disabled
>>>>>> TNG support: enabled
>>>>>> Tracing support: disabled
>>>>>> Built on: Thu Jul 31 18:30:37 CEST 2014
>>>>>> Built by: root at localhost.localdomain [CMAKE]
>>>>>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>>>>>> Build CPU vendor: GenuineIntel
>>>>>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>> Build CPU family: 6 Model: 62 Stepping: 4
>>>>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
>>>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>> C compiler: /usr/bin/cc GNU 4.4.7
>>>>>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>>>>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
>>>>>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>>>>>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>>>>>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>> Boost version: 1.55.0 (internal)
>>>>>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
>>>>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>>>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
>>>>>> CUDA compiler flags: -gencode;arch=compute_20,code=sm_20;-gencode;
>>>>>> arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;
>>>>>> arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;
>>>>>> -use_fast_math;-Xcompiler;-fPIC;;-mavx;-Wextra;
>>>>>> -Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;
>>>>>> -fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>>>>> CUDA driver: 6.50
>>>>>> CUDA runtime: 6.0
>>>>>>
>>>>>>
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>>>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
>>>>>> molecular simulation
>>>>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
>>>>>> H. J. C. Berendsen
>>>>>> GROMACS: Fast, Flexible and Free
>>>>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> E. Lindahl and B. Hess and D. van der Spoel
>>>>>> GROMACS 3.0: A package for molecular simulation and trajectory analysis
>>>>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>>>>> GROMACS: A message-passing parallel molecular dynamics implementation
>>>>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>>
>>>>>> For optimal performance with a GPU nstlist (now 10) should be larger.
>>>>>> The optimum depends on your CPU and GPU resources.
>>>>>> You might want to try several nstlist values.
>>>>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>>>>
>>>>>> Input Parameters:
>>>>>> integrator = md
>>>>>> tinit = 0
>>>>>> dt = 0.002
>>>>>> nsteps = 10000000
>>>>>> init-step = 0
>>>>>> simulation-part = 1
>>>>>> comm-mode = Linear
>>>>>> nstcomm = 1
>>>>>> bd-fric = 0
>>>>>> ld-seed = 1993
>>>>>> emtol = 10
>>>>>> emstep = 0.01
>>>>>> niter = 20
>>>>>> fcstep = 0
>>>>>> nstcgsteep = 1000
>>>>>> nbfgscorr = 10
>>>>>> rtpi = 0.05
>>>>>> nstxout = 2500
>>>>>> nstvout = 2500
>>>>>> nstfout = 0
>>>>>> nstlog = 2500
>>>>>> nstcalcenergy = 1
>>>>>> nstenergy = 2500
>>>>>> nstxout-compressed = 500
>>>>>> compressed-x-precision = 1000
>>>>>> cutoff-scheme = Verlet
>>>>>> nstlist = 40
>>>>>> ns-type = Grid
>>>>>> pbc = xyz
>>>>>> periodic-molecules = FALSE
>>>>>> verlet-buffer-tolerance = 0.005
>>>>>> rlist = 1.285
>>>>>> rlistlong = 1.285
>>>>>> nstcalclr = 10
>>>>>> coulombtype = PME
>>>>>> coulomb-modifier = Potential-shift
>>>>>> rcoulomb-switch = 0
>>>>>> rcoulomb = 1.2
>>>>>> epsilon-r = 1
>>>>>> epsilon-rf = 1
>>>>>> vdw-type = Cut-off
>>>>>> vdw-modifier = Potential-shift
>>>>>> rvdw-switch = 0
>>>>>> rvdw = 1.2
>>>>>> DispCorr = No
>>>>>> table-extension = 1
>>>>>> fourierspacing = 0.135
>>>>>> fourier-nx = 128
>>>>>> fourier-ny = 128
>>>>>> fourier-nz = 128
>>>>>> pme-order = 4
>>>>>> ewald-rtol = 1e-05
>>>>>> ewald-rtol-lj = 0.001
>>>>>> lj-pme-comb-rule = Geometric
>>>>>> ewald-geometry = 0
>>>>>> epsilon-surface = 0
>>>>>> implicit-solvent = No
>>>>>> gb-algorithm = Still
>>>>>> nstgbradii = 1
>>>>>> rgbradii = 2
>>>>>> gb-epsilon-solvent = 80
>>>>>> gb-saltconc = 0
>>>>>> gb-obc-alpha = 1
>>>>>> gb-obc-beta = 0.8
>>>>>> gb-obc-gamma = 4.85
>>>>>> gb-dielectric-offset = 0.009
>>>>>> sa-algorithm = Ace-approximation
>>>>>> sa-surface-tension = 2.092
>>>>>> tcoupl = V-rescale
>>>>>> nsttcouple = 10
>>>>>> nh-chain-length = 0
>>>>>> print-nose-hoover-chain-variables = FALSE
>>>>>> pcoupl = No
>>>>>> pcoupltype = Semiisotropic
>>>>>> nstpcouple = -1
>>>>>> tau-p = 0.5
>>>>>> compressibility (3x3):
>>>>>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> ref-p (3x3):
>>>>>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> refcoord-scaling = No
>>>>>> posres-com (3):
>>>>>> posres-com[0]= 0.00000e+00
>>>>>> posres-com[1]= 0.00000e+00
>>>>>> posres-com[2]= 0.00000e+00
>>>>>> posres-comB (3):
>>>>>> posres-comB[0]= 0.00000e+00
>>>>>> posres-comB[1]= 0.00000e+00
>>>>>> posres-comB[2]= 0.00000e+00
>>>>>> QMMM = FALSE
>>>>>> QMconstraints = 0
>>>>>> QMMMscheme = 0
>>>>>> MMChargeScaleFactor = 1
>>>>>> qm-opts:
>>>>>> ngQM = 0
>>>>>> constraint-algorithm = Lincs
>>>>>> continuation = FALSE
>>>>>> Shake-SOR = FALSE
>>>>>> shake-tol = 0.0001
>>>>>> lincs-order = 4
>>>>>> lincs-iter = 1
>>>>>> lincs-warnangle = 30
>>>>>> nwall = 0
>>>>>> wall-type = 9-3
>>>>>> wall-r-linpot = -1
>>>>>> wall-atomtype[0] = -1
>>>>>> wall-atomtype[1] = -1
>>>>>> wall-density[0] = 0
>>>>>> wall-density[1] = 0
>>>>>> wall-ewald-zfac = 3
>>>>>> pull = no
>>>>>> rotation = FALSE
>>>>>> interactiveMD = FALSE
>>>>>> disre = No
>>>>>> disre-weighting = Conservative
>>>>>> disre-mixed = FALSE
>>>>>> dr-fc = 1000
>>>>>> dr-tau = 0
>>>>>> nstdisreout = 100
>>>>>> orire-fc = 0
>>>>>> orire-tau = 0
>>>>>> nstorireout = 100
>>>>>> free-energy = no
>>>>>> cos-acceleration = 0
>>>>>> deform (3x3):
>>>>>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>> simulated-tempering = FALSE
>>>>>> E-x:
>>>>>> n = 0
>>>>>> E-xt:
>>>>>> n = 0
>>>>>> E-y:
>>>>>> n = 0
>>>>>> E-yt:
>>>>>> n = 0
>>>>>> E-z:
>>>>>> n = 0
>>>>>> E-zt:
>>>>>> n = 0
>>>>>> swapcoords = no
>>>>>> adress = FALSE
>>>>>> userint1 = 0
>>>>>> userint2 = 0
>>>>>> userint3 = 0
>>>>>> userint4 = 0
>>>>>> userreal1 = 0
>>>>>> userreal2 = 0
>>>>>> userreal3 = 0
>>>>>> userreal4 = 0
>>>>>> grpopts:
>>>>>> nrdf: 869226
>>>>>> ref-t: 300
>>>>>> tau-t: 0.1
>>>>>> annealing: No
>>>>>> annealing-npoints: 0
>>>>>> acc: 0 0 0
>>>>>> nfreeze: N N N
>>>>>> energygrp-flags[ 0]: 0
>>>>>> Using 1 MPI process
>>>>>> Using 32 OpenMP threads
>>>>>>
>>>>>> Detecting CPU SIMD instructions.
>>>>>> Present hardware specification:
>>>>>> Vendor: GenuineIntel
>>>>>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>> Family: 6 Model: 62 Stepping: 4
>>>>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>>>>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>> SIMD instructions most likely to fit this hardware: AVX_256
>>>>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>>>>
>>>>>>
>>>>>> 2 GPUs detected on host localhost.localdomain:
>>>>>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
>>>>>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat: compatible
>>>>>>
>>>>>> 1 GPU auto-selected for this run.
>>>>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>>>>
>>>>>>
>>>>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started with
>>>>>> less PP MPI process per node than GPUs available.
>>>>>> Each PP MPI process can use only one GPU, 1 GPU per node will be used.
>>>>>>
>>>>>> Will do PME sum in reciprocal space for electrostatic interactions.
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and
>>>>>> L. G. Pedersen
>>>>>> A smooth particle mesh Ewald method
>>>>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>> Will do ordinary reciprocal space Ewald sum.
>>>>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>>>>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>>>>>> System total charge: -0.012
>>>>>> Generated table with 1142 data points for Ewald.
>>>>>> Tabscale = 500 points/nm
>>>>>> Generated table with 1142 data points for LJ6.
>>>>>> Tabscale = 500 points/nm
>>>>>> Generated table with 1142 data points for LJ12.
>>>>>> Tabscale = 500 points/nm
>>>>>> Generated table with 1142 data points for 1-4 COUL.
>>>>>> Tabscale = 500 points/nm
>>>>>> Generated table with 1142 data points for 1-4 LJ6.
>>>>>> Tabscale = 500 points/nm
>>>>>> Generated table with 1142 data points for 1-4 LJ12.
>>>>>> Tabscale = 500 points/nm
>>>>>>
>>>>>> Using CUDA 8x8 non-bonded kernels
>>>>>>
>>>>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
>>>>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
>>>>>>
>>>>>> Removing pbc first time
>>>>>> Pinning threads with an auto-selected logical core stride of 1
>>>>>>
>>>>>> Initializing LINear Constraint Solver
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>>>>> LINCS: A Linear Constraint Solver for molecular simulations
>>>>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>> The number of constraints is 5913
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> S. Miyamoto and P. A. Kollman
>>>>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
>>>>>> Rigid Water Models
>>>>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>> Center of mass motion removal mode is Linear
>>>>>> We have the following groups for center of mass motion removal:
>>>>>> 0: rest
>>>>>>
>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>> G. Bussi, D. Donadio and M. Parrinello
>>>>>> Canonical sampling through velocity rescaling
>>>>>> J. Chem. Phys. 126 (2007) pp. 014101
>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>
>>>>>> There are: 434658 Atoms
>>>>>>
>>>>>> Constraining the starting coordinates (step 0)
>>>>>>
>>>>>> Constraining the coordinates at t0-dt (step 0)
>>>>>> RMS relative constraint deviation after constraining: 3.67e-05
>>>>>> Initial temperature: 300.5 K
>>>>>>
>>>>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>>>>> Step Time Lambda
>>>>>> 0 0.00000 0.00000
>>>>>>
>>>>>> Energies (kJ/mol)
>>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>>>>>     9.74139e+03    4.34956e+03    2.97359e+03   -1.93107e+02    8.05534e+04
>>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>>>>>     1.01340e+06   -7.13271e+06    2.01361e+04   -6.00175e+06    1.09887e+06
>>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>>>>>    -4.90288e+06   -4.90288e+06    3.04092e+02    1.70897e+02    2.16683e-05
>>>>>>
>>>>>> step  80: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6279.0 M-cycles
>>>>>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306: 6962.2 M-cycles
>>>>>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463: 8406.5 M-cycles
>>>>>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6424.0 M-cycles
>>>>>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219: 6369.1 M-cycles
>>>>>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306: 7309.0 M-cycles
>>>>>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355: 7521.2 M-cycles
>>>>>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407: 8369.8 M-cycles
>>>>>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>>>>>> Step Time Lambda
>>>>>> 2500 5.00000 0.00000
>>>>>>
>>>>>> Energies (kJ/mol)
>>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>>>>>     9.72545e+03    4.33046e+03    2.98087e+03   -1.95794e+02    8.05967e+04
>>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>>>>>     1.01293e+06   -7.13110e+06    2.01689e+04   -6.00057e+06    1.08489e+06
>>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>>>>>    -4.91567e+06   -4.90300e+06    3.00225e+02    1.36173e+02    2.25998e-05
>>>>>>
>>>>>> Step Time Lambda
>>>>>> 5000 10.00000 0.00000
>>>>>>
>>>>>> ............
>>>>>>
>>>>>>
>>>>>>
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> Thank you in advance
>>>>>>
>>>>>> --
>>>>>> Carmen Di Giovanni, PhD
>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>> "Drug Discovery Lab"
>>>>>> University of Naples "Federico II"
>>>>>> Via D. Montesano, 49
>>>>>> 80131 Naples
>>>>>> Tel.: ++39 081 678623
>>>>>> Fax: ++39 081 678100
>>>>>> Email: cdigiova at unina.it
>>>>>>
>>>>>>
>>>>>>
>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What's your exact command?
>>>>>>>>
>>>>>>>
>>>>>>> A full .log file would be even better; it would tell us everything we
>>>>>>> need
>>>>>>> to know :)
>>>>>>>
>>>>>>> -Justin
>>>>>>>
>>>>>>>> Have you reviewed this page:
>>>>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>>>>
>>>>>>>> James "Wes" Barnett
>>>>>>>> Ph.D. Candidate
>>>>>>>> Chemical and Biomolecular Engineering
>>>>>>>>
>>>>>>>> Tulane University
>>>>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
>>>>>>>> Carmen Di Giovanni <cdigiova at unina.it>
>>>>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>>>>
>>>>>>>> I post the message of an MD run:
>>>>>>>>
>>>>>>>>
>>>>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>
>>>>>>>>
>>>>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>>>>>>>> performance loss, consider using a shorter cut-off and a finer PME grid.
>>>>>>>>
>>>>>>>> How can I solve this problem?
>>>>>>>> Thank you in advance
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>> "Drug Discovery Lab"
>>>>>>>> University of Naples "Federico II"
>>>>>>>> Via D. Montesano, 49
>>>>>>>> 80131 Naples
>>>>>>>> Tel.: ++39 081 678623
>>>>>>>> Fax: ++39 081 678100
>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dear all,
>>>>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>>>>> After a minimization on a protein of 1925 atoms, this is the message:
>>>>>>>>>>
>>>>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Minimization is a poor indicator of performance. Do a real MD run.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance causes
>>>>>>>>>> performance loss.
>>>>>>>>>>
>>>>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>>>>> (steps/hour)
>>>>>>>>>> Performance: 8480.2
>>>>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can I improve the performance?
>>>>>>>>>> At the moment I have not found enough information in the forum to
>>>>>>>>>> solve this problem.
>>>>>>>>>> The .log file is attached.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The list does not accept attachments. If you wish to share a file,
>>>>>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>>>>>> .log is quite important for understanding your hardware,
>>>>>>>>> optimizations, and seeing full details of the performance breakdown.
>>>>>>>>> But again, base your assessment on MD, not EM.
>>>>>>>>>
>>>>>>>>> -Justin
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ==================================================
>>>>>>>>>
>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>
>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>> School of Pharmacy
>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>> 20 Penn St.
>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>
>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>
>>>>>>>>> ==================================================
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ==================================================
>>>>>>>
>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>
>>>>>>> Department of Pharmaceutical Sciences
>>>>>>> School of Pharmacy
>>>>>>> Health Sciences Facility II, Room 629
>>>>>>> University of Maryland, Baltimore
>>>>>>> 20 Penn St.
>>>>>>> Baltimore, MD 21201
>>>>>>>
>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>
>>>>>>> ==================================================