[gmx-users] GPU low performance
Carmen Di Giovanni
cdigiova at unina.it
Thu Feb 19 11:34:58 CET 2015
Dear Szilárd,
1) The output of the command nvidia-smi -ac 2600,758 is:
[root at localhost test_gpu]# nvidia-smi -ac 2600,758
Applications clocks set to "(MEM 2600, SM 758)" for GPU 0000:03:00.0
Warning: persistence mode is disabled on this device. This settings
will go back to default as soon as driver unloads (e.g. last
application like nvidia-smi or cuda application terminates). Run with
[--help | -h] switch to get more information on how to enable
persistence mode.
Setting applications clocks is not supported for GPU 0000:82:00.0.
Treating as warning and moving on.
All done.
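(A side note on the persistence-mode warning above: as I understand it, the application
clocks will revert as soon as the driver unloads. If root access is available, enabling
persistence mode first should presumably keep them, e.g.:

nvidia-smi -pm 1 -i 0
nvidia-smi -ac 2600,758 -i 0

with -i 0 limiting the commands to the Tesla K20c. This is only my reading of the
warning text, not something I have verified.)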
----------------------------------------------------------------------------
2) I decreased nstlist to 20.
However, when I run the command:
gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
it gives me a fatal error:
GROMACS: gmx mdrun, VERSION 5.0
Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
Library dir: /opt/SW/gromacs-5.0/share/top
Command line:
gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
Back Off! I just backed up nvt.log to ./#nvt.log.8#
Reading file nvt.tpr, VERSION 5.0 (single precision)
Changing nstlist from 10 to 40, rlist from 1 to 1.097
-------------------------------------------------------
Program gmx_mpi, VERSION 5.0
Source code file: /opt/SW/gromacs-5.0/src/programs/mdrun/runner.c, line: 876
Fatal error:
Setting the number of thread-MPI threads is only supported with
thread-MPI and Gromacs was compiled without thread-MPI
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Halting program gmx_mpi
gcq#223: "Jesus Not Only Saves, He Also Frequently Makes Backups."
(Myron Bradshaw)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-------------------------------------------------------------------------
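(Since the error says this gmx_mpi build was compiled without thread-MPI, should the
number of ranks instead be set through the MPI launcher? For example, something like:

mpirun -np 8 gmx_mpi mdrun -deffnm nvt -gpu_id 00001111

where, if I read the -gpu_id string correctly, the eight ranks would be mapped four to
GPU 0 and four to GPU 1. This is only my guess from the error message.)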
3) I don't understand how I can reduce the "Rest" time.
Carmen
--
Carmen Di Giovanni, PhD
Dept. of Pharmaceutical and Toxicological Chemistry
"Drug Discovery Lab"
University of Naples "Federico II"
Via D. Montesano, 49
80131 Naples
Tel.: ++39 081 678623
Fax: ++39 081 678100
Email: cdigiova at unina.it
----- Original Message -----
From: "Szilárd Páll" <pall.szilard at gmail.com>
To: "Discussion list for GROMACS users" <gmx-users at gromacs.org>; "Carmen Di
Giovanni" <cdigiova at unina.it>
Sent: Wednesday, February 18, 2015 6:38 PM
Subject: Re: [gmx-users] GPU low performance
Please keep the mails on the list.
On Wed, Feb 18, 2015 at 6:32 PM, Carmen Di Giovanni <cdigiova at unina.it>
wrote:
> nvidia-smi -q -g 0
>
> ==============NVSMI LOG==============
>
> Timestamp : Wed Feb 18 18:30:01 2015
> Driver Version : 340.24
>
> Attached GPUs : 2
> GPU 0000:03:00.0
> Product Name : Tesla K20c
[...]
> Clocks
> Graphics : 705 MHz
> SM : 705 MHz
> Memory : 2600 MHz
> Applications Clocks
> Graphics : 705 MHz
> Memory : 2600 MHz
> Default Applications Clocks
> Graphics : 705 MHz
> Memory : 2600 MHz
> Max Clocks
> Graphics : 758 MHz
> SM : 758 MHz
> Memory : 2600 MHz
This is the relevant part I was looking for. The Tesla K20c supports
setting a so-called application clock, which essentially means that
you can bump its clock frequency with the NVIDIA management tool
nvidia-smi from the default 705 MHz to 758 MHz.
Use the command:
nvidia-smi -ac 2600,758
This should give you another 7% or so (I didn't remember the correct
max clock before, that's why I was guessing 5%).
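If you want to double-check that the new application clocks actually took effect,
something like

nvidia-smi -q -d CLOCK -i 0

should list the current graphics/SM and memory clocks of the K20c (the exact output
layout may differ between driver versions).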
Cheers,
Szilard
> Clock Policy
> Auto Boost : N/A
> Auto Boost Default : N/A
> Compute Processes
> Process ID : 19441
> Name : gmx_mpi
> Used GPU Memory : 110 MiB
>
> [carmendigi at localhost test_gpu]$
>
>
>
>
>
>
>
> --
> Carmen Di Giovanni, PhD
> Dept. of Pharmaceutical and Toxicological Chemistry
> "Drug Discovery Lab"
> University of Naples "Federico II"
> Via D. Montesano, 49
> 80131 Naples
> Tel.: ++39 081 678623
> Fax: ++39 081 678100
> Email: cdigiova at unina.it
>
>
>
> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>
>> As I suggested above please use pastebin.com or similar!
>> --
>> Szilárd
>>
>>
>> On Wed, Feb 18, 2015 at 6:09 PM, Carmen Di Giovanni <cdigiova at unina.it>
>> wrote:
>>>
>>> Dear Szilárd, it's not possible to attach the full log file to the forum
>>> mail because it is too big.
>>> I will send it to your private mail address.
>>> Thank you in advance
>>> Carmen
>>>
>>>
>>> --
>>> Carmen Di Giovanni, PhD
>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>> "Drug Discovery Lab"
>>> University of Naples "Federico II"
>>> Via D. Montesano, 49
>>> 80131 Naples
>>> Tel.: ++39 081 678623
>>> Fax: ++39 081 678100
>>> Email: cdigiova at unina.it
>>>
>>>
>>>
>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>
>>>> We need a *full* log file, not parts of it!
>>>>
>>>> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
>>>> to not use HyperThreading.
>>>> --
>>>> Szilárd
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>>> wrote:
>>>>>
>>>>>
>>>>> Justin,
>>>>> the problem is evident for all calculations.
>>>>> This is the log file of a recent run:
>>>>>
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------------
>>>>>
>>>>> Log file opened on Mon Dec 22 16:28:00 2014
>>>>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks:
>>>>> 1
>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>
>>>>> GROMACS is written by:
>>>>> Emile Apol Rossen Apostolov Herman J.C. Berendsen Par
>>>>> Bjelkmar
>>>>> Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian
>>>>> Fritsch
>>>>> Gerrit Groenhof Christoph Junghans Peter Kasson Carsten
>>>>> Kutzner
>>>>> Per Larsson Justin A. Lemkul Magnus Lundborg Pieter
>>>>> Meulenhoff
>>>>> Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
>>>>> Roland Schulz Alexey Shvetsov Michael Shirts Alfons
>>>>> Sijbers
>>>>> Peter Tieleman Christian Wennberg Maarten Wolf
>>>>> and the project leaders:
>>>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>>>
>>>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>>>> Copyright (c) 2001-2014, The GROMACS development team at
>>>>> Uppsala University, Stockholm University and
>>>>> the Royal Institute of Technology, Sweden.
>>>>> check out http://www.gromacs.org for more information.
>>>>>
>>>>> GROMACS is free software; you can redistribute it and/or modify it
>>>>> under the terms of the GNU Lesser General Public License
>>>>> as published by the Free Software Foundation; either version 2.1
>>>>> of the License, or (at your option) any later version.
>>>>>
>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>>>> Library dir: /opt/SW/gromacs-5.0/share/top
>>>>> Command line:
>>>>> gmx_mpi mdrun -deffnm prod_20ns
>>>>>
>>>>> Gromacs version: VERSION 5.0
>>>>> Precision: single
>>>>> Memory model: 64 bit
>>>>> MPI library: MPI
>>>>> OpenMP support: enabled
>>>>> GPU support: enabled
>>>>> invsqrt routine: gmx_software_invsqrt(x)
>>>>> SIMD instructions: AVX_256
>>>>> FFT library: fftw-3.3.3-sse2
>>>>> RDTSCP usage: enabled
>>>>> C++11 compilation: disabled
>>>>> TNG support: enabled
>>>>> Tracing support: disabled
>>>>> Built on: Thu Jul 31 18:30:37 CEST 2014
>>>>> Built by: root at localhost.localdomain [CMAKE]
>>>>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>>>>> Build CPU vendor: GenuineIntel
>>>>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>> Build CPU family: 6 Model: 62 Stepping: 4
>>>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
>>>>> mmx
>>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
>>>>> sse2
>>>>> sse3
>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>> C compiler: /usr/bin/cc GNU 4.4.7
>>>>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>>>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
>>>>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>>>>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>>>>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>> Boost version: 1.55.0 (internal)
>>>>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
>>>>> compiler
>>>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0,
>>>>> V6.0.1
>>>>> CUDA compiler
>>>>>
>>>>>
>>>>> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
>>>>> ;
>>>>>
>>>>>
>>>>> ;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>>>> CUDA driver: 6.50
>>>>> CUDA runtime: 6.0
>>>>>
>>>>>
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and
>>>>> scalable
>>>>> molecular simulation
>>>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H.
>>>>> J.
>>>>> C.
>>>>> Berendsen
>>>>> GROMACS: Fast, Flexible and Free
>>>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> E. Lindahl and B. Hess and D. van der Spoel
>>>>> GROMACS 3.0: A package for molecular simulation and trajectory
>>>>> analysis
>>>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>>>> GROMACS: A message-passing parallel molecular dynamics implementation
>>>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>>
>>>>> For optimal performance with a GPU nstlist (now 10) should be larger.
>>>>> The optimum depends on your CPU and GPU resources.
>>>>> You might want to try several nstlist values.
>>>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>>>
>>>>> Input Parameters:
>>>>> integrator = md
>>>>> tinit = 0
>>>>> dt = 0.002
>>>>> nsteps = 10000000
>>>>> init-step = 0
>>>>> simulation-part = 1
>>>>> comm-mode = Linear
>>>>> nstcomm = 1
>>>>> bd-fric = 0
>>>>> ld-seed = 1993
>>>>> emtol = 10
>>>>> emstep = 0.01
>>>>> niter = 20
>>>>> fcstep = 0
>>>>> nstcgsteep = 1000
>>>>> nbfgscorr = 10
>>>>> rtpi = 0.05
>>>>> nstxout = 2500
>>>>> nstvout = 2500
>>>>> nstfout = 0
>>>>> nstlog = 2500
>>>>> nstcalcenergy = 1
>>>>> nstenergy = 2500
>>>>> nstxout-compressed = 500
>>>>> compressed-x-precision = 1000
>>>>> cutoff-scheme = Verlet
>>>>> nstlist = 40
>>>>> ns-type = Grid
>>>>> pbc = xyz
>>>>> periodic-molecules = FALSE
>>>>> verlet-buffer-tolerance = 0.005
>>>>> rlist = 1.285
>>>>> rlistlong = 1.285
>>>>> nstcalclr = 10
>>>>> coulombtype = PME
>>>>> coulomb-modifier = Potential-shift
>>>>> rcoulomb-switch = 0
>>>>> rcoulomb = 1.2
>>>>> epsilon-r = 1
>>>>> epsilon-rf = 1
>>>>> vdw-type = Cut-off
>>>>> vdw-modifier = Potential-shift
>>>>> rvdw-switch = 0
>>>>> rvdw = 1.2
>>>>> DispCorr = No
>>>>> table-extension = 1
>>>>> fourierspacing = 0.135
>>>>> fourier-nx = 128
>>>>> fourier-ny = 128
>>>>> fourier-nz = 128
>>>>> pme-order = 4
>>>>> ewald-rtol = 1e-05
>>>>> ewald-rtol-lj = 0.001
>>>>> lj-pme-comb-rule = Geometric
>>>>> ewald-geometry = 0
>>>>> epsilon-surface = 0
>>>>> implicit-solvent = No
>>>>> gb-algorithm = Still
>>>>> nstgbradii = 1
>>>>> rgbradii = 2
>>>>> gb-epsilon-solvent = 80
>>>>> gb-saltconc = 0
>>>>> gb-obc-alpha = 1
>>>>> gb-obc-beta = 0.8
>>>>> gb-obc-gamma = 4.85
>>>>> gb-dielectric-offset = 0.009
>>>>> sa-algorithm = Ace-approximation
>>>>> sa-surface-tension = 2.092
>>>>> tcoupl = V-rescale
>>>>> nsttcouple = 10
>>>>> nh-chain-length = 0
>>>>> print-nose-hoover-chain-variables = FALSE
>>>>> pcoupl = No
>>>>> pcoupltype = Semiisotropic
>>>>> nstpcouple = -1
>>>>> tau-p = 0.5
>>>>> compressibility (3x3):
>>>>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00,
>>>>> 0.00000e+00}
>>>>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00,
>>>>> 0.00000e+00}
>>>>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00,
>>>>> 0.00000e+00}
>>>>> ref-p (3x3):
>>>>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> refcoord-scaling = No
>>>>> posres-com (3):
>>>>> posres-com[0]= 0.00000e+00
>>>>> posres-com[1]= 0.00000e+00
>>>>> posres-com[2]= 0.00000e+00
>>>>> posres-comB (3):
>>>>> posres-comB[0]= 0.00000e+00
>>>>> posres-comB[1]= 0.00000e+00
>>>>> posres-comB[2]= 0.00000e+00
>>>>> QMMM = FALSE
>>>>> QMconstraints = 0
>>>>> QMMMscheme = 0
>>>>> MMChargeScaleFactor = 1
>>>>> qm-opts:
>>>>> ngQM = 0
>>>>> constraint-algorithm = Lincs
>>>>> continuation = FALSE
>>>>> Shake-SOR = FALSE
>>>>> shake-tol = 0.0001
>>>>> lincs-order = 4
>>>>> lincs-iter = 1
>>>>> lincs-warnangle = 30
>>>>> nwall = 0
>>>>> wall-type = 9-3
>>>>> wall-r-linpot = -1
>>>>> wall-atomtype[0] = -1
>>>>> wall-atomtype[1] = -1
>>>>> wall-density[0] = 0
>>>>> wall-density[1] = 0
>>>>> wall-ewald-zfac = 3
>>>>> pull = no
>>>>> rotation = FALSE
>>>>> interactiveMD = FALSE
>>>>> disre = No
>>>>> disre-weighting = Conservative
>>>>> disre-mixed = FALSE
>>>>> dr-fc = 1000
>>>>> dr-tau = 0
>>>>> nstdisreout = 100
>>>>> orire-fc = 0
>>>>> orire-tau = 0
>>>>> nstorireout = 100
>>>>> free-energy = no
>>>>> cos-acceleration = 0
>>>>> deform (3x3):
>>>>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>> simulated-tempering = FALSE
>>>>> E-x:
>>>>> n = 0
>>>>> E-xt:
>>>>> n = 0
>>>>> E-y:
>>>>> n = 0
>>>>> E-yt:
>>>>> n = 0
>>>>> E-z:
>>>>> n = 0
>>>>> E-zt:
>>>>> n = 0
>>>>> swapcoords = no
>>>>> adress = FALSE
>>>>> userint1 = 0
>>>>> userint2 = 0
>>>>> userint3 = 0
>>>>> userint4 = 0
>>>>> userreal1 = 0
>>>>> userreal2 = 0
>>>>> userreal3 = 0
>>>>> userreal4 = 0
>>>>> grpopts:
>>>>> nrdf: 869226
>>>>> ref-t: 300
>>>>> tau-t: 0.1
>>>>> annealing: No
>>>>> annealing-npoints: 0
>>>>> acc: 0 0 0
>>>>> nfreeze: N N N
>>>>> energygrp-flags[ 0]: 0
>>>>> Using 1 MPI process
>>>>> Using 32 OpenMP threads
>>>>>
>>>>> Detecting CPU SIMD instructions.
>>>>> Present hardware specification:
>>>>> Vendor: GenuineIntel
>>>>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>> Family: 6 Model: 62 Stepping: 4
>>>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>>>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
>>>>> sse3
>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>> SIMD instructions most likely to fit this hardware: AVX_256
>>>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>>>
>>>>>
>>>>> 2 GPUs detected on host localhost.localdomain:
>>>>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat: compatible
>>>>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat:
>>>>> compatible
>>>>>
>>>>> 1 GPU auto-selected for this run.
>>>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>>>
>>>>>
>>>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started
>>>>> with
>>>>> less
>>>>> PP MPI process per node than GPUs available.
>>>>> Each PP MPI process can use only one GPU, 1 GPU per node will be
>>>>> used.
>>>>>
>>>>> Will do PME sum in reciprocal space for electrostatic interactions.
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
>>>>> Pedersen
>>>>> A smooth particle mesh Ewald method
>>>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>> Will do ordinary reciprocal space Ewald sum.
>>>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>>>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>>>>> System total charge: -0.012
>>>>> Generated table with 1142 data points for Ewald.
>>>>> Tabscale = 500 points/nm
>>>>> Generated table with 1142 data points for LJ6.
>>>>> Tabscale = 500 points/nm
>>>>> Generated table with 1142 data points for LJ12.
>>>>> Tabscale = 500 points/nm
>>>>> Generated table with 1142 data points for 1-4 COUL.
>>>>> Tabscale = 500 points/nm
>>>>> Generated table with 1142 data points for 1-4 LJ6.
>>>>> Tabscale = 500 points/nm
>>>>> Generated table with 1142 data points for 1-4 LJ12.
>>>>> Tabscale = 500 points/nm
>>>>>
>>>>> Using CUDA 8x8 non-bonded kernels
>>>>>
>>>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -1.000e-05
>>>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
>>>>>
>>>>> Removing pbc first time
>>>>> Pinning threads with an auto-selected logical core stride of 1
>>>>>
>>>>> Initializing LINear Constraint Solver
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>>>> LINCS: A Linear Constraint Solver for molecular simulations
>>>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>> The number of constraints is 5913
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> S. Miyamoto and P. A. Kollman
>>>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
>>>>> Rigid
>>>>> Water Models
>>>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>> Center of mass motion removal mode is Linear
>>>>> We have the following groups for center of mass motion removal:
>>>>> 0: rest
>>>>>
>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>> G. Bussi, D. Donadio and M. Parrinello
>>>>> Canonical sampling through velocity rescaling
>>>>> J. Chem. Phys. 126 (2007) pp. 014101
>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>
>>>>> There are: 434658 Atoms
>>>>>
>>>>> Constraining the starting coordinates (step 0)
>>>>>
>>>>> Constraining the coordinates at t0-dt (step 0)
>>>>> RMS relative constraint deviation after constraining: 3.67e-05
>>>>> Initial temperature: 300.5 K
>>>>>
>>>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>>>> Step Time Lambda
>>>>> 0 0.00000 0.00000
>>>>>
>>>>> Energies (kJ/mol)
>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>>>>     9.74139e+03    4.34956e+03    2.97359e+03   -1.93107e+02    8.05534e+04
>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>>>>     1.01340e+06   -7.13271e+06    2.01361e+04   -6.00175e+06    1.09887e+06
>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>>>>    -4.90288e+06   -4.90288e+06    3.04092e+02    1.70897e+02    2.16683e-05
>>>>>
>>>>> step  80: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6279.0 M-cycles
>>>>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306: 6962.2 M-cycles
>>>>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463: 8406.5 M-cycles
>>>>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200: 6424.0 M-cycles
>>>>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219: 6369.1 M-cycles
>>>>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306: 7309.0 M-cycles
>>>>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355: 7521.2 M-cycles
>>>>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407: 8369.8 M-cycles
>>>>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>>>>> Step Time Lambda
>>>>> 2500 5.00000 0.00000
>>>>>
>>>>> Energies (kJ/mol)
>>>>>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>>>>     9.72545e+03    4.33046e+03    2.98087e+03   -1.95794e+02    8.05967e+04
>>>>>         LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
>>>>>     1.01293e+06   -7.13110e+06    2.01689e+04   -6.00057e+06    1.08489e+06
>>>>>    Total Energy  Conserved En.    Temperature Pressure (bar)   Constr. rmsd
>>>>>    -4.91567e+06   -4.90300e+06    3.00225e+02    1.36173e+02    2.25998e-05
>>>>>
>>>>> Step Time Lambda
>>>>> 5000 10.00000 0.00000
>>>>>
>>>>> ............
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> Thank you in advance
>>>>>
>>>>> --
>>>>> Carmen Di Giovanni, PhD
>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>> "Drug Discovery Lab"
>>>>> University of Naples "Federico II"
>>>>> Via D. Montesano, 49
>>>>> 80131 Naples
>>>>> Tel.: ++39 081 678623
>>>>> Fax: ++39 081 678100
>>>>> Email: cdigiova at unina.it
>>>>>
>>>>>
>>>>>
>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What's your exact command?
>>>>>>>
>>>>>>
>>>>>> A full .log file would be even better; it would tell us everything we
>>>>>> need
>>>>>> to know :)
>>>>>>
>>>>>> -Justin
>>>>>>
>>>>>>> Have you reviewed this page:
>>>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>>>
>>>>>>> James "Wes" Barnett
>>>>>>> Ph.D. Candidate
>>>>>>> Chemical and Biomolecular Engineering
>>>>>>>
>>>>>>> Tulane University
>>>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
>>>>>>> Carmen
>>>>>>> Di
>>>>>>> Giovanni <cdigiova at unina.it>
>>>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>>>
>>>>>>> I post the message of a md run :
>>>>>>>
>>>>>>>
>>>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>
>>>>>>>
>>>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>>>>>>>       performance loss, consider using a shorter cut-off and a finer PME grid.
>>>>>>> How can I solve this problem?
>>>>>>> Thank you in advance
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Carmen Di Giovanni, PhD
>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>> "Drug Discovery Lab"
>>>>>>> University of Naples "Federico II"
>>>>>>> Via D. Montesano, 49
>>>>>>> 80131 Naples
>>>>>>> Tel.: ++39 081 678623
>>>>>>> Fax: ++39 081 678100
>>>>>>> Email: cdigiova at unina.it
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>>>> After a minimization on a protein of 1925 atoms this is the
>>>>>>>>> message:
>>>>>>>>>
>>>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>
>>>>>>>>
>>>>>>>> Minimization is a poor indicator of performance. Do a real MD run.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance
>>>>>>>>> causes
>>>>>>>>> performance loss.
>>>>>>>>>
>>>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>>>> (steps/hour)
>>>>>>>>> Performance: 8480.2
>>>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can I improve the performance?
>>>>>>>>> At the moment I haven't found enough information in the forum to solve
>>>>>>>>> this problem.
>>>>>>>>> The log file is attached.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The list does not accept attachments. If you wish to share a file,
>>>>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>>>>> .log is quite important for understanding your hardware,
>>>>>>>> optimizations, and seeing full details of the performance
>>>>>>>> breakdown.
>>>>>>>> But again, base your assessment on MD, not EM.
>>>>>>>>
>>>>>>>> -Justin
>>>>>>>>
>>>>>>>> --
>>>>>>>> ==================================================
>>>>>>>>
>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>
>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>> School of Pharmacy
>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>> University of Maryland, Baltimore
>>>>>>>> 20 Penn St.
>>>>>>>> Baltimore, MD 21201
>>>>>>>>
>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>
>>>>>>>> ==================================================
>>>>>>
>>>>>> --
>>>>>> ==================================================
>>>>>>
>>>>>> Justin A. Lemkul, Ph.D.
>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>
>>>>>> Department of Pharmaceutical Sciences
>>>>>> School of Pharmacy
>>>>>> Health Sciences Facility II, Room 629
>>>>>> University of Maryland, Baltimore
>>>>>> 20 Penn St.
>>>>>> Baltimore, MD 21201
>>>>>>
>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>
>>>>>> ==================================================