[gmx-users] GPU low performance
Carmen Di Giovanni
cdigiova at unina.it
Thu Feb 19 18:44:55 CET 2015
Szilard,
about:
1) Fatal error:
Setting the number of thread-MPI threads is only supported with thread-MPI
and Gromacs was compiled without thread-MPI
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
The error quite clearly explains that you're trying to use mdrun's
built-in thread-MPI parallelization, but you have a binary that does
not support it. Use the MPI launching syntax instead.
Can you help me with the MPI launching syntax? What is the suitable
command?
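(For reference, a minimal sketch of what the MPI launching syntax usually
looks like, assuming the gmx_mpi binary was built against Open MPI and you
want 8 ranks spread over the two GPUs as in the failing command:

   mpirun -np 8 gmx_mpi mdrun -deffnm nvt -gpu_id 00001111

Here the number of ranks is set by the MPI launcher (mpirun -np), not by
mdrun's -ntmpi option, which only applies to thread-MPI builds.)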
2) Have you looked at the performance table at the end of the log?
You are wasting a large amount of runtime calculating energies every
step, and this overhead shows up in multiple places in the code - one of
them being the non-timed code parts, which typically take <3%.
How can I reduce the runtime spent calculating the energies every step?
Do I need to modify something in the .mdp file?
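(A minimal sketch of the kind of .mdp change this would involve, assuming
the per-step energy evaluation comes from the nstcalcenergy = 1 and
nstcomm = 1 values visible in the quoted log below:

   nstcalcenergy = 100    ; compute energies only every 100 steps (the default)
   nstcomm       = 100    ; COM removal every step also forces global communication

nstenergy = 2500 can stay as it is, since it only controls how often
energies are written to the .edr file. The .tpr would have to be
regenerated with grompp afterwards.)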
Thank you in advance
Carmen
--
Carmen Di Giovanni, PhD
Dept. of Pharmaceutical and Toxicological Chemistry
"Drug Discovery Lab"
University of Naples "Federico II"
Via D. Montesano, 49
80131 Naples
Tel.: ++39 081 678623
Fax: ++39 081 678100
Email: cdigiova at unina.it
Quoting Szilárd Páll <pall.szilard at gmail.com>:
> On Thu, Feb 19, 2015 at 11:32 AM, Carmen Di Giovanni
> <cdigiova at unina.it> wrote:
>> Dear Szilárd,
>>
>> 1) the output of command nvidia-smi -ac 2600,758 is
>>
>> [root at localhost test_gpu]# nvidia-smi -ac 2600,758
>> Applications clocks set to "(MEM 2600, SM 758)" for GPU 0000:03:00.0
>>
>> Warning: persistence mode is disabled on this device. This settings will go
>> back to default as soon as driver unloads (e.g. last application like
>> nvidia-smi or cuda application terminates). Run with [--help | -h] switch to
>> get more information on how to enable persistence mode.
>
> run nvidia-smi -pm 1 if you want to avoid that.
>
>> Setting applications clocks is not supported for GPU 0000:82:00.0.
>> Treating as warning and moving on.
>> All done.
>> ----------------------------------------------------------------------------
>> 2) I decreased nstlist to 20.
>> However, when I run the command:
>> gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
>> it gives me a fatal error:
>>
>> GROMACS: gmx mdrun, VERSION 5.0
>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>> Library dir: /opt/SW/gromacs-5.0/share/top
>> Command line:
>> gmx_mpi mdrun -deffnm nvt -ntmpi 8 -gpu_id 00001111
>>
>>
>> Back Off! I just backed up nvt.log to ./#nvt.log.8#
>> Reading file nvt.tpr, VERSION 5.0 (single precision)
>> Changing nstlist from 10 to 40, rlist from 1 to 1.097
>>
>>
>> -------------------------------------------------------
>> Program gmx_mpi, VERSION 5.0
>> Source code file: /opt/SW/gromacs-5.0/src/programs/mdrun/runner.c, line: 876
>>
>> Fatal error:
>> Setting the number of thread-MPI threads is only supported with thread-MPI
>> and Gromacs was compiled without thread-MPI
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>
> The error quite clearly explains that you're trying to use mdrun's
> built-in thread-MPI parallelization, but you have a binary that does
> not support it. Use the MPI launching syntax instead.
>
>> Halting program gmx_mpi
>>
>> gcq#223: "Jesus Not Only Saves, He Also Frequently Makes Backups." (Myron
>> Bradshaw)
>>
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode -1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> -------------------------------------------------------------------------
>>
>>
>> 4) I don't understand how I can reduce the "Rest" time
>
> Have you looked at the performance table at the end of the log?
> You are wasting a large amount of runtime calculating energies every
> step and this overhead comes in multiple places in the code - one of
> them being the non-timed code parts which typically take <3%.
>
> Cheers,
> --
> Szilard
>
>
>>
>> Carmen
>>
>>
>>
>> --
>> Carmen Di Giovanni, PhD
>> Dept. of Pharmaceutical and Toxicological Chemistry
>> "Drug Discovery Lab"
>> University of Naples "Federico II"
>> Via D. Montesano, 49
>> 80131 Naples
>> Tel.: ++39 081 678623
>> Fax: ++39 081 678100
>> Email: cdigiova at unina.it
>>
>>
>>
>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>
>>> Please keep the mails on the list.
>>>
>>> On Wed, Feb 18, 2015 at 6:32 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>> wrote:
>>>>
>>>> nvidia-smi -q -g 0
>>>>
>>>> ==============NVSMI LOG==============
>>>>
>>>> Timestamp : Wed Feb 18 18:30:01 2015
>>>> Driver Version : 340.24
>>>>
>>>> Attached GPUs : 2
>>>> GPU 0000:03:00.0
>>>> Product Name : Tesla K20c
>>>
>>> [...]
>>>>
>>>> Clocks
>>>> Graphics : 705 MHz
>>>> SM : 705 MHz
>>>> Memory : 2600 MHz
>>>> Applications Clocks
>>>> Graphics : 705 MHz
>>>> Memory : 2600 MHz
>>>> Default Applications Clocks
>>>> Graphics : 705 MHz
>>>> Memory : 2600 MHz
>>>> Max Clocks
>>>> Graphics : 758 MHz
>>>> SM : 758 MHz
>>>> Memory : 2600 MHz
>>>
>>>
>>> This is the relevant part I was looking for. The Tesla K20c supports
>>> setting a so-called application clock, which essentially means that
>>> you can bump its clock frequency using the NVIDIA management tool
>>> nvidia-smi from the default 705 MHz to 758 MHz.
>>>
>>> Use the command:
>>> nvidia-smi -ac 2600,758
>>>
>>> This should give you another 7% or so (I didn't remember the correct
>>> max clock before, that's why I was guessing 5%).
>>>
>>> Cheers,
>>> Szilard
>>>
>>>> Clock Policy
>>>> Auto Boost : N/A
>>>> Auto Boost Default : N/A
>>>> Compute Processes
>>>> Process ID : 19441
>>>> Name : gmx_mpi
>>>> Used GPU Memory : 110 MiB
>>>>
>>>> [carmendigi at localhost test_gpu]$
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Carmen Di Giovanni, PhD
>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>> "Drug Discovery Lab"
>>>> University of Naples "Federico II"
>>>> Via D. Montesano, 49
>>>> 80131 Naples
>>>> Tel.: ++39 081 678623
>>>> Fax: ++39 081 678100
>>>> Email: cdigiova at unina.it
>>>>
>>>>
>>>>
>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>
>>>>> As I suggested above please use pastebin.com or similar!
>>>>> --
>>>>> Szilárd
>>>>>
>>>>>
>>>>> On Wed, Feb 18, 2015 at 6:09 PM, Carmen Di Giovanni <cdigiova at unina.it>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Dear Szilárd, it's not possible to attach the full log file to the forum
>>>>>> mail because it is too big.
>>>>>> I will send it to your private mail address.
>>>>>> Thank you in advance
>>>>>> Carmen
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Carmen Di Giovanni, PhD
>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>> "Drug Discovery Lab"
>>>>>> University of Naples "Federico II"
>>>>>> Via D. Montesano, 49
>>>>>> 80131 Naples
>>>>>> Tel.: ++39 081 678623
>>>>>> Fax: ++39 081 678100
>>>>>> Email: cdigiova at unina.it
>>>>>>
>>>>>>
>>>>>>
>>>>>> Quoting Szilárd Páll <pall.szilard at gmail.com>:
>>>>>>
>>>>>>> We need a *full* log file, not parts of it!
>>>>>>>
>>>>>>> You can try running with "-ntomp 16 -pin on" - it may be a bit faster
>>>>>>> to not use HyperThreading.
>>>>>>> --
>>>>>>> Szilárd
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 18, 2015 at 5:20 PM, Carmen Di Giovanni
>>>>>>> <cdigiova at unina.it>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Justin,
>>>>>>>> the problem is evident for all calculations.
>>>>>>>> This is the log file of a recent run:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Log file opened on Mon Dec 22 16:28:00 2014
>>>>>>>> Host: localhost.localdomain pid: 8378 rank ID: 0 number of ranks:
>>>>>>>> 1
>>>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>>>>
>>>>>>>> GROMACS is written by:
>>>>>>>> Emile Apol Rossen Apostolov Herman J.C. Berendsen Par
>>>>>>>> Bjelkmar
>>>>>>>> Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian
>>>>>>>> Fritsch
>>>>>>>> Gerrit Groenhof Christoph Junghans Peter Kasson Carsten
>>>>>>>> Kutzner
>>>>>>>> Per Larsson Justin A. Lemkul Magnus Lundborg Pieter
>>>>>>>> Meulenhoff
>>>>>>>> Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
>>>>>>>> Roland Schulz Alexey Shvetsov Michael Shirts Alfons
>>>>>>>> Sijbers
>>>>>>>> Peter Tieleman Christian Wennberg Maarten Wolf
>>>>>>>> and the project leaders:
>>>>>>>> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>>>>>>>>
>>>>>>>> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>>>>>>>> Copyright (c) 2001-2014, The GROMACS development team at
>>>>>>>> Uppsala University, Stockholm University and
>>>>>>>> the Royal Institute of Technology, Sweden.
>>>>>>>> check out http://www.gromacs.org for more information.
>>>>>>>>
>>>>>>>> GROMACS is free software; you can redistribute it and/or modify it
>>>>>>>> under the terms of the GNU Lesser General Public License
>>>>>>>> as published by the Free Software Foundation; either version 2.1
>>>>>>>> of the License, or (at your option) any later version.
>>>>>>>>
>>>>>>>> GROMACS: gmx mdrun, VERSION 5.0
>>>>>>>> Executable: /opt/SW/gromacs-5.0/build/mpi-cuda/bin/gmx_mpi
>>>>>>>> Library dir: /opt/SW/gromacs-5.0/share/top
>>>>>>>> Command line:
>>>>>>>> gmx_mpi mdrun -deffnm prod_20ns
>>>>>>>>
>>>>>>>> Gromacs version: VERSION 5.0
>>>>>>>> Precision: single
>>>>>>>> Memory model: 64 bit
>>>>>>>> MPI library: MPI
>>>>>>>> OpenMP support: enabled
>>>>>>>> GPU support: enabled
>>>>>>>> invsqrt routine: gmx_software_invsqrt(x)
>>>>>>>> SIMD instructions: AVX_256
>>>>>>>> FFT library: fftw-3.3.3-sse2
>>>>>>>> RDTSCP usage: enabled
>>>>>>>> C++11 compilation: disabled
>>>>>>>> TNG support: enabled
>>>>>>>> Tracing support: disabled
>>>>>>>> Built on: Thu Jul 31 18:30:37 CEST 2014
>>>>>>>> Built by: root at localhost.localdomain [CMAKE]
>>>>>>>> Build OS/arch: Linux 2.6.32-431.el6.x86_64 x86_64
>>>>>>>> Build CPU vendor: GenuineIntel
>>>>>>>> Build CPU brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>>> Build CPU family: 6 Model: 62 Stepping: 4
>>>>>>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
>>>>>>>> mmx
>>>>>>>> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
>>>>>>>> sse2
>>>>>>>> sse3
>>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>>> C compiler: /usr/bin/cc GNU 4.4.7
>>>>>>>> C compiler flags: -mavx -Wno-maybe-uninitialized -Wextra
>>>>>>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith
>>>>>>>> -Wall
>>>>>>>> -Wno-unused -Wunused-value -Wunused-parameter -fomit-frame-pointer
>>>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>>>> C++ compiler: /usr/bin/c++ GNU 4.4.7
>>>>>>>> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>>>>>>>> -Wpointer-arith -Wall -Wno-unused-function -fomit-frame-pointer
>>>>>>>> -funroll-all-loops -Wno-array-bounds -O3 -DNDEBUG
>>>>>>>> Boost version: 1.55.0 (internal)
>>>>>>>> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
>>>>>>>> compiler
>>>>>>>> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
>>>>>>>> Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0,
>>>>>>>> V6.0.1
>>>>>>>> CUDA compiler
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;-Xcompiler;-fPIC
>>>>>>>> ;
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-fomit-frame-pointer;-funroll-all-loops;-Wno-array-bounds;-O3;-DNDEBUG
>>>>>>>> CUDA driver: 6.50
>>>>>>>> CUDA runtime: 6.0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
>>>>>>>> GROMACS 4: Algorithms for highly efficient, load-balanced, and
>>>>>>>> scalable
>>>>>>>> molecular simulation
>>>>>>>> J. Chem. Theory Comput. 4 (2008) pp. 435-447
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H.
>>>>>>>> J.
>>>>>>>> C.
>>>>>>>> Berendsen
>>>>>>>> GROMACS: Fast, Flexible and Free
>>>>>>>> J. Comp. Chem. 26 (2005) pp. 1701-1719
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> E. Lindahl and B. Hess and D. van der Spoel
>>>>>>>> GROMACS 3.0: A package for molecular simulation and trajectory
>>>>>>>> analysis
>>>>>>>> J. Mol. Mod. 7 (2001) pp. 306-317
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
>>>>>>>> GROMACS: A message-passing parallel molecular dynamics implementation
>>>>>>>> Comp. Phys. Comm. 91 (1995) pp. 43-56
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>>
>>>>>>>> For optimal performance with a GPU nstlist (now 10) should be larger.
>>>>>>>> The optimum depends on your CPU and GPU resources.
>>>>>>>> You might want to try several nstlist values.
>>>>>>>> Changing nstlist from 10 to 40, rlist from 1.2 to 1.285
>>>>>>>>
>>>>>>>> Input Parameters:
>>>>>>>> integrator = md
>>>>>>>> tinit = 0
>>>>>>>> dt = 0.002
>>>>>>>> nsteps = 10000000
>>>>>>>> init-step = 0
>>>>>>>> simulation-part = 1
>>>>>>>> comm-mode = Linear
>>>>>>>> nstcomm = 1
>>>>>>>> bd-fric = 0
>>>>>>>> ld-seed = 1993
>>>>>>>> emtol = 10
>>>>>>>> emstep = 0.01
>>>>>>>> niter = 20
>>>>>>>> fcstep = 0
>>>>>>>> nstcgsteep = 1000
>>>>>>>> nbfgscorr = 10
>>>>>>>> rtpi = 0.05
>>>>>>>> nstxout = 2500
>>>>>>>> nstvout = 2500
>>>>>>>> nstfout = 0
>>>>>>>> nstlog = 2500
>>>>>>>> nstcalcenergy = 1
>>>>>>>> nstenergy = 2500
>>>>>>>> nstxout-compressed = 500
>>>>>>>> compressed-x-precision = 1000
>>>>>>>> cutoff-scheme = Verlet
>>>>>>>> nstlist = 40
>>>>>>>> ns-type = Grid
>>>>>>>> pbc = xyz
>>>>>>>> periodic-molecules = FALSE
>>>>>>>> verlet-buffer-tolerance = 0.005
>>>>>>>> rlist = 1.285
>>>>>>>> rlistlong = 1.285
>>>>>>>> nstcalclr = 10
>>>>>>>> coulombtype = PME
>>>>>>>> coulomb-modifier = Potential-shift
>>>>>>>> rcoulomb-switch = 0
>>>>>>>> rcoulomb = 1.2
>>>>>>>> epsilon-r = 1
>>>>>>>> epsilon-rf = 1
>>>>>>>> vdw-type = Cut-off
>>>>>>>> vdw-modifier = Potential-shift
>>>>>>>> rvdw-switch = 0
>>>>>>>> rvdw = 1.2
>>>>>>>> DispCorr = No
>>>>>>>> table-extension = 1
>>>>>>>> fourierspacing = 0.135
>>>>>>>> fourier-nx = 128
>>>>>>>> fourier-ny = 128
>>>>>>>> fourier-nz = 128
>>>>>>>> pme-order = 4
>>>>>>>> ewald-rtol = 1e-05
>>>>>>>> ewald-rtol-lj = 0.001
>>>>>>>> lj-pme-comb-rule = Geometric
>>>>>>>> ewald-geometry = 0
>>>>>>>> epsilon-surface = 0
>>>>>>>> implicit-solvent = No
>>>>>>>> gb-algorithm = Still
>>>>>>>> nstgbradii = 1
>>>>>>>> rgbradii = 2
>>>>>>>> gb-epsilon-solvent = 80
>>>>>>>> gb-saltconc = 0
>>>>>>>> gb-obc-alpha = 1
>>>>>>>> gb-obc-beta = 0.8
>>>>>>>> gb-obc-gamma = 4.85
>>>>>>>> gb-dielectric-offset = 0.009
>>>>>>>> sa-algorithm = Ace-approximation
>>>>>>>> sa-surface-tension = 2.092
>>>>>>>> tcoupl = V-rescale
>>>>>>>> nsttcouple = 10
>>>>>>>> nh-chain-length = 0
>>>>>>>> print-nose-hoover-chain-variables = FALSE
>>>>>>>> pcoupl = No
>>>>>>>> pcoupltype = Semiisotropic
>>>>>>>> nstpcouple = -1
>>>>>>>> tau-p = 0.5
>>>>>>>> compressibility (3x3):
>>>>>>>> compressibility[ 0]={ 0.00000e+00, 0.00000e+00,
>>>>>>>> 0.00000e+00}
>>>>>>>> compressibility[ 1]={ 0.00000e+00, 0.00000e+00,
>>>>>>>> 0.00000e+00}
>>>>>>>> compressibility[ 2]={ 0.00000e+00, 0.00000e+00,
>>>>>>>> 0.00000e+00}
>>>>>>>> ref-p (3x3):
>>>>>>>> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> refcoord-scaling = No
>>>>>>>> posres-com (3):
>>>>>>>> posres-com[0]= 0.00000e+00
>>>>>>>> posres-com[1]= 0.00000e+00
>>>>>>>> posres-com[2]= 0.00000e+00
>>>>>>>> posres-comB (3):
>>>>>>>> posres-comB[0]= 0.00000e+00
>>>>>>>> posres-comB[1]= 0.00000e+00
>>>>>>>> posres-comB[2]= 0.00000e+00
>>>>>>>> QMMM = FALSE
>>>>>>>> QMconstraints = 0
>>>>>>>> QMMMscheme = 0
>>>>>>>> MMChargeScaleFactor = 1
>>>>>>>> qm-opts:
>>>>>>>> ngQM = 0
>>>>>>>> constraint-algorithm = Lincs
>>>>>>>> continuation = FALSE
>>>>>>>> Shake-SOR = FALSE
>>>>>>>> shake-tol = 0.0001
>>>>>>>> lincs-order = 4
>>>>>>>> lincs-iter = 1
>>>>>>>> lincs-warnangle = 30
>>>>>>>> nwall = 0
>>>>>>>> wall-type = 9-3
>>>>>>>> wall-r-linpot = -1
>>>>>>>> wall-atomtype[0] = -1
>>>>>>>> wall-atomtype[1] = -1
>>>>>>>> wall-density[0] = 0
>>>>>>>> wall-density[1] = 0
>>>>>>>> wall-ewald-zfac = 3
>>>>>>>> pull = no
>>>>>>>> rotation = FALSE
>>>>>>>> interactiveMD = FALSE
>>>>>>>> disre = No
>>>>>>>> disre-weighting = Conservative
>>>>>>>> disre-mixed = FALSE
>>>>>>>> dr-fc = 1000
>>>>>>>> dr-tau = 0
>>>>>>>> nstdisreout = 100
>>>>>>>> orire-fc = 0
>>>>>>>> orire-tau = 0
>>>>>>>> nstorireout = 100
>>>>>>>> free-energy = no
>>>>>>>> cos-acceleration = 0
>>>>>>>> deform (3x3):
>>>>>>>> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>>>>>>>> simulated-tempering = FALSE
>>>>>>>> E-x:
>>>>>>>> n = 0
>>>>>>>> E-xt:
>>>>>>>> n = 0
>>>>>>>> E-y:
>>>>>>>> n = 0
>>>>>>>> E-yt:
>>>>>>>> n = 0
>>>>>>>> E-z:
>>>>>>>> n = 0
>>>>>>>> E-zt:
>>>>>>>> n = 0
>>>>>>>> swapcoords = no
>>>>>>>> adress = FALSE
>>>>>>>> userint1 = 0
>>>>>>>> userint2 = 0
>>>>>>>> userint3 = 0
>>>>>>>> userint4 = 0
>>>>>>>> userreal1 = 0
>>>>>>>> userreal2 = 0
>>>>>>>> userreal3 = 0
>>>>>>>> userreal4 = 0
>>>>>>>> grpopts:
>>>>>>>> nrdf: 869226
>>>>>>>> ref-t: 300
>>>>>>>> tau-t: 0.1
>>>>>>>> annealing: No
>>>>>>>> annealing-npoints: 0
>>>>>>>> acc: 0 0 0
>>>>>>>> nfreeze: N N N
>>>>>>>> energygrp-flags[ 0]: 0
>>>>>>>> Using 1 MPI process
>>>>>>>> Using 32 OpenMP threads
>>>>>>>>
>>>>>>>> Detecting CPU SIMD instructions.
>>>>>>>> Present hardware specification:
>>>>>>>> Vendor: GenuineIntel
>>>>>>>> Brand: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>>>>>> Family: 6 Model: 62 Stepping: 4
>>>>>>>> Features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr
>>>>>>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
>>>>>>>> sse3
>>>>>>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>>>>>>> SIMD instructions most likely to fit this hardware: AVX_256
>>>>>>>> SIMD instructions selected at GROMACS compile time: AVX_256
>>>>>>>>
>>>>>>>>
>>>>>>>> 2 GPUs detected on host localhost.localdomain:
>>>>>>>> #0: NVIDIA Tesla K20c, compute cap.: 3.5, ECC: yes, stat:
>>>>>>>> compatible
>>>>>>>> #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC: no, stat:
>>>>>>>> compatible
>>>>>>>>
>>>>>>>> 1 GPU auto-selected for this run.
>>>>>>>> Mapping of GPU to the 1 PP rank in this node: #0
>>>>>>>>
>>>>>>>>
>>>>>>>> NOTE: potentially sub-optimal launch configuration, gmx_mpi started
>>>>>>>> with
>>>>>>>> less
>>>>>>>> PP MPI process per node than GPUs available.
>>>>>>>> Each PP MPI process can use only one GPU, 1 GPU per node will
>>>>>>>> be
>>>>>>>> used.
>>>>>>>>
>>>>>>>> Will do PME sum in reciprocal space for electrostatic interactions.
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
>>>>>>>> Pedersen
>>>>>>>> A smooth particle mesh Ewald method
>>>>>>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>> Will do ordinary reciprocal space Ewald sum.
>>>>>>>> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
>>>>>>>> Cut-off's: NS: 1.285 Coulomb: 1.2 LJ: 1.2
>>>>>>>> System total charge: -0.012
>>>>>>>> Generated table with 1142 data points for Ewald.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>> Generated table with 1142 data points for LJ6.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>> Generated table with 1142 data points for LJ12.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>> Generated table with 1142 data points for 1-4 COUL.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>> Generated table with 1142 data points for 1-4 LJ6.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>> Generated table with 1142 data points for 1-4 LJ12.
>>>>>>>> Tabscale = 500 points/nm
>>>>>>>>
>>>>>>>> Using CUDA 8x8 non-bonded kernels
>>>>>>>>
>>>>>>>> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald
>>>>>>>> -1.000e-05
>>>>>>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04
>>>>>>>> size:
>>>>>>>> 1536
>>>>>>>>
>>>>>>>> Removing pbc first time
>>>>>>>> Pinning threads with an auto-selected logical core stride of 1
>>>>>>>>
>>>>>>>> Initializing LINear Constraint Solver
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>>>>>>> LINCS: A Linear Constraint Solver for molecular simulations
>>>>>>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>> The number of constraints is 5913
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> S. Miyamoto and P. A. Kollman
>>>>>>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
>>>>>>>> Rigid
>>>>>>>> Water Models
>>>>>>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>> Center of mass motion removal mode is Linear
>>>>>>>> We have the following groups for center of mass motion removal:
>>>>>>>> 0: rest
>>>>>>>>
>>>>>>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>>>>>>> G. Bussi, D. Donadio and M. Parrinello
>>>>>>>> Canonical sampling through velocity rescaling
>>>>>>>> J. Chem. Phys. 126 (2007) pp. 014101
>>>>>>>> -------- -------- --- Thank You --- -------- --------
>>>>>>>>
>>>>>>>> There are: 434658 Atoms
>>>>>>>>
>>>>>>>> Constraining the starting coordinates (step 0)
>>>>>>>>
>>>>>>>> Constraining the coordinates at t0-dt (step 0)
>>>>>>>> RMS relative constraint deviation after constraining: 3.67e-05
>>>>>>>> Initial temperature: 300.5 K
>>>>>>>>
>>>>>>>> Started mdrun on rank 0 Mon Dec 22 16:28:01 2014
>>>>>>>> Step Time Lambda
>>>>>>>> 0 0.00000 0.00000
>>>>>>>>
>>>>>>>> Energies (kJ/mol)
>>>>>>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>>>>>>> Coulomb-14
>>>>>>>> 9.74139e+03 4.34956e+03 2.97359e+03 -1.93107e+02
>>>>>>>> 8.05534e+04
>>>>>>>> LJ (SR) Coulomb (SR) Coul. recip. Potential
>>>>>>>> Kinetic
>>>>>>>> En.
>>>>>>>> 1.01340e+06 -7.13271e+06 2.01361e+04 -6.00175e+06
>>>>>>>> 1.09887e+06
>>>>>>>> Total Energy Conserved En. Temperature Pressure (bar)
>>>>>>>> Constr.
>>>>>>>> rmsd
>>>>>>>> -4.90288e+06 -4.90288e+06 3.04092e+02 1.70897e+02
>>>>>>>> 2.16683e-05
>>>>>>>>
>>>>>>>> step 80: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>>> 6279.0
>>>>>>>> M-cycles
>>>>>>>> step 160: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>>> 6962.2
>>>>>>>> M-cycles
>>>>>>>> step 240: timed with pme grid 100 100 100, coulomb cutoff 1.463:
>>>>>>>> 8406.5
>>>>>>>> M-cycles
>>>>>>>> step 320: timed with pme grid 128 128 128, coulomb cutoff 1.200:
>>>>>>>> 6424.0
>>>>>>>> M-cycles
>>>>>>>> step 400: timed with pme grid 120 120 120, coulomb cutoff 1.219:
>>>>>>>> 6369.1
>>>>>>>> M-cycles
>>>>>>>> step 480: timed with pme grid 112 112 112, coulomb cutoff 1.306:
>>>>>>>> 7309.0
>>>>>>>> M-cycles
>>>>>>>> step 560: timed with pme grid 108 108 108, coulomb cutoff 1.355:
>>>>>>>> 7521.2
>>>>>>>> M-cycles
>>>>>>>> step 640: timed with pme grid 104 104 104, coulomb cutoff 1.407:
>>>>>>>> 8369.8
>>>>>>>> M-cycles
>>>>>>>> optimal pme grid 128 128 128, coulomb cutoff 1.200
>>>>>>>> Step Time Lambda
>>>>>>>> 2500 5.00000 0.00000
>>>>>>>>
>>>>>>>> Energies (kJ/mol)
>>>>>>>> G96Angle Proper Dih. Improper Dih. LJ-14
>>>>>>>> Coulomb-14
>>>>>>>> 9.72545e+03 4.33046e+03 2.98087e+03 -1.95794e+02
>>>>>>>> 8.05967e+04
>>>>>>>> LJ (SR) Coulomb (SR) Coul. recip. Potential
>>>>>>>> Kinetic
>>>>>>>> En.
>>>>>>>> 1.01293e+06 -7.13110e+06 2.01689e+04 -6.00057e+06
>>>>>>>> 1.08489e+06
>>>>>>>> Total Energy Conserved En. Temperature Pressure (bar)
>>>>>>>> Constr.
>>>>>>>> rmsd
>>>>>>>> -4.91567e+06 -4.90300e+06 3.00225e+02 1.36173e+02
>>>>>>>> 2.25998e-05
>>>>>>>>
>>>>>>>> Step Time Lambda
>>>>>>>> 5000 10.00000 0.00000
>>>>>>>>
>>>>>>>> ............
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you in advance
>>>>>>>>
>>>>>>>> --
>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>> "Drug Discovery Lab"
>>>>>>>> University of Naples "Federico II"
>>>>>>>> Via D. Montesano, 49
>>>>>>>> 80131 Naples
>>>>>>>> Tel.: ++39 081 678623
>>>>>>>> Fax: ++39 081 678100
>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/18/15 11:09 AM, Barnett, James W wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What's your exact command?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A full .log file would be even better; it would tell us everything
>>>>>>>>> we
>>>>>>>>> need
>>>>>>>>> to know :)
>>>>>>>>>
>>>>>>>>> -Justin
>>>>>>>>>
>>>>>>>>>> Have you reviewed this page:
>>>>>>>>>>
>>>>>>>>>> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
>>>>>>>>>>
>>>>>>>>>> James "Wes" Barnett
>>>>>>>>>> Ph.D. Candidate
>>>>>>>>>> Chemical and Biomolecular Engineering
>>>>>>>>>>
>>>>>>>>>> Tulane University
>>>>>>>>>> Boggs Center for Energy and Biotechnology, Room 341-B
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se
>>>>>>>>>> <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
>>>>>>>>>> Carmen
>>>>>>>>>> Di
>>>>>>>>>> Giovanni <cdigiova at unina.it>
>>>>>>>>>> Sent: Wednesday, February 18, 2015 10:06 AM
>>>>>>>>>> To: gromacs.org_gmx-users at maillist.sys.kth.se
>>>>>>>>>> Subject: Re: [gmx-users] GPU low performance
>>>>>>>>>>
>>>>>>>>>> I post the message of an MD run:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Force evaluation time GPU/CPU: 40.974 ms/24.437 ms = 1.677
>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> NOTE: The GPU has >20% more load than the CPU. This imbalance
>>>>>>>>>> causes
>>>>>>>>>> performance loss, consider using a shorter cut-off and a
>>>>>>>>>> finer
>>>>>>>>>> PME
>>>>>>>>>> grid.
>>>>>>>>>>
>>>>>>>>>> How can I solve this problem?
>>>>>>>>>> Thank you in advance
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Carmen Di Giovanni, PhD
>>>>>>>>>> Dept. of Pharmaceutical and Toxicological Chemistry
>>>>>>>>>> "Drug Discovery Lab"
>>>>>>>>>> University of Naples "Federico II"
>>>>>>>>>> Via D. Montesano, 49
>>>>>>>>>> 80131 Naples
>>>>>>>>>> Tel.: ++39 081 678623
>>>>>>>>>> Fax: ++39 081 678100
>>>>>>>>>> Email: cdigiova at unina.it
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Quoting Justin Lemkul <jalemkul at vt.edu>:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2/18/15 10:30 AM, Carmen Di Giovanni wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>> I'm working on a machine with an NVIDIA Tesla K20.
>>>>>>>>>>>> After a minimization on a protein of 1925 atoms this is the
>>>>>>>>>>>> message:
>>>>>>>>>>>>
>>>>>>>>>>>> Force evaluation time GPU/CPU: 2.923 ms/116.774 ms = 0.025
>>>>>>>>>>>> For optimal performance this ratio should be close to 1!
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Minimization is a poor indicator of performance. Do a real MD
>>>>>>>>>>> run.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> NOTE: The GPU has >25% less load than the CPU. This imbalance
>>>>>>>>>>>> causes
>>>>>>>>>>>> performance loss.
>>>>>>>>>>>>
>>>>>>>>>>>> Core t (s) Wall t (s) (%)
>>>>>>>>>>>> Time: 3289.010 205.891 1597.4
>>>>>>>>>>>> (steps/hour)
>>>>>>>>>>>> Performance: 8480.2
>>>>>>>>>>>> Finished mdrun on rank 0 Wed Feb 18 15:50:06 2015
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can I improve the performance?
>>>>>>>>>>>> So far I have not found enough information in the forum to solve
>>>>>>>>>>>> this problem.
>>>>>>>>>>>> The log file is attached.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The list does not accept attachments. If you wish to share a
>>>>>>>>>>> file,
>>>>>>>>>>> upload it to a file-sharing service and provide a URL. The full
>>>>>>>>>>> .log is quite important for understanding your hardware,
>>>>>>>>>>> optimizations, and seeing full details of the performance
>>>>>>>>>>> breakdown.
>>>>>>>>>>> But again, base your assessment on MD, not EM.
>>>>>>>>>>>
>>>>>>>>>>> -Justin
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ==================================================
>>>>>>>>>>>
>>>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>>>
>>>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>>>> School of Pharmacy
>>>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>>>> 20 Penn St.
>>>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>>>
>>>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>>>
>>>>>>>>>>> ==================================================
>>>>>>>>>>> --
>>>>>>>>>>> Gromacs Users mailing list
>>>>>>>>>>>
>>>>>>>>>>> * Please search the archive at
>>>>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>>>>>> posting!
>>>>>>>>>>>
>>>>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>>>>
>>>>>>>>>>> * For (un)subscribe requests visit
>>>>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>>>>> or send a mail to gmx-users-request at gromacs.org.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Gromacs Users mailing list
>>>>>>>>>>
>>>>>>>>>> * Please search the archive at
>>>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>>>>> posting!
>>>>>>>>>>
>>>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>>>
>>>>>>>>>> * For (un)subscribe requests visit
>>>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>>>> or
>>>>>>>>>> send a mail to gmx-users-request at gromacs.org.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ==================================================
>>>>>>>>>
>>>>>>>>> Justin A. Lemkul, Ph.D.
>>>>>>>>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>>>>>>>>
>>>>>>>>> Department of Pharmaceutical Sciences
>>>>>>>>> School of Pharmacy
>>>>>>>>> Health Sciences Facility II, Room 629
>>>>>>>>> University of Maryland, Baltimore
>>>>>>>>> 20 Penn St.
>>>>>>>>> Baltimore, MD 21201
>>>>>>>>>
>>>>>>>>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>>>>>>>>> http://mackerell.umaryland.edu/~jalemkul
>>>>>>>>>
>>>>>>>>> ==================================================
>>>>>>>>> --
>>>>>>>>> Gromacs Users mailing list
>>>>>>>>>
>>>>>>>>> * Please search the archive at
>>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>>>> posting!
>>>>>>>>>
>>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>>
>>>>>>>>> * For (un)subscribe requests visit
>>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>>> or
>>>>>>>>> send
>>>>>>>>> a mail to gmx-users-request at gromacs.org.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gromacs Users mailing list
>>>>>>>>
>>>>>>>> * Please search the archive at
>>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>>> posting!
>>>>>>>>
>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>>
>>>>>>>> * For (un)subscribe requests visit
>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>>>>>>> send a
>>>>>>>> mail to gmx-users-request at gromacs.org.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Gromacs Users mailing list
>>>>>>>
>>>>>>> * Please search the archive at
>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>>>>> posting!
>>>>>>>
>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>
>>>>>>> * For (un)subscribe requests visit
>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>>>>>> send
>>>>>>> a mail to gmx-users-request at gromacs.org.
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>