[gmx-users] how to increase GMX_OPENMP_MAX_THREADS
Szilárd Páll
pall.szilard at gmail.com
Wed Feb 27 14:32:14 CET 2019
The Quadro K2200 is a low-end GPU that is several generations old, and I
strongly doubt you will see any benefit from using it.
I suggest you try running
mdrun -nb gpu -ntmpi 1 -ntomp 36 -pin on
which will most likely give you the best performance you can get when
using both of the high-end Intel CPUs together with the GPU.
Compare that to:
mdrun -nb cpu -ntmpi 1 -ntomp 36 -pin on
mdrun -nb cpu -ntmpi 36 -ntomp 2 -pin on [with or without -npme 0]
mdrun -nb cpu -ntmpi 18 -ntomp 4 -pin on [with or without -npme 0]
I suspect one of the latter two will be fastest.
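To compare these fairly, short benchmark runs are enough. A minimal sketch,
reusing the md_0_30 input from your command below (the -nsteps value is an
arbitrary choice of mine, and -resethway resets mdrun's performance counters
halfway through the run so startup and load-balancing costs do not skew the
timings):
gmx mdrun -deffnm md_0_30 -nb gpu -ntmpi 1 -ntomp 36 -pin on -nsteps 50000 -resethway
gmx mdrun -deffnm md_0_30 -nb cpu -ntmpi 36 -ntomp 2 -pin on -nsteps 50000 -resethway
Then compare the ns/day figures reported at the end of each log file (you can
give each run its own log name with -g to avoid overwriting).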
You can also try running without domain decomposition, with 72 threads per
rank, by recompiling GROMACS with the
cmake . -DGMX_OPENMP_MAX_THREADS=128
option. This could end up being faster than the first suggested CPU run, but
it is not likely to be faster (by a relevant amount, if at all) than the
latter two.
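If you go down that path, here is a minimal rebuild sketch; the build
directory path is a placeholder for wherever you configured GROMACS, and the
-j value should match your core count:
cd /path/to/gromacs/build
cmake . -DGMX_OPENMP_MAX_THREADS=128
make -j 36
make install
after which a single-rank, 72-thread run would look like:
gmx mdrun -deffnm md_0_30 -nb cpu -ntmpi 1 -ntomp 72 -pin on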
If the above technicalities are not clear and you would like to understand
them better, I recommend reading (again?) the relevant parts of the user
guide.
Cheers,
--
Szilárd
On Wed, Feb 27, 2019 at 12:27 PM Lalehan Ozalp <lalehan.ozalp at gmail.com>
wrote:
> Dear Szilárd,
> There is indeed one GPU. And please keep in mind that I used the -nt 72
> option BEFORE the 2019-dev version. It looks like the new version employs
> the GPU by default, and apparently I don't know how to use it efficiently.
> Here is the info you asked for:
> System size: 130655 atoms
>
> .mdp file:
> ; Run parameters
> integrator = md ; leap-frog integrator
> nsteps = 15000000 ; 2 fs * 15000000 = 30000 ps (30 ns)
> dt = 0.002 ; 2 fs
> ; Output control
> nstenergy = 5000 ; save energies every 10.0 ps
> nstlog = 5000 ; update log file every 10.0 ps
> nstxout-compressed = 5000 ; save coordinates every 10.0 ps
> ; Bond parameters
> continuation = yes ; continuing from NPT
> constraint_algorithm = lincs ; holonomic constraints
> constraints = h-bonds ; bonds to H are constrained
> lincs_iter = 1 ; accuracy of LINCS
> lincs_order = 4 ; also related to accuracy
> ; Neighbor searching and vdW
> cutoff-scheme = Verlet
> ns_type = grid ; search neighboring grid cells
> nstlist = 20 ; largely irrelevant with Verlet
> rlist = 1.2
> vdwtype = cutoff
> vdw-modifier = force-switch
> rvdw-switch = 1.0
> rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
> ; Electrostatics
> coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
> rcoulomb = 1.2
> pme_order = 4 ; cubic interpolation
> fourierspacing = 0.16 ; grid spacing for FFT
> ; Temperature coupling
> tcoupl = V-rescale ; modified Berendsen thermostat
> tc-grps = Protein_nap_16 Water_and_ions ; two coupling groups - more accurate
> tau_t = 0.1 0.1 ; time constant, in ps
> ref_t = 300 300 ; reference temperature, one for each group, in K
> ; Pressure coupling
> pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT
> pcoupltype = isotropic ; uniform scaling of box vectors
>
>
>
> my command:
> gmx mdrun -deffnm md_0_30 -ntmpi 4 -ntomp 18 -npme 1 -pme gpu -nb gpu
>
>
>
> and what the program prints in the log file once I run it:
>
> GROMACS version: 2019-dev
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support: CUDA
> SIMD instructions: NONE
> FFT library: fftw-3.3.8
> RDTSCP usage: disabled
> TNG support: enabled
> Hwloc support: disabled
> Tracing support: disabled
> Built on: 2019-01-22 13:53:24
> Build CPU vendor: Unknown
> Build CPU brand: Unknown
> Build CPU family: 0 Model: 0 Stepping: 0
> Build CPU features: Unknown
> C compiler: /usr/local/bin/gcc GNU 5.3.0
> C++ compiler flags: -std=c++11 -Wundef -Wextra
> -Wno-missing-field-initializers -Wpointer-arith -Wmissing-declarations
> -Wall -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> -Wno-array-bounds
> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
> Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61
>
> Running on 1 node with total 36 cores, 72 logical cores, 1 compatible GPU
> Hardware detected:
> CPU info:
> Vendor: Intel
> Brand: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> Family: 6 Model: 63 Stepping: 2
>
> GPU info:
> Number of GPUs detected: 1
> #0: NVIDIA Quadro K2200, compute cap.: 5.0, ECC: no, stat: compatible
>
> Highest SIMD level requested by all nodes in run: AVX2_256
> SIMD instructions selected at compile time: None
> This program was compiled for different hardware than you are running on,
> which could influence performance.
> The current CPU can measure timings more accurately than the code in
> gmx mdrun was configured to use. This might affect your simulation
> speed as accurate timings are needed for load-balancing.
>
>
>
> Hardware:
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 72
> On-line CPU(s) list: 0-71
> Thread(s) per core: 2
> Core(s) per socket: 18
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 63
> Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> Stepping: 2
> CPU MHz: 1200.000
> BogoMIPS: 4589.66
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 46080K
> NUMA node0 CPU(s): 0-17,36-53
> NUMA node1 CPU(s): 18-35,54-71
>
>
>
> GPU:
>
> 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL
> [Quadro K2200] [10de:13ba] (rev a2) (prog-if 00 [VGA controller])
> Subsystem: NVIDIA Corporation Device [10de:1097]
> Physical Slot: 2
> Flags: bus master, fast devsel, latency 0, IRQ 232
> Memory at d2000000 (32-bit, non-prefetchable) [size=16M]
> Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Memory at d0000000 (64-bit, prefetchable) [size=32M]
> I/O ports at 4000 [size=128]
> [virtual] Expansion ROM at d3000000 [disabled] [size=512K]
> Capabilities: <access denied>
> Kernel driver in use: nvidia
> Kernel modules: nvidia-drm, nvidia, nouveau, nvidiafb
>
>
> Hope I didn't flood you with too much information.
> Thank you very much for your interest.
> Best,
>
> Lalehan