[gmx-developers] free energies on GPUs?
Igor Leontyev
ileontyev at ucdavis.edu
Wed Feb 22 17:54:31 CET 2017
>
> What CPU vs GPU time per step gets reported at the end of the log file?
Thank you, Berk, for the prompt response. Here is my log file, which provides all the details.
=================================================
Host: compute-0-113.local pid: 12081 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2016.2 (-:
GROMACS is written by:
...........................................................
GROMACS: gmx mdrun, version 2016.2
Executable:
/home/leontyev/programs/bin/gromacs/gromacs-2016.2/bin/gmx_avx2_gpu
Data prefix: /home/leontyev/programs/bin/gromacs/gromacs-2016.2
Working dir:
/share/COMMON2/MDRUNS/GROMACS/MUTATIONS/PROTEINS/coc-Flu_A-B_LIGs/MDRUNS/InP/fluA/Output_test/6829_6818_9/Gromacs.571690
Command line:
gmx_avx2_gpu mdrun -nb gpu -gpu_id 3 -pin on -nt 8 -s
6829_6818-liq_0.tpr -e
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.edr -dhdl
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.xvg -o
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.trr -x
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.xtc -cpo
/state/partition1/Gromacs.571690.0//6829_6818-liq_0.cpt -c
6829_6818-liq_0.gro -g 6829_6818-liq_0.log
GROMACS version: 2016.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.4-sse2-avx
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: Mon Feb 20 18:26:54 PST 2017
Built by: leontyev at cluster01.interxinc.com [CMAKE]
Build OS/arch: Linux 2.6.32-642.el6.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
Build CPU family: 6 Model: 45 Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf mmx msr
nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3
sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /share/apps/devtoolset-1.1/root/usr/bin/gcc GNU 4.7.2
C compiler flags: -march=core-avx2 -static-libgcc -static-libstdc++
-O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /share/apps/devtoolset-1.1/root/usr/bin/g++ GNU 4.7.2
C++ compiler flags: -march=core-avx2 -std=c++0x -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler: /share/apps/cuda-8.0/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
Sun_Sep__4_22:14:01_CDT_2016;Cuda compilation tools, release 8.0, V8.0.44
CUDA compiler
flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;;-Xcompiler;,-march=core-avx2,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
CUDA driver: 8.0
CUDA runtime: 8.0
Running on 1 node with total 24 cores, 24 logical cores, 4 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Family: 6 Model: 63 Stepping: 2
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf
mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [
7] [ 8] [ 9] [ 10] [ 11]
Socket 1: [ 12] [ 13] [ 14] [ 15] [ 16] [ 17] [ 18] [
19] [ 20] [ 21] [ 22] [ 23]
GPU info:
Number of GPUs detected: 4
#0: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC: no, stat: compatible
#1: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC: no, stat: compatible
#2: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC: no, stat: compatible
#3: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC: no, stat: compatible
For optimal performance with a GPU nstlist (now 10) should be larger.
The optimum depends on your CPU and GPU resources.
You might want to try several nstlist values.
Changing nstlist from 10 to 40, rlist from 0.9 to 0.932
Input Parameters:
integrator = sd
tinit = 0
dt = 0.001
nsteps = 10000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1103660843
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 10000000
nstvout = 10000000
nstfout = 0
nstlog = 20000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 5000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 40
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.932
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0.9
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0.9
rvdw = 0.9
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.12
fourier-nx = 42
fourier-ny = 42
fourier-nz = 40
pme-order = 6
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = No
nsttcouple = 5
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 5
tau-p = 0.5
compressibility (3x3):
compressibility[ 0]={ 5.00000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 5.00000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 5.00000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.01325e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.01325e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.01325e+00}
refcoord-scaling = All
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 12
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = yes
init-lambda = -1
init-lambda-state = 0
delta-lambda = 0
nstdhdl = 100
n-lambdas = 13
separate-dvdl:
fep-lambdas = FALSE
mass-lambdas = FALSE
coul-lambdas = TRUE
vdw-lambdas = TRUE
bonded-lambdas = TRUE
restraint-lambdas = FALSE
temperature-lambdas = FALSE
all-lambdas:
fep-lambdas = 0 0 0 0
0 0 0 0 0 0
0 0 0
mass-lambdas = 0 0 0 0
0 0 0 0 0 0
0 0 0
coul-lambdas = 0 0.03 0.1 0.2
0.3 0.4 0.5 0.6 0.7 0.8
0.9 0.97 1
vdw-lambdas = 0 0.03 0.1 0.2
0.3 0.4 0.5 0.6 0.7 0.8
0.9 0.97 1
bonded-lambdas = 0 0.03 0.1 0.2
0.3 0.4 0.5 0.6 0.7 0.8
0.9 0.97 1
restraint-lambdas = 0 0 0 0
0 0 0 0 0 0
0 0 0
temperature-lambdas = 0 0 0 0
0 0 0 0 0 0
0 0 0
calc-lambda-neighbors = -1
dhdl-print-energy = potential
sc-alpha = 0.1
sc-power = 1
sc-r-power = 6
sc-sigma = 0.3
sc-sigma-min = 0.3
sc-coul = true
dh-hist-size = 0
dh-hist-spacing = 0.1
separate-dhdl-file = yes
dhdl-derivatives = yes
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 6332.24 62.9925 18705.8
ref-t: 298.15 298.15 298.15
tau-t: 1 1 1
annealing: No No No
annealing-npoints: 0 0 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Using 1 MPI thread
Using 8 OpenMP threads
1 GPU user-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 3
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Will do ordinary reciprocal space Ewald sum.
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's: NS: 0.932 Coulomb: 0.9 LJ: 0.9
Long Range LJ corr.: <C6> 3.6183e-04
System total charge, top. A: 7.000 top. B: 7.000
Generated table with 965 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 965 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 965 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 965 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 965 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 965 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
Using GPU 8x8 non-bonded kernels
Using Lorentz-Berthelot Lennard-Jones combination rule
There are 21 atoms and 21 charges for free energy perturbation
Removing pbc first time
Pinning threads with an auto-selected logical core stride of 1
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Intra-simulation communication will occur every 5 steps.
Initial vector of lambda components:[ 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 ]
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
N. Goga and A. J. Rzepiela and A. H. de Vries and S. J. Marrink and H. J. C.
Berendsen
Efficient Algorithms for Langevin and DPD Dynamics
J. Chem. Theory Comput. 8 (2012) pp. 3637--3649
-------- -------- --- Thank You --- -------- --------
There are: 11486 Atoms
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 0.00e+00
Initial temperature: 291.365 K
Started mdrun on rank 0 Wed Feb 22 02:11:02 2017
Step Time
0 0.00000
Energies (kJ/mol)
Bond Angle Proper Dih. Ryckaert-Bell. Improper Dih.
2.99018e+03 4.09043e+03 5.20416e+03 4.32600e+01 2.38045e+02
LJ-14 Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR)
2.04778e+03 1.45523e+04 1.59846e+04 -2.41317e+03 -1.92125e+05
Coul. recip. Position Rest. Potential Kinetic En. Total Energy
1.58368e+03 2.08367e-09 -1.47804e+05 3.03783e+04 -1.17425e+05
Temperature Pres. DC (bar) Pressure (bar) dVcoul/dl dVvdw/dl
2.91118e+02 -3.53694e+02 -3.01252e+02 4.77627e+02 1.41810e+01
dVbonded/dl
-2.15074e+01
step 80: timed with pme grid 42 42 40, coulomb cutoff 0.900: 391.6 M-cycles
step 160: timed with pme grid 36 36 36, coulomb cutoff 1.043: 595.7 M-cycles
step 240: timed with pme grid 40 36 36, coulomb cutoff 1.022: 401.1 M-cycles
step 320: timed with pme grid 40 40 36, coulomb cutoff 0.963: 318.8 M-cycles
step 400: timed with pme grid 40 40 40, coulomb cutoff 0.938: 349.9 M-cycles
step 480: timed with pme grid 42 40 40, coulomb cutoff 0.920: 319.9 M-cycles
optimal pme grid 40 40 36, coulomb cutoff 0.963
Step Time
10000 10.00000
Writing checkpoint, step 10000 at Wed Feb 22 02:11:41 2017
Energies (kJ/mol)
Bond Angle Proper Dih. Ryckaert-Bell. Improper Dih.
2.99123e+03 4.14451e+03 5.19572e+03 2.56045e+01 2.74109e+02
LJ-14 Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR)
2.01371e+03 1.45326e+04 1.55974e+04 -2.43903e+03 -1.88805e+05
Coul. recip. Position Rest. Potential Kinetic En. Total Energy
1.26353e+03 7.39689e+01 -1.45132e+05 3.14390e+04 -1.13693e+05
Temperature Pres. DC (bar) Pressure (bar) dVcoul/dl dVvdw/dl
3.01283e+02 -3.61306e+02 1.35461e+02 3.46732e+02 1.03533e+01
dVbonded/dl
-1.08537e+01
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 10001 steps using 101 frames
Energies (kJ/mol)
Bond Angle Proper Dih. Ryckaert-Bell. Improper Dih.
3.01465e+03 4.25438e+03 5.23249e+03 3.47157e+01 2.59375e+02
LJ-14 Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR)
2.02486e+03 1.45795e+04 1.58085e+04 -2.42589e+03 -1.89788e+05
Coul. recip. Position Rest. Potential Kinetic En. Total Energy
1.28411e+03 6.08802e+01 -1.45660e+05 3.09346e+04 -1.14726e+05
Temperature Pres. DC (bar) Pressure (bar) dVcoul/dl dVvdw/dl
2.96448e+02 -3.57435e+02 3.32252e+01 4.36060e+02 1.77368e+01
dVbonded/dl
-1.82384e+01
Box-X Box-Y Box-Z
4.99607e+00 4.89654e+00 4.61444e+00
Total Virial (kJ/mol)
1.00345e+04 5.03211e+01 -1.17351e+02
4.69630e+01 1.04021e+04 1.73033e+02
-1.16637e+02 1.75781e+02 1.01673e+04
Pressure (bar)
7.67740e+01 -1.32678e+01 3.58518e+01
-1.22810e+01 -2.15571e+01 -5.79828e+01
3.56420e+01 -5.87931e+01 4.44585e+01
T-Protein T-LIG T-SOL
2.98707e+02 2.97436e+02 2.95680e+02
P P - P M E L O A D B A L A N C I N G
PP/PME load balancing changed the cut-off and PME settings:
particle-particle PME
rcoulomb rlist grid spacing 1/beta
initial 0.900 nm 0.932 nm 42 42 40 0.119 nm 0.288 nm
final 0.963 nm 0.995 nm 40 40 36 0.128 nm 0.308 nm
cost-ratio 1.22 0.82
(note that these numbers concern only part of the total PP and PME load)
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
NB Free energy kernel 20441.549154 20441.549 0.3
Pair Search distance check 289.750448 2607.754 0.0
NxN Ewald Elec. + LJ [F] 78217.065728 5162326.338 85.6
NxN Ewald Elec. + LJ [V&F] 798.216192 85409.133 1.4
1,4 nonbonded interactions 55.597769 5003.799 0.1
Calc Weights 344.614458 12406.120 0.2
Spread Q Bspline 49624.481952 99248.964 1.6
Gather F Bspline 49624.481952 297746.892 4.9
3D-FFT 36508.030372 292064.243 4.8
Solve PME 31.968000 2045.952 0.0
Shift-X 2.882986 17.298 0.0
Bonds 21.487804 1267.780 0.0
Angles 38.645175 6492.389 0.1
Propers 58.750116 13453.777 0.2
Impropers 4.270427 888.249 0.0
RB-Dihedrals 0.445700 110.088 0.0
Pos. Restr. 0.900090 45.005 0.0
Virial 23.073531 415.324 0.0
Update 114.871486 3561.016 0.1
Stop-CM 1.171572 11.716 0.0
Calc-Ekin 45.966972 1241.108 0.0
Constraint-V 187.108062 1496.864 0.0
Constraint-Vir 18.717354 449.216 0.0
Settle 62.372472 20146.308 0.3
-----------------------------------------------------------------------------
Total 6028896.883 100.0
-----------------------------------------------------------------------------
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 8 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Neighbor search 1 8 251 0.530 9.754 1.4
Launch GPU ops. 1 8 10001 0.509 9.357 1.3
Force 1 8 10001 10.634 195.662 27.3
PME mesh 1 8 10001 22.173 407.991 57.0
Wait GPU local 1 8 10001 0.073 1.338 0.2
NB X/F buffer ops. 1 8 19751 0.255 4.690 0.7
Write traj. 1 8 3 0.195 3.587 0.5
Update 1 8 20002 1.038 19.093 2.7
Constraints 1 8 20002 0.374 6.887 1.0
Rest 3.126 57.513 8.0
-----------------------------------------------------------------------------
Total 38.906 715.871 100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME spread/gather 1 8 40004 19.289 354.929 49.6
PME 3D-FFT 1 8 40004 2.319 42.665 6.0
PME solve Elec 1 8 20002 0.518 9.538 1.3
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
Computing: Count Wall t (s) ms/step %
-----------------------------------------------------------------------------
Pair list H2D 251 0.023 0.090 1.1
X / q H2D 10001 0.269 0.027 12.5
Nonbonded F kernel 9700 1.615 0.166 75.0
Nonbonded F+ene k. 50 0.014 0.273 0.6
Nonbonded F+prune k. 200 0.039 0.196 1.8
Nonbonded F+ene+prune k. 51 0.016 0.323 0.8
F D2H 10001 0.177 0.018 8.2
-----------------------------------------------------------------------------
Total 2.153 0.215 100.0
-----------------------------------------------------------------------------
Average per-step force GPU/CPU evaluation time ratio: 0.215 ms/3.280 ms = 0.066
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >25% less load than the CPU. This imbalance causes
performance loss.
Core t (s) Wall t (s) (%)
Time: 311.246 38.906 800.0
(ns/day) (hour/ns)
Performance: 22.210 1.081
=================================================
On 2/22/2017 1:04 AM, Igor Leontyev wrote:
> Hi.
> I am having a hard time accelerating free energy (FE) simulations on my
> high-end GPU. I am not sure whether this is normal for my smaller
> systems or whether I am doing something wrong.
>
> The efficiency of GPU acceleration seems to decrease with system size,
> right? Typical box sizes in FE simulations are 32x32x32 A^3 in water
> (~3.5K atoms) and about 60x60x60 A^3 in protein (~25K atoms); larger MD
> boxes are rarely needed in FE simulations.
>
> For my system (11K atoms) I am getting only up to a 50% speedup on 8
> CPU cores with a GTX 1080 GPU. GPU utilization during the simulation is
> only 1-2%. Does that sound right? (I am using the current gmx 2016.2 and
> CUDA driver 8.0; on request I will attach log files with all the details.)
>
> BTW, regarding how much the perturbed interactions cost: in my case the
> simulation with "free_energy = no" runs about TWICE as fast.
>
> Igor
>
>> On 2/13/17, 1:32 AM,
>> "gromacs.org_gmx-developers-bounces at maillist.sys.kth.se on behalf of
>> Berk Hess" <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se on
>> behalf of hess at kth.se> wrote:
>>
>> That depends on what you mean by this.
>> With free energy, all non-perturbed non-bonded interactions can run on
>> the GPU. The perturbed ones currently cannot. For a large system with a
>> few perturbed atoms this is no issue. For smaller systems the
>> free-energy kernel can be the limiting factor. I think there is a lot of
>> gain to be had in making the extremely complex CPU free-energy kernel
>> faster. Initially I thought SIMD would not help there. But since any
>> perturbed i-particle will have perturbed interactions with all j's, this
>> will help a lot.
>>
>> Cheers,
>>
>> Berk
>>
>> On 2017-02-13 01:08, Michael R Shirts wrote:
>> > What's the current state of free energy code on GPUs, and what are
>> > the roadblocks?
>> >
>> > Thanks!
>> > ~~~~~~~~~~~~~~~~
>> > Michael Shirts
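
As a rough illustration of Berk's last point (this is not GROMACS code, just a toy C++ sketch under made-up assumptions): because a perturbed i-particle has perturbed interactions with every j in its neighbor list, the inner j-loop of the CPU free-energy kernel is long and uniform, which is exactly the shape of loop SIMD handles well. The sketch uses plain LJ instead of the real soft-core form, no exclusions, no Coulomb/PME, and linear lambda interpolation; all names and parameter values are invented.

// Toy sketch, not GROMACS code: dV/dlambda for one perturbed i-particle.
// Simplifications: plain LJ instead of soft-core, no exclusions, no
// Coulomb/PME, linear interpolation V(l) = (1-l)*V_A + l*V_B, so
// dV/dl is simply V_B - V_A summed over the neighbor list.
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

struct Neighbors {
    std::vector<float> dx, dy, dz;   // displacement from i to j
    std::vector<float> c6A, c12A;    // LJ parameters in state A
    std::vector<float> c6B, c12B;    // LJ parameters in state B
};

static float dvdlOnePerturbedAtom(const Neighbors& nb)
{
    const std::size_t n = nb.dx.size();
    float dvdl = 0.0f;
    // The loop body is branch-free and identical for every j, so the
    // compiler can vectorize across j (the "SIMD will help a lot" part).
    #pragma omp simd reduction(+ : dvdl)
    for (std::size_t j = 0; j < n; ++j) {
        const float r2    = nb.dx[j]*nb.dx[j] + nb.dy[j]*nb.dy[j] + nb.dz[j]*nb.dz[j];
        const float rinv6 = 1.0f / (r2 * r2 * r2);
        const float vA    = nb.c12A[j]*rinv6*rinv6 - nb.c6A[j]*rinv6;
        const float vB    = nb.c12B[j]*rinv6*rinv6 - nb.c6B[j]*rinv6;
        dvdl += vB - vA;
    }
    return dvdl;
}

int main()
{
    // Fake neighbor list: ~400 j-particles around one perturbed atom that
    // disappears in state B (c6B = c12B = 0).
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(0.3f, 1.0f);
    Neighbors nb;
    for (int j = 0; j < 400; ++j) {
        nb.dx.push_back(dist(rng)); nb.dy.push_back(dist(rng)); nb.dz.push_back(dist(rng));
        nb.c6A.push_back(1e-3f);    nb.c12A.push_back(1e-6f);
        nb.c6B.push_back(0.0f);     nb.c12B.push_back(0.0f);
    }
    std::printf("toy dV/dlambda = %g\n", dvdlOnePerturbedAtom(nb));
    return 0;
}

Built with something like g++ -O3 -fopenmp-simd, the pragma lets the compiler vectorize across j; without OpenMP SIMD support it is ignored and the loop stays scalar, which loosely mirrors the current scalar kernel Berk describes.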