[gmx-users] performance issue of GROMACS
Szilárd Páll
pall.szilard at gmail.com
Tue Sep 19 15:19:59 CEST 2017
PS: A bit of extrapolation from my standard historical benchmark data
suggests that regular cut-off kernels should run at ~3.0 ms/step, so
force-switch will be ~3.5-4 ms/step (with nstlist=20 and a 2 fs step);
assuming ~70% CPU-GPU overlap, that's 5-5.5 ms/step in total, which
corresponds to ~35 ns/day (with 2 fs).
That's just a rough estimate, though, and it assumes that you have
enough CPU cores for a balanced run.
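
For reference, here is the back-of-the-envelope conversion behind that
number (a minimal sketch in Python; the 5 ms/step and 2 fs values are
just the assumed figures from above, not measurements):

    # Convert wall-clock time per MD step into simulated ns/day.
    ms_per_step = 5.0                          # assumed total wall time per step (ms)
    dt_fs = 2.0                                # MD time step (fs)
    steps_per_day = 86_400_000 / ms_per_step   # ms in a day / ms per step
    ns_per_day = steps_per_day * dt_fs * 1e-6  # 1 ns = 1e6 fs
    print(round(ns_per_day, 1))                # ~34.6, i.e. roughly 35 ns/day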
--
Szilárd
On Tue, Sep 19, 2017 at 3:16 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> On Tue, Sep 19, 2017 at 2:20 PM, Tomek Stępniewski
> <tm.stepniewski at gmail.com> wrote:
>> Hi everybody,
>> I am running GROMACS 5.1.4 on a system that uses an NVIDIA Tesla K40m.
>> Surprisingly, I get a speed of only 15 ns/day when carrying out NVT
>> simulations; my colleagues say that on a new GPU like this, with my
>> system size, it should be around 60 ns/day.
>> Are there any apparent errors in my input files that might hinder the
>> simulation?
>
> 15 ns/day seems a bit low, but I can't say for sure if it's far too
> low. Can you share logs?
>
>> input file:
>> integrator = md
>> dt = 0.002
>> nsteps = 100000000
>> nstlog = 10000
>> nstxout = 50000
>> nstvout = 50000
>> nstfout = 50000
>> nstcalcenergy = 100
>> nstenergy = 1000
>> ;
>> cutoff-scheme = Verlet
>> nstlist = 20
>> rlist = 1.2
>> coulombtype = pme
>> rcoulomb = 1.2
>> vdwtype = Cut-off
>> vdw-modifier = Force-switch
>> rvdw_switch = 1.0
>> rvdw = 1.2
>> ;
>> tcoupl = Nose-Hoover
>> tc_grps = PROT MEMB SOL_ION
>> tau_t = 1.0 1.0 1.0
>> ref_t = 310 310 310
>> ;
>> constraints = h-bonds
>> constraint_algorithm = LINCS
>> continuation = yes
>> ;
>> nstcomm = 100
>> comm_mode = linear
>> comm_grps = PROT MEMB SOL_ION
>> ;
>> refcoord_scaling = com
>>
>> The system has around 70,000 atoms.
>>
>> Can this issue depend on the CUDA drivers?
>
> A bit, but not to a factor of 4.
>
>> CUDA compiler: /usr/local/cuda/bin/nvcc
>>   nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2016 NVIDIA
>>   Corporation; Built on Tue_Jan_10_13:22:03_CST_2017; Cuda compilation
>>   tools, release 8.0, V8.0.61
>> CUDA compiler flags:
>>   -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;
>>   -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;
>>   -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;
>>   -gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;
>>   -gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;
>>   -use_fast_math;
>>   -march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;
>>   -Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;
>>   -fexcess-precision=fast;-Wno-array-bounds;
>> CUDA driver: 8.0
>> CUDA runtime: 8.0
>> GPU info:
>> Number of GPUs detected: 1
>> #0: NVIDIA Tesla K40m, compute cap.: 3.5, ECC: yes, stat: compatible
>>
>> NOTE: GROMACS was configured without NVML support hence it can not exploit
>> application clocks of the detected Tesla K40m GPU to improve
>> performance.
>> Recompile with the NVML library (compatible with the driver used) or
>> set application clocks manually.
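
As an aside to the NOTE above: the application clocks can also be raised
without recompiling against NVML, e.g. via nvidia-smi. A sketch, assuming
you have the required privileges on the node; the exact clock pairs are
board-specific, so query what your card supports first:

    nvidia-smi -q -d SUPPORTED_CLOCKS   # list the clock pairs the board supports
    nvidia-smi -ac 3004,875             # set memory,graphics clocks in MHz (example values for a K40; adjust to what is reported)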
>>
>>
>> Using GPU 8x8 non-bonded kernels
>>
>> I will be extremely grateful for any help,
>> best
>>
>> --
>> Tomasz M Stepniewski
>> Research Group on Biomedical Informatics (GRIB)
>> Hospital del Mar Medical Research Institute (IMIM)