Re: [gmx-users] NaN error using mdrun-gpu

Bongkeun Kim bkim at
Wed Dec 15 18:37:25 CET 2010


This is the output from deviceQuery command:
./deviceQuery Starting...

  CUDA Device Query (Runtime API) version (CUDART static linking)

There are 4 devices supporting CUDA

Device 0: "Tesla T10 Processor"
   CUDA Driver Version:                           3.20
   CUDA Runtime Version:                          3.20
   CUDA Capability Major revision number:         1
   CUDA Capability Minor revision number:         3
   Total amount of global memory:                 4294770688 bytes
   Number of multiprocessors:                     30
   Number of cores:                               240
   Total amount of constant memory:               65536 bytes
   Total amount of shared memory per block:       16384 bytes
   Total number of registers available per block: 16384
   Warp size:                                     32
   Maximum number of threads per block:           512
   Maximum sizes of each dimension of a block:    512 x 512 x 64
   Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
   Maximum memory pitch:                          2147483647 bytes
   Texture alignment:                             256 bytes
   Clock rate:                                    1.44 GHz
   Concurrent copy and execution:                 Yes
   Run time limit on kernels:                     No
   Integrated:                                    No
   Support host page-locked memory mapping:       Yes
   Compute mode:                                  Default (multiple host threads can use this device simultaneously)
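
To compare the name reported above with what a GPU whitelist might expect, a small sketch along these lines could be used. It is only an illustration: ASSUMED_SUPPORTED is a hypothetical list made up for this example and is not taken from the mdrun-gpu source, and the parsing assumes the 'Device N: "name"' line format shown in this output.

```python
import re

# Hypothetical whitelist, for illustration only (NOT copied from mdrun-gpu).
ASSUMED_SUPPORTED = {"Tesla C1060", "Tesla S1070"}


def device_names(device_query_text):
    """Return {device index: name} parsed from 'Device N: "name"' lines."""
    pattern = re.compile(r'^Device (\d+): "([^"]+)"', re.MULTILINE)
    return {int(idx): name for idx, name in pattern.findall(device_query_text)}


def unmatched(names, supported=ASSUMED_SUPPORTED):
    """Return the devices whose reported name is not in the assumed whitelist."""
    return {idx: name for idx, name in names.items() if name not in supported}


# A short excerpt in the same layout as the deviceQuery output above.
sample = 'Device 0: "Tesla T10 Processor"\n   CUDA Driver Version:   3.20\n'
names = device_names(sample)
print(names)
print(unmatched(names))
```

If the real check is a plain string comparison of this kind, a card that reports itself as "Tesla T10 Processor" would not match either product name, which would be consistent with the warning.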

This simulation was already run on the CPU first; I then tried to run
the second one on the GPU.
Bongkeun Kim

Quoting Szilard Pall <szilard.pall at>:

> Hi,
> Tesla C1060 and S1070 are definitely supported, so it's strange
> that you get that warning. The only thing I can think of is that for
> some reason the CUDA runtime reports a name for the GPUs other than
> C1060/S1070. Could you please run deviceQuery from the SDK and
> provide the output here?
> However, that should not be causing the NaN issue. Does the same
> simulation run on the CPU?
> Cheers,
> --
> Szilard
> 2010/12/15 Bongkeun Kim <bkim at>:
>> Hello,
>> I tried using a 1 fs timestep and it did not work.
>> I'm using NVIDIA T10 GPUs (C1060 or S1070); mdrun-gpu said the GPU is not
>> supported, so I had to use "force-device=y". Do you think this is the
>> reason for the error?
>> Thanks.
>> Bongkeun Kim
>> Quoting Emanuel Peter <Emanuel.Peter at>:
>>> Hello,
>>> If you use a timestep of 1 fs instead of 2 fs, it could run better.
>>> Bests,
>>> Emanuel
>>>>>> Bongkeun Kim  15.12.10 8.36 Uhr >>>
>>> Hello,
>>> I got an error when I used gromacs-gpu for an NPT simulation.
>>> The log is as follows:
>>> ---------------------------------------------------------------
>>> Input Parameters:
>>>    integrator           = md
>>>    nsteps               = 50000000
>>>    init_step            = 0
>>>    ns_type              = Grid
>>>    nstlist              = 5
>>>    ndelta               = 2
>>>    nstcomm              = 10
>>>    comm_mode            = Linear
>>>    nstlog               = 1000
>>>    nstxout              = 1000
>>>    nstvout              = 1000
>>>    nstfout              = 0
>>>    nstcalcenergy        = 5
>>>    nstenergy            = 1000
>>>    nstxtcout            = 1000
>>>    init_t               = 0
>>>    delta_t              = 0.002
>>>    xtcprec              = 1000
>>>    nkx                  = 32
>>>    nky                  = 32
>>>    nkz                  = 32
>>>    pme_order            = 4
>>>    ewald_rtol           = 1e-05
>>>    ewald_geometry       = 0
>>>    epsilon_surface      = 0
>>>    optimize_fft         = FALSE
>>>    ePBC                 = xyz
>>>    bPeriodicMols        = FALSE
>>>    bContinuation        = TRUE
>>>    bShakeSOR            = FALSE
>>>    etc                  = V-rescale
>>>    nsttcouple           = 5
>>>    epc                  = Parrinello-Rahman
>>>    epctype              = Isotropic
>>>    nstpcouple           = 5
>>>    tau_p                = 2
>>>    ref_p (3x3):
>>>       ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
>>>       ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
>>>       ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
>>>    compress (3x3):
>>>       compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
>>>       compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
>>>       compress[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
>>>    refcoord_scaling     = No
>>>    posres_com (3):
>>>       posres_com[0]= 0.00000e+00
>>>       posres_com[1]= 0.00000e+00
>>>       posres_com[2]= 0.00000e+00
>>>    posres_comB (3):
>>>       posres_comB[0]= 0.00000e+00
>>>       posres_comB[1]= 0.00000e+00
>>>       posres_comB[2]= 0.00000e+00
>>>    andersen_seed        = 815131
>>>    rlist                = 1
>>>    rlistlong            = 1
>>>    rtpi                 = 0.05
>>>    coulombtype          = PME
>>>    rcoulomb_switch      = 0
>>>    rcoulomb             = 1
>>>    vdwtype              = Cut-off
>>>    rvdw_switch          = 0
>>>    rvdw                 = 1
>>>    epsilon_r            = 1
>>>    epsilon_rf           = 1
>>>    tabext               = 1
>>>    implicit_solvent     = No
>>>    gb_algorithm         = Still
>>>    gb_epsilon_solvent   = 80
>>>    nstgbradii           = 1
>>>    rgbradii             = 1
>>>    gb_saltconc          = 0
>>>    gb_obc_alpha         = 1
>>>    gb_obc_beta          = 0.8
>>>    gb_obc_gamma         = 4.85
>>>    gb_dielectric_offset = 0.009
>>>    sa_algorithm         = Ace-approximation
>>>    sa_surface_tension   = 2.05016
>>>    DispCorr             = EnerPres
>>>    free_energy          = no
>>>    init_lambda          = 0
>>>    delta_lambda         = 0
>>>    n_foreign_lambda     = 0
>>>    sc_alpha             = 0
>>>    sc_power             = 0
>>>    sc_sigma             = 0.3
>>>    sc_sigma_min         = 0.3
>>>    nstdhdl              = 10
>>>    separate_dhdl_file   = yes
>>>    dhdl_derivatives     = yes
>>>    dh_hist_size         = 0
>>>    dh_hist_spacing      = 0.1
>>>    nwall                = 0
>>>    wall_type            = 9-3
>>>    wall_atomtype[0]     = -1
>>>    wall_atomtype[1]     = -1
>>>    wall_density[0]      = 0
>>>    wall_density[1]      = 0
>>>    wall_ewald_zfac      = 3
>>>    pull                 = no
>>>    disre                = No
>>>    disre_weighting      = Conservative
>>>    disre_mixed          = FALSE
>>>    dr_fc                = 1000
>>>    dr_tau               = 0
>>>    nstdisreout          = 100
>>>    orires_fc            = 0
>>>    orires_tau           = 0
>>>    nstorireout          = 100
>>>    dihre-fc             = 1000
>>>    em_stepsize          = 0.01
>>>    em_tol               = 10
>>>    niter                = 20
>>>    fc_stepsize          = 0
>>>    nstcgsteep           = 1000
>>>    nbfgscorr            = 10
>>>    ConstAlg             = Lincs
>>>    shake_tol            = 0.0001
>>>    lincs_order          = 4
>>>    lincs_warnangle      = 30
>>>    lincs_iter           = 1
>>>    bd_fric              = 0
>>>    ld_seed              = 1993
>>>    cos_accel            = 0
>>>    deform (3x3):
>>>       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>>    userint1             = 0
>>>    userint2             = 0
>>>    userint3             = 0
>>>    userint4             = 0
>>>    userreal1            = 0
>>>    userreal2            = 0
>>>    userreal3            = 0
>>>    userreal4            = 0
>>> grpopts:
>>>    nrdf:       24715
>>>    ref_t:         325
>>>    tau_t:         0.1
>>> anneal:          No
>>> ann_npoints:           0
>>>    acc:            0           0           0
>>>    nfreeze:           N           N           N
>>>    energygrp_flags[  0]: 0
>>>    efield-x:
>>>       n = 0
>>>    efield-xt:
>>>       n = 0
>>>    efield-y:
>>>       n = 0
>>>    efield-yt:
>>>       n = 0
>>>    efield-z:
>>>       n = 0
>>>    efield-zt:
>>>       n = 0
>>>    bQMMM                = FALSE
>>>    QMconstraints        = 0
>>>    QMMMscheme           = 0
>>>    scalefactor          = 1
>>> qm_opts:
>>>    ngQM                 = 0
>>> Table routines are used for coulomb: TRUE
>>> Table routines are used for vdw:     FALSE
>>> Will do PME sum in reciprocal space.
>>> U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
>>> A smooth particle mesh Ewald method
>>> J. Chem. Phys. 103 (1995) pp. 8577-8592
>>> -------- -------- --- Thank You --- -------- --------
>>> Will do ordinary reciprocal space Ewald sum.
>>> Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
>>> Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
>>> Long Range LJ corr.:  2.9723e-04
>>> System total charge: 0.000
>>> Generated table with 1000 data points for Ewald.
>>> Tabscale = 500 points/nm
>>> Generated table with 1000 data points for LJ6.
>>> Tabscale = 500 points/nm
>>> Generated table with 1000 data points for LJ12.
>>> Tabscale = 500 points/nm
>>> Generated table with 1000 data points for 1-4 COUL.
>>> Tabscale = 500 points/nm
>>> Generated table with 1000 data points for 1-4 LJ6.
>>> Tabscale = 500 points/nm
>>> Generated table with 1000 data points for 1-4 LJ12.
>>> Tabscale = 500 points/nm
>>> Enabling SPC-like water optimization for 3910 molecules.
>>> Configuring nonbonded kernels...
>>> Configuring standard C nonbonded kernels...
>>> Initializing LINear Constraint Solver
>>> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
>>> LINCS: A Linear Constraint Solver for molecular simulations
>>> J. Comp. Chem. 18 (1997) pp. 1463-1472
>>> -------- -------- --- Thank You --- -------- --------
>>> The number of constraints is 626
>>> S. Miyamoto and P. A. Kollman
>>> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid Water Models
>>> J. Comp. Chem. 13 (1992) pp. 952-962
>>> -------- -------- --- Thank You --- -------- --------
>>> Center of mass motion removal mode is Linear
>>> We have the following groups for center of mass motion removal:
>>>   0:  rest
>>> G. Bussi, D. Donadio and M. Parrinello
>>> Canonical sampling through velocity rescaling
>>> J. Chem. Phys. 126 (2007) pp. 014101
>>> -------- -------- --- Thank You --- -------- --------
>>> Max number of connections per atom is 103
>>> Total number of connections is 37894
>>> Max number of graph edges per atom is 4
>>> Total number of graph edges is 16892
>>> OpenMM plugins loaded from directory /home/bkim/packages/openmm/lib/plugins:
>>> The combination rule of the used force field matches the one used by OpenMM.
>>> Gromacs will use the OpenMM platform: Cuda
>>> Non-supported GPU selected (#1, Tesla T10 Processor), forced continuing.Note, that the simulation can be slow or it migth even crash.
>>> Pre-simulation ~15s memtest in progress...
>>> Memory test completed without errors.
>>> Entry Friedrichs2009 not found in citation database
>>> -------- -------- --- Thank You --- -------- --------
>>> Initial temperature: 0 K
>>> Started mdrun on node 0 Tue Dec 14 23:10:20 2010
>>>            Step           Time         Lambda
>>>               0        0.00000        0.00000
>>>    Energies (kJ/mol)
>>>       Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
>>>    -1.40587e+05    3.36048e+04   -1.06982e+05    3.27065e+02    0.00000e+00
>>>            Step           Time         Lambda
>>>            1000        2.00000        0.00000
>>>    Energies (kJ/mol)
>>>       Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
>>>             nan            nan            nan            nan    0.00000e+00
>>> Received the second INT/TERM signal, stopping at the next step
>>>            Step           Time         Lambda
>>>            1927        3.85400        0.00000
>>>    Energies (kJ/mol)
>>>       Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
>>>             nan            nan            nan            nan    0.00000e+00
>>> Writing checkpoint, step 1927 at Tue Dec 14 23:12:07 2010
>>>         <======  ###############  ==>
>>>         <====  A V E R A G E S  ====>
>>>         <==  ###############  ======>
>>>         Statistics over 3 steps using 3 frames
>>>    Energies (kJ/mol)
>>>       Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
>>>             nan            nan            nan            nan    0.00000e+00
>>>           Box-X          Box-Y          Box-Z
>>>     3.91363e-24    6.72623e-44   -1.71925e+16
>>>    Total Virial (kJ/mol)
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>    Pressure (bar)
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>>    Total Dipole (D)
>>>     0.00000e+00    0.00000e+00    0.00000e+00
>>> ------------------------------------------------------------------------
>>> The input mdp file is given by
>>> ========================================================
>>> title           = OPLS Lysozyme MD
>>> ; Run parameters
>>> integrator      = md            ; leap-frog integrator
>>> nsteps          = 50000000      ;
>>> dt              = 0.002         ; 2 fs
>>> ; Output control
>>> nstxout         = 1000          ; save coordinates every 2 ps
>>> nstvout         = 1000          ; save velocities every 2 ps
>>> nstxtcout       = 1000          ; xtc compressed trajectory output every 2 ps
>>> nstenergy       = 1000          ; save energies every 2 ps
>>> nstlog          = 1000          ; update log file every 2 ps
>>> ; Bond parameters
>>> continuation    = yes           ; Restarting after NPT
>>> constraint_algorithm = lincs    ; holonomic constraints
>>> constraints     = all-bonds     ; all bonds (even heavy atom-H bonds) constrained
>>> lincs_iter      = 1             ; accuracy of LINCS
>>> lincs_order     = 4             ; also related to accuracy
>>> ; Neighborsearching
>>> ns_type         = grid          ; search neighboring grid cells
>>> nstlist         = 5             ; 10 fs
>>> rlist           = 1.0           ; short-range neighborlist cutoff (in nm)
>>> rcoulomb        = 1.0           ; short-range electrostatic cutoff (in nm)
>>> rvdw            = 1.0           ; short-range van der Waals cutoff (in nm)
>>> ; Electrostatics
>>> coulombtype     = PME           ; Particle Mesh Ewald for long-range electrostatics
>>> pme_order       = 4             ; cubic interpolation
>>> fourierspacing  = 0.16          ; grid spacing for FFT
>>> ; Temperature coupling is on
>>> tcoupl          = V-rescale     ; modified Berendsen thermostat
>>> tc-grps         = System        ; one coupling group for the whole system
>>> tau_t           = 0.1           ; time constant, in ps
>>> ref_t           = 325           ; reference temperature, one for each group, in K
>>> ; Pressure coupling is on
>>> pcoupl          = Parrinello-Rahman     ; Pressure coupling on in NPT
>>> pcoupltype      = isotropic     ; uniform scaling of box vectors
>>> tau_p           = 2.0           ; time constant, in ps
>>> ref_p           = 1.0           ; reference pressure, in bar
>>> compressibility = 4.5e-5        ; isothermal compressibility of water, bar^-1
>>> ; Periodic boundary conditions
>>> pbc             = xyz           ; 3-D PBC
>>> ; Dispersion correction
>>> DispCorr        = EnerPres      ; account for cut-off vdW scheme
>>> ; Velocity generation
>>> gen_vel         = no            ; Velocity generation is off
>>> =========================================================================
>>> It worked with the generic CPU mdrun but gave this error when mdrun-gpu
>>> was run with:
>>> mdrun-gpu -deffnm md_0_2 -device "OpenMM:platform=Cuda,deviceid=1,force-device=yes"
>>> If you have any idea how to avoid this problem, I would really appreciate it.
>>> Thank you.
>>> Bongkeun Kim
>>> --
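
A minimal sketch for locating the first logged step at which the energies turn into NaN, assuming the log layout shown above (a "Step Time Lambda" header followed by a line of values, then the energy block). The function name first_nan_step is hypothetical and not part of GROMACS; the sample text is an excerpt of the log in this thread with the mail quoting removed.

```python
def first_nan_step(log_text):
    """Return the first logged step whose energy block contains 'nan', else None."""
    current_step = None
    expect_step_values = False
    for line in log_text.splitlines():
        tokens = line.split()
        if tokens[:3] == ["Step", "Time", "Lambda"]:
            # The next line carries the step number, time and lambda.
            expect_step_values = True
            continue
        if expect_step_values:
            expect_step_values = False
            if tokens and tokens[0].isdigit():
                current_step = int(tokens[0])
            continue
        # Any exact 'nan' token inside the current block marks a blow-up.
        if current_step is not None and any(t.lower() == "nan" for t in tokens):
            return current_step
    return None


sample_log = """\
           Step           Time         Lambda
              0        0.00000        0.00000
   Energies (kJ/mol)
      Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
   -1.40587e+05    3.36048e+04   -1.06982e+05    3.27065e+02    0.00000e+00
           Step           Time         Lambda
           1000        2.00000        0.00000
   Energies (kJ/mol)
      Potential    Kinetic En.   Total Energy    Temperature   Constr. rmsd
            nan            nan            nan            nan    0.00000e+00
"""

print(first_nan_step(sample_log))
```

With nstlog = 1000 this only narrows the failure to somewhere between step 0 and step 1000; a smaller nstlog and nstenergy in the mdp file would localize it more precisely.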

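
As a side check on the log line "Using a Gaussian width (1/beta) of 0.320163 nm for Ewald": assuming the Ewald coefficient beta is chosen so that erfc(beta * rcoulomb) = ewald_rtol, the reported width can be reproduced from rcoulomb = 1 and ewald_rtol = 1e-05 with a simple bisection. This is a sketch under that assumption, not the GROMACS implementation, and the function name ewald_beta is hypothetical.

```python
import math


def ewald_beta(rc, rtol):
    """Solve erfc(beta * rc) = rtol for beta by bisection (assumed relation)."""
    low, high = 0.0, 1.0
    # Grow the upper bracket until erfc drops below the tolerance.
    while math.erfc(high * rc) > rtol:
        high *= 2.0
    # erfc is monotonically decreasing, so plain bisection converges.
    for _ in range(100):
        mid = 0.5 * (low + high)
        if math.erfc(mid * rc) > rtol:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


beta = ewald_beta(rc=1.0, rtol=1e-5)
print(f"beta = {beta:.5f} nm^-1, 1/beta = {1.0 / beta:.6f} nm")
```

Agreement with the logged width suggests the electrostatics setup itself was read as intended, so the NaN is more likely to originate elsewhere.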