[gmx-users] GPU running problem with GMX-4.6 beta2

Szilárd Páll szilard.pall at cbr.su.se
Mon Dec 17 18:08:22 CET 2012


Hi,

How about GPU emulation or CPU-only runs? Also, please try setting the
number of OpenMP threads to 1 (-ntomp 1).
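
For example, a rough sketch of the two runs (the -nb and -ntomp mdrun
options and the GMX_EMULATE_GPU environment variable are the ones
mentioned in this thread; the file names are taken from your own
command line below):

  # CPU-only run with a single OpenMP thread per rank
  mdrun_mpi -nb cpu -ntomp 1 -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc

  # GPU emulation mode with a single OpenMP thread per rank
  GMX_EMULATE_GPU=1 mdrun_mpi -ntomp 1 -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc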


--
Szilárd



On Mon, Dec 17, 2012 at 6:01 PM, Albert <mailmd2011 at gmail.com> wrote:

> hello:
>
>  I reduced the number of GPUs to two, and it said:
>
> Back Off! I just backed up nvt.log to ./#nvt.log.1#
> Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)
>
> NOTE: GPU(s) found, but the current simulation can not use GPUs
>       To use a GPU, set the mdp option: cutoff-scheme = Verlet
>       (for quick performance testing you can use the -testverlet option)
>
> Using 2 MPI processes
>
> 4 GPUs detected on host CUDANodeA:
>   #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
>   #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
>   #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
>   #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
>
> Making 1D domain decomposition 2 x 1 x 1
>
> * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> We have just committed the new CPU detection code in this branch,
> and will commit new SSE/AVX kernels in a few days. However, this
> means that currently only the NxN kernels are accelerated!
> In the mean time, you might want to avoid production runs in 4.6.
>
>
> When I run it with a single GPU, it produces lots of pdb files with the
> prefix "step" and then crashes with these messages:
>
> Wrote pdb files with previous and current coordinates
> Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which
> is larger than the 1-4 table size 2.200 nm
> These are ignored for the rest of the simulation
> This usually means your system is exploding,
> if not, you should increase table-extension in your mdp file
> or with user tables increase the table size
> [CUDANodeA:20659] *** Process received signal ***
> [CUDANodeA:20659] Signal: Segmentation fault (11)
> [CUDANodeA:20659] Signal code: Address not mapped (1)
> [CUDANodeA:20659] Failing at address: 0xc7aa00dc
> [CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2ab25c76d2d0]
> [CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) [0x2ab259e0720f]
> [CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) [0x2ab259e08c94]
> [CUDANodeA:20659] [ 3] /opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae]
> [CUDANodeA:20659] [ 4] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) [0x2ab259ddd62f]
> [CUDANodeA:20659] [ 5] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsGROUP+0x1495) [0x2ab259e72a45]
> [CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3]
> [CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639]
> [CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db]
> [CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ab25c999bfd]
> [CUDANodeA:20659] [10] mdrun_mpi() [0x407f09]
> [CUDANodeA:20659] *** End of error message ***
>
> [1]    Segmentation fault            mdrun_mpi -v -s nvt.tpr -c nvt.gro -g
> nvt.log -x nvt.xtc
>
>
>
> here is the .mdp file I used:
>
> title           = NVT equilibration for OR-POPC system
> define          = -DPOSRES -DPOSRES_LIG ; Protein is position restrained
> (uses the posres.itp file information)
> ; Parameters describing the details of the NVT simulation protocol
> integrator      = md            ; Algorithm ("md" = molecular dynamics
> [leap-frog integrator]; "md-vv" = md using velocity verlet; sd = stochastic
> dynamics)
> dt              = 0.002         ; Time-step (ps)
> nsteps          = 250000        ; Number of steps to run (0.002 * 250000 =
> 500 ps)
>
> ; Parameters controlling output writing
> nstxout         = 0             ; Do not write coordinates to the .trr file
> nstvout         = 0             ; Do not write velocities to the .trr file
> nstfout         = 0             ; Do not write forces to the .trr file
>
> nstxtcout       = 1000          ; Write coordinates to the compressed .xtc file every 2 ps
> nstenergy       = 1000          ; Write energies to output .edr file every
> 2 ps
> nstlog          = 1000          ; Write output to .log file every 2 ps
>
> ; Parameters describing neighbors searching and details about interaction
> calculations
> ns_type         = grid          ; Neighbor list search method (simple,
> grid)
> nstlist         = 50            ; Neighbor list update frequency (after
> every given number of steps)
> rlist           = 1.2           ; Neighbor list search cut-off distance
> (nm)
> rlistlong       = 1.4
> rcoulomb        = 1.2           ; Short-range Coulombic interactions
> cut-off distance (nm)
> rvdw            = 1.2           ; Short-range van der Waals cutoff
> distance (nm)
> pbc             = xyz           ; Directions in which to use Periodic
> Boundary Conditions (xyz, xy, no)
> cutoff-scheme   = Verlet        ; Verlet cut-off scheme (required for GPU runs)
>
> ; Parameters for treating bonded interactions
> continuation    = no            ; Whether a fresh start or a continuation
> from a previous run (yes/no)
> constraint_algorithm = LINCS    ; Constraint algorithm (LINCS / SHAKE)
> constraints     = all-bonds     ; Which bonds/angles to constrain
> (all-bonds / hbonds / none / all-angles / h-angles)
> lincs_iter      = 1             ; Number of iterations to correct for
> rotational lengthening in LINCS (related to accuracy)
> lincs_order     = 4             ; Highest order in the expansion of the
> constraint coupling matrix (related to accuracy)
>
> ; Parameters for treating electrostatic interactions
> coulombtype     = PME           ; Long range electrostatic interactions
> treatment (cut-off, Ewald, PME)
> pme_order       = 4             ; Interpolation order for PME (cubic
> interpolation is represented by 4)
> fourierspacing  = 0.12          ; Maximum grid spacing for FFT grid using
> PME (nm)
>
> ; Temperature coupling parameters
> tcoupl          = V-rescale             ; Modified Berendsen thermostat
> using velocity rescaling
> tc-grps         = Protein_LIG POPC Water_and_ions ; Define groups to be
> coupled separately to temperature bath
> tau_t           = 0.1   0.1     0.1     ; Group-wise coupling time
> constant (ps)
> ref_t           = 303   303     303     ; Group-wise reference temperature
> (K)
>
> ; Pressure coupling parameters
> pcoupl          = no            ; Under NVT conditions pressure coupling
> is not done
>
> ; Miscellaneous control parameters
> ; Dispersion correction
> DispCorr        = EnerPres      ; Dispersion corrections for Energy and
> Pressure for vdW cut-off
> ; Initial Velocity Generation
> gen_vel         = yes           ; Generate velocities from Maxwell
> distribution at given temperature
> gen_temp        = 303           ; Specific temperature for Maxwell
> distribution (K)
> gen_seed        = -1            ; Use random seed for velocity generation
> (integer; -1 means seed is calculated from the process ID number)
> ; Centre of mass (COM) motion removal relative to the specified groups
> nstcomm         = 1                     ; COM removal frequency (steps)
> comm_mode       = Linear                ; Remove COM translation (linear /
> angular / no)
> comm_grps       = Protein_LIG_POPC Water_and_ions ; COM removal relative
> to the specified groups
>
> THX
>
>
>
>
>
>
> On 12/17/2012 05:45 PM, Szilárd Páll wrote:
>
>> Hi,
>>
>> That log unfortunately tells us nothing about the reason why mdrun is
>> stuck. Can you reproduce the issue on other machines or with different
>> launch configurations? At which step does it get stuck (-stepout 1 can help)?
>>
>> Please try the following:
>> - try running on a single GPU;
>> - try running on CPUs only (-nb cpu; to match the GPU setup more closely,
>> also use -ntomp 12);
>> - try running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var
>> set (again with -ntomp 12 to match the GPU setup more closely);
>> - provide a backtrace (using gdb; see the sketch right after this list).
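>>
>> For the gdb step, a minimal sketch (assuming the binary has debug symbols
>> and the crash also reproduces with a single process, so you can run it
>> directly under gdb; substitute your usual mdrun arguments):
>>
>>   gdb --args mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc
>>   (gdb) run
>>   ... wait for the crash ...
>>   (gdb) bt full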
>>
>> Cheers,
>>
>> --
>> Szilárd
>>
>>
>>
>> On Mon, Dec 17, 2012 at 5:37 PM, Albert <mailmd2011 at gmail.com> wrote:
>>
>>  hello:
>>>
>>>   I am running a GMX-4.6 beta2 GPU job on a 24-CPU-core workstation with
>>> two GTX 590s. It is stuck there without any output, i.e. the .xtc file
>>> size is still 0 after hours of running. Here is the md.log file I found:
>>>
>>>
>>> Using CUDA 8x8x8 non-bonded kernels
>>>
>>> Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05
>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size:
>>> 1536
>>>
>>> Removing pbc first time
>>> Pinning to Hyper-Threading cores with 12 physical cores in a compute node
>>> There are 1 flexible constraints
>>>
>>> WARNING: step size for flexible constraining = 0
>>>           All flexible constraints will be rigid.
>>>           Will try to keep all flexible constraints at their original
>>> length,
>>>           but the lengths may exhibit some drift.
>>>
>>> Initializing Parallel LINear Constraint Solver
>>> Linking all bonded interactions to atoms
>>> There are 161872 inter charge-group exclusions,
>>> will use an extra communication step for exclusion forces for PME
>>>
>>> The initial number of communication pulses is: X 1
>>> The initial domain decomposition cell size is: X 1.83 nm
>>>
>>> The maximum allowed distance for charge groups involved in interactions
>>> is:
>>>                   non-bonded interactions           1.200 nm
>>> (the following are initial values, they could change due to box
>>> deformation)
>>>              two-body bonded interactions  (-rdd)   1.200 nm
>>>            multi-body bonded interactions  (-rdd)   1.200 nm
>>>    atoms separated by up to 5 constraints  (-rcon)  1.826 nm
>>>
>>> When dynamic load balancing gets turned on, these settings will change
>>> to:
>>> The maximum number of communication pulses is: X 1
>>> The minimum size for domain decomposition cells is 1.200 nm
>>> The requested allowed shrink of DD cells (option -dds) is: 0.80
>>> The allowed shrink of domain decomposition cells is: X 0.66
>>> The maximum allowed distance for charge groups involved in interactions
>>> is:
>>>                   non-bonded interactions           1.200 nm
>>>              two-body bonded interactions  (-rdd)   1.200 nm
>>>            multi-body bonded interactions  (-rdd)   1.200 nm
>>>    atoms separated by up to 5 constraints  (-rcon)  1.200 nm
>>>
>>> Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0
>>>
>>> Center of mass motion removal mode is Linear
>>> We have the following groups for center of mass motion removal:
>>>    0:  Protein_LIG_POPC
>>>    1:  Water_and_ions
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> G. Bussi, D. Donadio and M. Parrinello
>>> Canonical sampling through velocity rescaling
>>> J. Chem. Phys. 126 (2007) pp. 014101
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>>
>>>
>>> THX


