[gmx-users] GPU running problem with GMX-4.6 beta2

Albert mailmd2011 at gmail.com
Mon Dec 17 18:01:07 CET 2012


Hello,

I reduced the number of GPUs to two, and mdrun reported:

Back Off! I just backed up nvt.log to ./#nvt.log.1#
Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

NOTE: GPU(s) found, but the current simulation can not use GPUs
       To use a GPU, set the mdp option: cutoff-scheme = Verlet
       (for quick performance testing you can use the -testverlet option)
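
(My .mdp below already sets cutoff-scheme = Verlet, so presumably nvt.tpr was
generated from an older copy of the file. For a quick test, the -testverlet
option named in the note would be used roughly like this, with my file names:

   mdrun_mpi -testverlet -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc

The proper fix would be to regenerate the .tpr; see the grompp sketch after
the .mdp file below.)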

Using 2 MPI processes

4 GPUs detected on host CUDANodeA:
   #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible

Making 1D domain decomposition 2 x 1 x 1

* WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.


When I ran it with a single GPU, it produced lots of .pdb files with the
prefix "step" and then crashed with these messages:

Wrote pdb files with previous and current coordinates
Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which 
is larger than the 1-4 table size 2.200 nm
These are ignored for the rest of the simulation
This usually means your system is exploding,
if not, you should increase table-extension in your mdp file
or with user tables increase the table size
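
(For reference, table-extension is an .mdp option; the 2.200 nm table size
above is presumably rvdw = 1.2 nm plus the default table-extension of 1 nm.
A sketch of the suggested change:

   table-extension = 2    ; extend the non-bonded tables further beyond the cut-off

That said, a 1-4 distance of ~435 nm means the system really is exploding;
a longer table would only hide that.)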
[CUDANodeA:20659] *** Process received signal ***
[CUDANodeA:20659] Signal: Segmentation fault (11)
[CUDANodeA:20659] Signal code: Address not mapped (1)
[CUDANodeA:20659] Failing at address: 0xc7aa00dc
[CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2ab25c76d2d0]
[CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) 
[0x2ab259e0720f]
[CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) 
[0x2ab259e08c94]
[CUDANodeA:20659] [ 3] 
/opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae]
[CUDANodeA:20659] [ 4] 
/opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) 
[0x2ab259ddd62f]
[CUDANodeA:20659] [ 5] 
/opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsGROUP+0x1495) 
[0x2ab259e72a45]
[CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3]
[CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639]
[CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db]
[CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) 
[0x2ab25c999bfd]
[CUDANodeA:20659] [10] mdrun_mpi() [0x407f09]
[CUDANodeA:20659] *** End of error message ***

[1]    Segmentation fault            mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc
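
(For reference, a single-GPU run in 4.6 can be pinned to one device with
-gpu_id; a sketch, assuming an MPI build:

   mpirun -np 1 mdrun_mpi -gpu_id 0 -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc  )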



Here is the .mdp file I used:

title           = NVT equilibration for OR-POPC system
define          = -DPOSRES -DPOSRES_LIG ; Position-restrain the protein and ligand (uses the posres.itp file information)

; Parameters describing the details of the NVT simulation protocol
integrator      = md            ; Algorithm ("md" = molecular dynamics [leap-frog integrator]; "md-vv" = md using velocity Verlet; "sd" = stochastic dynamics)
dt              = 0.002         ; Time step (ps)
nsteps          = 250000        ; Number of steps to run (0.002 * 250000 = 500 ps)

; Parameters controlling output writing
nstxout         = 0             ; Do not write coordinates to the .trr file
nstvout         = 0             ; Do not write velocities to the .trr file
nstfout         = 0             ; Do not write forces to the .trr file

nstxtcout       = 1000          ; Write coordinates to the .xtc file every 2 ps
nstenergy       = 1000          ; Write energies to the .edr file every 2 ps
nstlog          = 1000          ; Write output to the .log file every 2 ps

; Parameters describing neighbor searching and details about interaction calculations
ns_type         = grid          ; Neighbor list search method (simple, grid)
nstlist         = 50            ; Neighbor list update frequency (steps)
rlist           = 1.2           ; Neighbor list search cut-off distance (nm)
rlistlong       = 1.4           ; Long-range neighbor list cut-off distance (nm)
rcoulomb        = 1.2           ; Short-range Coulomb cut-off distance (nm)
rvdw            = 1.2           ; Short-range van der Waals cut-off distance (nm)
pbc             = xyz           ; Directions in which to use periodic boundary conditions (xyz, xy, no)
cutoff-scheme   = Verlet        ; Verlet scheme, required for GPU runs

; Parameters for treating bonded interactions
continuation    = no            ; Whether this is a fresh start or a continuation from a previous run (yes/no)
constraint_algorithm = LINCS    ; Constraint algorithm (LINCS / SHAKE)
constraints     = all-bonds     ; Which bonds/angles to constrain (all-bonds / hbonds / none / all-angles / h-angles)
lincs_iter      = 1             ; Number of iterations to correct for rotational lengthening in LINCS (related to accuracy)
lincs_order     = 4             ; Highest order in the expansion of the constraint coupling matrix (related to accuracy)

; Parameters for treating electrostatic interactions
coulombtype     = PME           ; Long-range electrostatics treatment (cut-off, Ewald, PME)
pme_order       = 4             ; Interpolation order for PME (4 = cubic interpolation)
fourierspacing  = 0.12          ; Maximum grid spacing for the PME FFT grid (nm)

; Temperature coupling parameters
tcoupl          = V-rescale     ; Modified Berendsen thermostat using velocity rescaling
tc-grps         = Protein_LIG POPC Water_and_ions ; Groups coupled separately to the temperature bath
tau_t           = 0.1   0.1     0.1     ; Group-wise coupling time constant (ps)
ref_t           = 303   303     303     ; Group-wise reference temperature (K)

; Pressure coupling parameters
pcoupl          = no            ; No pressure coupling under NVT conditions

; Miscellaneous control parameters
; Dispersion correction
DispCorr        = EnerPres      ; Dispersion corrections to energy and pressure for the vdW cut-off
; Initial velocity generation
gen_vel         = yes           ; Generate velocities from a Maxwell distribution at the given temperature
gen_temp        = 303           ; Temperature for the Maxwell distribution (K)
gen_seed        = -1            ; Random seed for velocity generation (-1 means the seed is calculated from the process ID)
; Center of mass (COM) motion removal relative to the specified groups
nstcomm         = 1             ; COM removal frequency (steps)
comm_mode       = Linear        ; Remove COM translation (linear / angular / no)
comm_grps       = Protein_LIG_POPC Water_and_ions ; Groups for COM removal
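
Since cutoff-scheme = Verlet is already set here, the GPU note above suggests
that nvt.tpr was built from an older .mdp. Regenerating it would look roughly
like this (the input file names are assumptions, not from my actual run):

   grompp -f nvt.mdp -c em.gro -p topol.top -n index.ndx -o nvt.tpr
   mpirun -np 2 mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc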

THX





On 12/17/2012 05:45 PM, Szilárd Páll wrote:
> Hi,
>
> That unfortunately doesn't tell us exactly why mdrun is stuck. Can you
> reproduce the issue on other machines or with different launch
> configurations? At which step does it get stuck (-stepout 1 can help)?
>
> Please try the following (see the command sketches after this list):
> - try running on a single GPU;
> - try running on CPUs only (-nb cpu, and -ntomp 12 to match the GPU setup
> more closely);
> - try running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var
> set (again with -ntomp 12 to match the GPU setup more closely);
> - provide a backtrace (using gdb).
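>
> For example, rough sketches of the above (file names as in your run; the
> exact binary name may differ):
>
>    mdrun_mpi -nb cpu -ntomp 12 -v -s nvt.tpr             # CPU-only run
>    GMX_EMULATE_GPU=1 mdrun_mpi -ntomp 12 -v -s nvt.tpr   # GPU emulation
>    gdb --args mdrun_mpi -v -s nvt.tpr                    # backtrace with gdb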
>
> Cheers,
>
> --
> Szilárd
>
>
>
> On Mon, Dec 17, 2012 at 5:37 PM, Albert <mailmd2011 at gmail.com> wrote:
>
>> hello:
>>
>> I am running a GMX-4.6 beta2 GPU job on a workstation with 24 CPU cores and
>> two GTX 590s. It gets stuck there without any output, i.e. the .xtc file
>> size is still 0 after hours of running. Here is the md.log output I found:
>>
>>
>> Using CUDA 8x8x8 non-bonded kernels
>>
>> Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05
>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size:
>> 1536
>>
>> Removing pbc first time
>> Pinning to Hyper-Threading cores with 12 physical cores in a compute node
>> There are 1 flexible constraints
>>
>> WARNING: step size for flexible constraining = 0
>>           All flexible constraints will be rigid.
>>           Will try to keep all flexible constraints at their original
>> length,
>>           but the lengths may exhibit some drift.
>>
>> Initializing Parallel LINear Constraint Solver
>> Linking all bonded interactions to atoms
>> There are 161872 inter charge-group exclusions,
>> will use an extra communication step for exclusion forces for PME
>>
>> The initial number of communication pulses is: X 1
>> The initial domain decomposition cell size is: X 1.83 nm
>>
>> The maximum allowed distance for charge groups involved in interactions is:
>>                   non-bonded interactions           1.200 nm
>> (the following are initial values, they could change due to box
>> deformation)
>>              two-body bonded interactions  (-rdd)   1.200 nm
>>            multi-body bonded interactions  (-rdd)   1.200 nm
>>    atoms separated by up to 5 constraints  (-rcon)  1.826 nm
>>
>> When dynamic load balancing gets turned on, these settings will change to:
>> The maximum number of communication pulses is: X 1
>> The minimum size for domain decomposition cells is 1.200 nm
>> The requested allowed shrink of DD cells (option -dds) is: 0.80
>> The allowed shrink of domain decomposition cells is: X 0.66
>> The maximum allowed distance for charge groups involved in interactions is:
>>                   non-bonded interactions           1.200 nm
>>              two-body bonded interactions  (-rdd)   1.200 nm
>>            multi-body bonded interactions  (-rdd)   1.200 nm
>>    atoms separated by up to 5 constraints  (-rcon)  1.200 nm
>>
>> Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0
>>
>> Center of mass motion removal mode is Linear
>> We have the following groups for center of mass motion removal:
>>    0:  Protein_LIG_POPC
>>    1:  Water_and_ions
>>
>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>> G. Bussi, D. Donadio and M. Parrinello
>> Canonical sampling through velocity rescaling
>> J. Chem. Phys. 126 (2007) pp. 014101
>> -------- -------- --- Thank You --- -------- --------
>>
>>
>>
>> THX



