[gmx-users] Re: gromacs 4.6 segfault

Justin Lemkul jalemkul at vt.edu
Tue Jan 15 13:09:53 CET 2013



On 1/15/13 7:06 AM, Dr. Vitaly Chaban wrote:
>> Using mdrun (version 4.6-beta3) on a GPU node (one NVIDIA K10 with CUDA
>> drivers and runtime 4.2, plus two 6-core Intel E5 CPUs with hyper-threading
>> and SSE4.1), I always get the following segfault after a few, or a few hundred, ns:
>>
>> line 15: 28957 Segmentation fault mdrun -deffnm pdz_trans_NVT_equi_4
>> -maxh 95
>>
>> I can restart the system from the .cpt file and run it for the next few,
>> or few hundred, ns, and then I get the same segfault again.
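(Presumably the restart looks something like the following; the checkpoint file
name is assumed here to follow the -deffnm prefix, and -cpi tells mdrun to
continue from that checkpoint:

    mdrun -deffnm pdz_trans_NVT_equi_4 -cpi pdz_trans_NVT_equi_4.cpt -maxh 95
)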
>> The same system runs fine on a different cluster (mdrun version 4.6-beta3 on
>> a GPU node with one NVIDIA M2090, CUDA drivers and runtime 4.2, plus two
>> 6-core Intel X5 CPUs and SSE4.1) for 1 μs without any complaints.
>>
>> My system consists of a 95-residue protein solvated in approximately 6000 SPC
>> water molecules.
>> .mdp parameters:
>>
>> ;
>> title = ttt
>> cpp = /lib/cpp
>> include = -I../top
>> constraints = hbonds
>> integrator = md
>> cutoff-scheme = verlet
>>
>> dt = 0.002 ; ps !
>> nsteps = 500000000 ; total 1 μs at dt = 0.002
>> nstcomm = 25 ; frequency for center of mass motion removal
>> nstcalcenergy = 25
>> nstxout = 100000 ; frequency for writing the trajectory
>> nstvout = 100000 ; frequency for writing the velocities
>> nstfout = 100000 ; frequency to write forces to the output trajectory
>> nstlog = 1000000 ; frequency to write the log file
>> nstenergy = 10000 ; frequency to write energies to energy file
>> nstxtcout = 10000
>>
>> xtc_grps = System
>>
>> nstlist = 25 ; Frequency to update the neighbor list
>> ns_type = grid ; Make a grid in the box and only check atoms in
>> neighboring grid cells when constructing a new neighbor list
>> rlist = 1.4 ; cut-off distance for the short-range neighbor list
>>
>> coulombtype = PME ; Fast Particle-Mesh Ewald electrostatics
>> rcoulomb = 1.4 ; cut-off distance for the coulomb field
>> vdwtype = cut-off
>> rvdw = 1.4 ; cut-off distance for the vdw field
>> fourierspacing = 0.12 ; The maximum grid spacing for the FFT grid
>> pme_order = 6 ; Interpolation order for PME
>> optimize_fft = yes
>> pbc = xyz
>> Tcoupl = v-rescale
>> tc-grps = System
>> tau_t = 0.1
>> ref_t = 300
>>
>> energygrps = Protein Non-Protein
>>
>> Pcoupl = no ; berendsen
>> tau_p = 0.1
>> compressibility = 4.5e-5
>> ref_p = 1.0
>> nstpcouple = 5
>> refcoord_scaling = all
>> Pcoupltype = isotropic
>> gen_vel = no
>> gen_temp = 300
>> gen_seed = -1
>>
>> Since I have no clue which parameter should be tuned, any guess would
>> be very welcome.
>>
>
> I think the cause of the issue lies outside your MDP file, in the GPU
> installation. A simple suggestion would be to halve the time step and
> see what happens. Even very well-equilibrated systems, even without
> GPU support, sometimes crash after a few million steps...
>
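That suggestion would amount to a small change in the quoted .mdp, roughly along
these lines (with nsteps doubled so the total simulated time stays at 1 μs):

    dt     = 0.001       ; 1 fs instead of 2 fs
    nsteps = 1000000000  ; twice as many steps to keep the same total time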

The fact that the run restarts from the checkpoint, continues for a long period of
time, and also runs on different hardware argues against that statement. Could
there be memory problems with the GPU card itself?
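One way to check would be something along these lines on the affected node,
assuming the standard NVIDIA tools are installed there:

    # query ECC error counters on the card, if ECC is enabled
    nvidia-smi -q -d ECC

    # overall memory status of the device
    nvidia-smi -q -d MEMORY

A dedicated GPU memory stress test (for example the third-party cuda_memtest
tool, if it is available on that node) would exercise the card's memory more
thoroughly than a quick query.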

-Justin

-- 
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================


