[gmx-users] Re: gromacs 4.6 segfault

sebastian sebastian.waltz at physik.uni-freiburg.de
Tue Jan 15 13:15:28 CET 2013


On 01/15/2013 01:09 PM, Justin Lemkul wrote:
>
>
> On 1/15/13 7:06 AM, Dr. Vitaly Chaban wrote:
>>> using mdrun (version 4.6-beta3) on a GPU node (1 nvidia K10 with cuda
>>> drivers and runtime 4.2 + 2 times intel 6 core E5 with hyper threading
>>> and SSE4.1) I always get, after a few or a few hundred ns, the following
>>> segfault:
>>>
>>> line 15: 28957 Segmentation fault mdrun -deffnm pdz_trans_NVT_equi_4
>>> -maxh 95
>>>
>>> I can restart the system using the cpt file and run it for the next
>>> few or few hundred ns, and then I get the same segfault again.
>>> The same system runs on a different cluster (mdrun version 4.6-beta3 on
>>> a GPU node with 1 nvidia M2090 with cuda drivers and runtime 4.2 + 2 times
>>> intel 6 core X5 and SSE4.1) fine for 1μs without any complaints.
>>>
>>> My system consists of a 95 residue protein solvated in approx 6000 spc
>>> water molecules.
>>> .mdp parameters:
>>>
>>> ;
>>> title = ttt
>>> cpp = /lib/cpp
>>> include = -I../top
>>> constraints = hbonds
>>> integrator = md
>>> cutoff-scheme = verlet
>>>
>>> dt = 0.002 ; ps !
>>> nsteps = 500000000 ; total 1 us (500000000 steps x 0.002 ps)
>>> nstcomm = 25 ; frequency for center of mass motion removal
>>> nstcalcenergy = 25
>>> nstxout = 100000 ; frequency for writing the trajectory
>>> nstvout = 100000 ; frequency for writing the velocity
>>> nstfout = 100000 ; frequency to write forces to output trajectory
>>> nstlog = 1000000 ; frequency to write the log file
>>> nstenergy = 10000 ; frequency to write energies to energy file
>>> nstxtcout = 10000
>>>
>>> xtc_grps = System
>>>
>>> nstlist = 25 ; Frequency to update the neighbor list
>>> ns_type = grid ; Make a grid in the box and only check atoms in neighboring grid cells when constructing a new neighbor list
>>> rlist = 1.4 ; cut-off distance for the short-range neighbor list
>>>
>>> coulombtype = PME ; Fast Particle-Mesh Ewald electrostatics
>>> rcoulomb = 1.4 ; cut-off distance for the coulomb field
>>> vdwtype = cut-off
>>> rvdw = 1.4 ; cut-off distance for the vdw field
>>> fourierspacing = 0.12 ; The maximum grid spacing for the FFT grid
>>> pme_order = 6 ; Interpolation order for PME
>>> optimize_fft = yes
>>> pbc = xyz
>>> Tcoupl = v-rescale
>>> tc-grps = System
>>> tau_t = 0.1
>>> ref_t = 300
>>>
>>> energygrps = Protein Non-Protein
>>>
>>> Pcoupl = no;berendsen
>>> tau_p = 0.1
>>> compressibility = 4.5e-5
>>> ref_p = 1.0
>>> nstpcouple = 5
>>> refcoord_scaling = all
>>> Pcoupltype = isotropic
>>> gen_vel = no
>>> gen_temp = 300
>>> gen_seed = -1
>>>
>>> Since I have no clue which parameter should be tuned, any guess would
>>> be very welcome.
>>>
>>
>> I think the cause of the issue is outside your MDP file and is rather
>> in the GPU installation. A simple suggestion would be to halve the
>> time step and see what happens. Even very well
>> equilibrated systems, even without GPU support, sometimes crash
>> after a few million steps...
>>
>
> The fact that the run restarts from a checkpoint and runs for a long 
> period of time and also runs on different hardware argues against that 
> statement.  Could there be memory problems with the GPU card itself?
>
> -Justin
>
There is a good chance that the ECC of the new K10 creates some strange 
errors. I should ask our admin to turn OFF the ECC to check if this 
changes something.

Thanks

Sebastian
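
P.S. A quick sanity check on the run length implied by the .mdp quoted
above. This is only a minimal sketch: the helper names parse_mdp and
simulated_time_ns are made up for illustration and are not GROMACS tools,
and the parser only handles simple "key = value ; comment" lines. It
assumes dt is given in ps and treats '-' and '_' in keys as equivalent.
With nsteps = 500000000 and dt = 0.002 ps it reports about 1 μs of
simulated time.

```python
def parse_mdp(text):
    """Parse simple 'key = value ; comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        # Everything after the first ';' is a comment in .mdp files.
        line = line.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalize keys so 'cutoff-scheme' and 'cutoff_scheme' match.
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params


def simulated_time_ns(params):
    """Total simulated time in ns, assuming dt is in ps."""
    return int(params["nsteps"]) * float(params["dt"]) / 1000.0


# Values copied from the .mdp in this thread.
MDP = """\
dt = 0.002 ; ps !
nsteps = 500000000
nstxtcout = 10000
"""

params = parse_mdp(MDP)
total_ns = simulated_time_ns(params)
# Number of xtc output intervals over the whole run.
xtc_intervals = int(params["nsteps"]) // int(params["nstxtcout"])

print(f"total simulated time: {total_ns:.1f} ns")
print(f"xtc output intervals: {xtc_intervals}")
```

The same two numbers are useful when halving dt as suggested above:
nsteps then has to be doubled to cover the same simulated time.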


