[gmx-users] Tabulated potential segmentation fault

Mon Apr 16 18:06:06 CEST 2012

On 17/04/2012 1:50 AM, Laura Leay wrote:
> On Fri, 2012-04-13 at 11:01 +1000, Mark Abraham wrote:
>> On 13/04/2012 2:48 AM, Laura Leay wrote:
>>> All,
>>>
>>> I'm trying to run a tabulated soft core potential with the form V =
>>> A + Br^2 + Cr^3 up to about r=0.1 A and the normal LJ 6-12 potential
>>> after this.
>>>
>>> I've chosen the parameters of this equation to be the same for all
>>> atoms in my system (a polymer containing carbon, nitrogen and
>>> hydrogen). I've not assigned any charges to the system.
>>>
>>> Running on Gromacs version 4.5.4 single precision on a high
>>> performance computing cluster, the first 50 or so steps run fine and
>>> the energies seem reasonable, but then the simulation crashes with a
>>> segmentation fault. I submitted the job using the command mdrun
>>> -table table.xvg -v -nt $NSLOTS -pd
>>>
>>> The job seems to run ok on my own desktop PC although I've not tried
>>> running it for more than a few minutes to check that it would indeed
>>> run.
>>>
>>> If anyone can tell me why this won't run on the computing cluster
>>> I'd appreciate it.
>>>
>>> The first few lines of my table file look like this:
>>>
>>>     0.0000000E+00   0.0000000E+00   0.0000000E+00   0.0000000E+00   0.0000000E+00   0.1500000E+05   0.0000000E+00
>>>     0.2000000E-02   0.0000000E+00   0.0000000E+00   0.3045913E-01  -0.4568869E+02   0.1499547E+05   0.4525801E+04
>>>     0.4000000E-02   0.0000000E+00   0.0000000E+00   0.2436730E+00  -0.1827548E+03   0.1498190E+05   0.9051602E+04
>>>     0.6000000E-02   0.0000000E+00   0.0000000E+00   0.8223965E+00  -0.4111982E+03   0.1495927E+05   0.1357740E+05
>>>     0.8000000E-02   0.0000000E+00   0.0000000E+00   0.1949384E+01  -0.7310191E+03   0.1492759E+05   0.1810320E+05
>>>     0.1000000E-01   0.0000000E+00   0.0000000E+00   0.3807391E+01  -0.1142217E+04   0.1488685E+05   0.2262901E+05
>>>     0.1200000E-01   0.0000000E+00   0.0000000E+00   0.6579172E+01  -0.1644793E+04   0.1483707E+05   0.2715481E+05
>>>     0.1400000E-01   0.0000000E+00   0.0000000E+00   0.1044748E+02  -0.2238746E+04   0.1477824E+05   0.3168061E+05
>>>     0.1600000E-01   0.0000000E+00   0.0000000E+00   0.1559507E+02  -0.2924076E+04   0.1471035E+05   0.3620641E+05
>>>     0.1800000E-01   0.0000000E+00   0.0000000E+00   0.2220471E+02  -0.3700784E+04   0.1463341E+05   0.4073221E+05
>>>     0.2000000E-01   0.0000000E+00   0.0000000E+00   0.3045913E+02  -0.4568869E+04   0.1454742E+05   0.4525801E+05
>>>     0.2200000E-01   0.0000000E+00   0.0000000E+00   0.4054110E+02  -0.5528332E+04   0.1445238E+05   0.4978381E+05
>>>
>>> This is my mdp file (note that I turned dispersion correction off to
>>> see if this was the problem but it would seem that it is not):
>>>
>>> ; VARIOUS PREPROCESSING OPTIONS
>>> title                    = Yo
>>> cpp                      = /usr/bin/cpp
>>> include                  =
>>> define                   =
>>>
>>> ; RUN CONTROL PARAMETERS
>>> integrator               = md ;md for simulation, steep for Emin
>>> ; Start time and timestep in ps
>>> tinit                    = 0
>>> dt                       = 0.001
>>> nsteps                   = 100000 ; 1000000 ;for simulation
>>> ; For exact run continuation or redoing part of a run
>>> init_step                = 0
>>> ; mode for center of mass motion removal
>>> comm-mode                = Linear
>>> ; number of steps for center of mass motion removal
>>> nstcomm                  = 1
>>> ; group(s) for center of mass motion removal
>>> comm-grps                =
>>>
>>> ; LANGEVIN DYNAMICS OPTIONS
>>> ; Temperature, friction coefficient (amu/ps) and random seed
>>> ;bd-temp                  = 300
>>> bd-fric                  = 0
>>> ld-seed                  = 1993
>>>
>>> ; ENERGY MINIMIZATION OPTIONS
>>> ; Force tolerance and initial step-size
>>> emtol                    = 100
>>> emstep                   = 0.01
>>> ; Max number of iterations in relax_shells
>>> niter                    = 20
>>> ; Step size (1/ps^2) for minimization of flexible constraints
>>> fcstep                   = 0
>>> ; Frequency of steepest descents steps when doing CG
>>> nstcgsteep               = 1000
>>> nbfgscorr                = 10
>>>
>>> ; OUTPUT CONTROL OPTIONS
>>> ; Output frequency for coords (x), velocities (v) and forces (f)
>>> nstxout                  = 0
>>> nstvout                  = 0
>>> nstfout                  = 0
>>> ; Checkpointing helps you continue after crashes
>>> nstcheckpoint            = 1000
>>> ; Output frequency for energies to log file and energy file
>>> nstlog                   = 50
>>> nstenergy                = 50
>>> ; Output frequency and precision for xtc file
>>> nstxtcout                = 50
>>> xtc-precision            = 1000
>>> ; This selects the subset of atoms for the xtc file. You can
>>> ; select multiple groups. By default all atoms will be written.
>>> xtc-grps                 =
>>> ; Selection of energy groups
>>> energygrps               =
>>>
>>> ; NEIGHBORSEARCHING PARAMETERS
>>> ; nblist update frequency
>>> nstlist                  = 10
>>> ; ns algorithm (simple or grid)
>>> ns_type                  = simple
>>> ; Periodic boundary conditions: xyz (default), no (vacuum)
>>> ; or full (infinite systems only)
>>> pbc                      = xyz
>>> ; nblist cut-off
>>> rlist                    = 0.9
>>> domain-decomposition     = no
>>>
>>> ; OPTIONS FOR ELECTROSTATICS AND VDW
>>> ; Method for doing electrostatics
>>> coulombtype              = user
>>> rcoulomb-switch          = 0
>>> rcoulomb                 = 0.9
>>> ; Dielectric constant (DC) for cut-off or DC of reaction field
>>> epsilon-r                = 1
>>> ; Method for doing Van der Waals
>>> vdw-type                 = user
>>> ; cut-off lengths
>>> rvdw-switch              = 0
>>> rvdw                     = 0.9
>>> ; Apply long range dispersion corrections for Energy and Pressure
>>> DispCorr                 = no ;EnerPres
>>> ; Extension of the potential lookup tables beyond the cut-off
>>> table-extension          = 2.0
>>> ; Spacing for the PME/PPPM FFT grid
>>> fourierspacing           = 0.12
>>> ; FFT grid size, when a value is 0 fourierspacing will be used
>>> fourier_nx               = 0
>>> fourier_ny               = 0
>>> fourier_nz               = 0
>>> ; EWALD/PME/PPPM parameters
>>> pme_order                = 4
>>> ewald_rtol               = 1e-05
>>> ewald_geometry           = 3d
>>> epsilon_surface          = 0
>>> optimize_fft             = no
>>>
>>> ; GENERALIZED BORN ELECTROSTATICS
>>> ; Algorithm for calculating Born radii
>>> gb_algorithm             = Still
>>> ; Frequency of calculating the Born radii inside rlist
>>> nstgbradii               = 1
>>> ; Cutoff for Born radii calculation; the contribution from atoms
>>> ; between rlist and rgbradii is updated every nstlist steps
>>> rgbradii                 = 2
>>> ; Salt concentration in M for Generalized Born models
>>> gb_saltconc              = 0
>>>
>>> ; IMPLICIT SOLVENT (for use with Generalized Born electrostatics)
>>> implicit_solvent         = No
>>>
>>> ; OPTIONS FOR WEAK COUPLING ALGORITHMS
>>> ; Temperature coupling
>>> Tcoupl                   = berendsen
>>> ; Groups to couple separately
>>> tc-grps                  = System
>>> ; Time constant (ps) and reference temperature (K)
>>> tau_t                    = 0.1
>>> ref_t                    = 300
>>> ; Pressure coupling
>>> Pcoupl                   = no ;berendsen
>>> Pcoupltype               = isotropic
>>> ; Time constant (ps), compressibility (1/bar) and reference P (bar)
>>> tau_p                    = 1.0
>>> compressibility          = 4.5e-5
>>> ref_p                    = 1.0
>>> ; Random seed for Andersen thermostat
>>> andersen_seed            = 815131
>>>
>>> ; SIMULATED ANNEALING
>>> ; Type of annealing for each temperature group (no/single/periodic)
>>> annealing                = no
>>> ; Number of time points to use for specifying annealing in each group
>>> annealing_npoints        =
>>> ; List of times at the annealing points for each group
>>> annealing_time           =
>>> ; Temp. at each annealing point, for each group.
>>> annealing_temp           =
>>>
>>> ; GENERATE VELOCITIES FOR STARTUP RUN
>>> gen_vel                  = yes
>>> gen_temp                 = 300
>>> gen_seed                 = 1993
>>>
>>> ; OPTIONS FOR BONDS
>>> ;constraints              = all-bonds
>>> ; Type of constraint algorithm
>>> constraint-algorithm     = Lincs
>>> ; Do not constrain the start configuration
>>> unconstrained-start      = no
>>> ; Use successive overrelaxation to reduce the number of shake iterations
>>> Shake-SOR                = no
>>> ; Relative tolerance of shake
>>> shake-tol                = 1e-04
>>> ; Highest order in the expansion of the constraint coupling matrix
>>> lincs-order              = 4
>>> ; Number of iterations in the final step of LINCS. 1 is fine for
>>> ; normal simulations, but use 2 to conserve energy in NVE runs.
>>> ; For energy minimization with constraints it should be 4 to 8.
>>> lincs-iter               = 1
>>> ; Lincs will write a warning to the stderr if in one step a bond
>>> ; rotates over more degrees than
>>> lincs-warnangle          = 30
>>> ; Convert harmonic bonds to morse potentials
>>> morse                    = no
>>>
>>> ; ENERGY GROUP EXCLUSIONS
>>> ; Pairs of energy groups for which all non-bonded interactions are excluded
>>> energygrp_excl           =
>>>
>>>
>> Your simulation is probably just blowing up; see
>> http://www.gromacs.org/Documentation/Terminology/Blowing_Up. I suggest
>> equilibrating with a normal potential, and then shifting to your
>> special regime. Then you can exclude initial conditions as the source
>> of the problem, so long as your special regime is not wildly different
>> from a normal potential.
>>
>> Mark
> The system was first energy minimised without the tabulated potential.
> It seems as if it is not blowing up on my desktop PC. It's currently up
> to step 4800 (compared to step 50 running on the cluster) and still
> going. I'm going to continue to run the job on my desktop PC to see if
> it will complete.
>
> The problem definitely seems to lie with the job running on the
> computing cluster but I don't know enough about parallel computing etc.
> to know what the problem is. Any help would be appreciated.

Numerical integration from marginally stable initial conditions can 
succeed or fail pretty randomly. See 
http://www.gromacs.org/Documentation/Terminology/Reproducibility for 
discussion. On the information we have, the hypothesis that running in 
parallel causes problems is tenuous. Do you need to use mdrun -pd for 
some reason?
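
As a rough illustration of why a run on one machine need not follow the
same trajectory as a run on another: floating-point addition is not
associative, so summing the same contributions in a different order can
change the last bits of the result, and a marginally stable system can
amplify such differences. The sketch below is only a toy; the numbers are
arbitrary and are not taken from the simulation in this thread.

```python
# Toy example: the same three "contributions" summed in two different orders.
# The values are arbitrary placeholders, not forces from the actual system.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one summation order
right_to_left = a + (b + c)   # the same terms, grouped differently

# The two totals agree to many digits but are not bit-identical.
print("order 1:", repr(left_to_right))
print("order 2:", repr(right_to_left))
print("difference:", left_to_right - right_to_left)
```

A difference this small is harmless in a well-behaved simulation, but it
means two runs of the same input are not expected to be identical step for
step, so one surviving while the other crashes is not by itself evidence of
a parallelization bug.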

Mark
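
One check that can be made on a user table independently of mdrun is
whether each force column agrees with the negative derivative of its
potential column, since user tables use the column layout r, f, -f', g,
-g', h, -h'. The sketch below does this with a central difference on the
rows quoted above. It is not a GROMACS tool; the function names and the
choice of normalising by the largest tabulated force are assumptions made
for illustration, and passing this check does not by itself explain or rule
out the crash.

```python
# Rows copied from the table quoted in this thread.
# Columns: r, f, -f', g, -g', h, -h'
ROWS = """
0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00  0.0000000E+00 0.1500000E+05 0.0000000E+00
0.2000000E-02 0.0000000E+00 0.0000000E+00 0.3045913E-01 -0.4568869E+02 0.1499547E+05 0.4525801E+04
0.4000000E-02 0.0000000E+00 0.0000000E+00 0.2436730E+00 -0.1827548E+03 0.1498190E+05 0.9051602E+04
0.6000000E-02 0.0000000E+00 0.0000000E+00 0.8223965E+00 -0.4111982E+03 0.1495927E+05 0.1357740E+05
0.8000000E-02 0.0000000E+00 0.0000000E+00 0.1949384E+01 -0.7310191E+03 0.1492759E+05 0.1810320E+05
0.1000000E-01 0.0000000E+00 0.0000000E+00 0.3807391E+01 -0.1142217E+04 0.1488685E+05 0.2262901E+05
0.1200000E-01 0.0000000E+00 0.0000000E+00 0.6579172E+01 -0.1644793E+04 0.1483707E+05 0.2715481E+05
0.1400000E-01 0.0000000E+00 0.0000000E+00 0.1044748E+02 -0.2238746E+04 0.1477824E+05 0.3168061E+05
0.1600000E-01 0.0000000E+00 0.0000000E+00 0.1559507E+02 -0.2924076E+04 0.1471035E+05 0.3620641E+05
0.1800000E-01 0.0000000E+00 0.0000000E+00 0.2220471E+02 -0.3700784E+04 0.1463341E+05 0.4073221E+05
0.2000000E-01 0.0000000E+00 0.0000000E+00 0.3045913E+02 -0.4568869E+04 0.1454742E+05 0.4525801E+05
0.2200000E-01 0.0000000E+00 0.0000000E+00 0.4054110E+02 -0.5528332E+04 0.1445238E+05 0.4978381E+05
"""


def parse_rows(text):
    """Turn whitespace-separated table text into a list of float rows."""
    return [[float(x) for x in line.split()]
            for line in text.strip().splitlines()]


def max_force_mismatch(rows, pot_col, force_col):
    """Largest gap between the tabulated force and a central-difference
    estimate of -dV/dr at interior points, as a fraction of the largest
    tabulated force magnitude in that column (hypothetical metric)."""
    scale = max(max(abs(row[force_col]) for row in rows), 1e-30)
    worst = 0.0
    for i in range(1, len(rows) - 1):
        dr = rows[i + 1][0] - rows[i - 1][0]
        numeric = -(rows[i + 1][pot_col] - rows[i - 1][pot_col]) / dr
        worst = max(worst, abs(numeric - rows[i][force_col]) / scale)
    return worst


rows = parse_rows(ROWS)
# (potential column, force column) pairs for f, g and h.
for name, pot_col, force_col in (("f", 1, 2), ("g", 3, 4), ("h", 5, 6)):
    print(name, "max mismatch:", max_force_mismatch(rows, pot_col, force_col))
```

A large mismatch in any pair would suggest that the force column was
generated with the wrong sign or from a different function than the
potential column. To check a whole file rather than the excerpt, the same
functions could be applied to the contents of table.xvg after skipping any
comment lines.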