[gmx-users] Gromacs 2018.3 Exceeding Memory Issue
Justin Lemkul
jalemkul at vt.edu
Tue Nov 27 16:52:49 CET 2018
On 11/27/18 10:43 AM, Peiyin Lee wrote:
> Hi, Mark,
> Thank you for all the suggestions! Regarding the memory limit, it
> should be around 118 GB. The cluster I am using has 20 cores per node and 6
> GB of memory per core. That's why I think it is strange for my job to
> exceed such a large memory limit. I have checked my mdp file and
> submission file and can't see anything that would cause such large
> memory usage. Do you have suggestions on where else to look?
Probably the most applicable attribute is the number of atoms in the system.
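For reference, the two numbers in the slurmstepd message quoted below line up with the roughly 118 GB limit mentioned above if they are read as KiB. That unit is an assumption here (the message itself does not state it), so treat this as a rough sanity check rather than a statement about how Slurm reports memory on every cluster:

```python
# Values copied from the slurmstepd message quoted in this thread.
# Assumption: both numbers are reported in KiB.
used_kib = 123122052
limit_kib = 122880000

KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024

limit_mib = limit_kib / KIB_PER_MIB
limit_gib = limit_kib / KIB_PER_GIB
used_gib = used_kib / KIB_PER_GIB

# The poster reports 20 cores per node.
cores_per_node = 20
per_core_mib = limit_mib / cores_per_node

print(f"limit: {limit_mib:.0f} MiB = {limit_gib:.2f} GiB")
print(f"used:  {used_gib:.2f} GiB")
print(f"limit per core: {per_core_mib:.0f} MiB")
```

Under that assumption the limit works out to 6000 MiB per core across 20 cores, which agrees with the per-node figure given by the poster.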
> Thank you so much for your help.
> Regards,
> Peiyin
> On Tue, Nov 27, 2018 at 2:50 AM Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>> Hi,
>> On Tue, Nov 27, 2018 at 4:31 AM Peiyin Lee <peiyinlee329 at gmail.com> wrote:
>>> Hi, all GROMACS users,
>>> I am trying to run jobs with GROMACS 2018.3 and constantly get a
>>> memory-limit-exceeded error. The system is an all-atom system with 21073
>>> atoms. The largest file estimated to be generated is around 5.8 GB.
>> Estimated sizes of disk files don't matter here.
>>> My jobs are consistently killed after running for only around 15 minutes and
>>> produce an error message like this: "slurmstepd: error: Job 12381762 exceeded
>>> memory limit (123122052 > 122880000), being killed". I have tried using a
>> 128MB is pretty tiny these days - no compute node will have less than 1GB
>> physical memory, so I suggest asking for at least that.
>> GROMACS should never leak memory as the simulation progresses - if you
>> think you are seeing that (e.g. with a slightly larger memory limit, slurm
>> interrupts a bit later) then we would like to see a bug report at
>> https://redmine.gromacs.org
>>> larger memory specification (12GB/core), but the wait would be too long,
>>> and I don't think my job really uses that much memory. I have included my
>>> .mdp file below:
>>> "title = NVT Production Run for Trpzip4 in pure H2O
>>> define = ; position restrain the protein
>>> ; Run parameters
>>> integrator = md ; leap-frog integrator
>>> nsteps = 50000000 ; 0.002 * 50000000 = 100000 ps (100 ns)
>>> dt = 0.002 ; 2 fs
>>> ; Output control
>>> nstenergy = 10000 ; save energies every 20 ps
>>> nstlog = 10000 ; update log file every 20 ps
>>> nstxout-compressed = 10000 ; 20ps
>>> compressed-x-precision = 200 ; 1/200 = 0.005 nm
>>> compressed-x-grps = System
>>> ; Bond parameters
>>> continuation = yes ; Restarting after NVT
>>> constraint_algorithm = lincs ; holonomic constraints
>>> constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
>>> lincs_iter = 1 ; accuracy of LINCS
>>> lincs_order = 4 ; also related to accuracy
>>> ; Neighborsearching
>>> ns_type = grid ; search neighboring grid cells
>>> nstlist = 5 ; 10 fs
>>> rlist = 1.2 ; short-range neighborlist cutoff (in nm)
>>> rcoulomb = 1.2 ; short-range electrostatic cutoff (in nm)
>>> rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
>>> ; Electrostatics
>>> coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
>>> pme_order = 4 ; cubic interpolation
>>> fourierspacing = 0.16 ; grid spacing for FFT
>>> ; Temperature coupling is on
>>> tcoupl = V-rescale ; More accurate thermostat
>>> tc-grps = Protein SOL NA ; 2 coupling groups - more accurate
>> Off topic, but it is not good practice to couple ions separately. Did you
>> perhaps follow some tutorial that we can ask the author to fix?
>>> tau_t = 0.5 0.5 0.5 ; time constant, in ps
>>> ref_t = 400 400 400 ; reference temperature, one for each group, in K
>>> ; Pressure coupling is on
>>> pcoupl = No ; Pressure coupling on in NPT
>>> pcoupltype = isotropic ; uniform scaling of x-y box vectors, independent z
>>> tau_p = 5.0 ; time constant, in ps
>>> ref_p = 1.0 ; reference pressure, x-y, z (in bar)
>>> compressibility = 4.5e-5 ; isothermal compressibility, bar^-1
>>> ; Periodic boundary conditions
>>> pbc = xyz ; 3-D PBC
>>> ; Dispersion correction
>>> DispCorr = EnerPres ; account for cut-off vdW scheme
>>> ; Velocity generation
>>> gen_vel = no ; Velocity generation is off"
>>> and the command I used to run was "mpirun -np 80 gmx_mpi mdrun -npme 16
>>> -noappend -s md.tpr -c md.gro -e md.edr -x md.xtc -cpi md.cpt -cpo md.cpt
>>> -g md.log".
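As a quick check of the timing comments in the .mdp file above, the run length and output intervals can be recomputed from dt, nsteps, nstxout-compressed and nstlist. This is only a sketch of the arithmetic using the values from the quoted file; the variable names are illustrative and nothing here calls GROMACS:

```python
# Values copied from the .mdp file quoted in this thread.
dt_ps = 0.002                 # dt, in ps
nsteps = 50_000_000           # nsteps
nstxout_compressed = 10_000   # nstxout-compressed
nstlist = 5                   # nstlist

total_ps = nsteps * dt_ps                      # total simulated time
total_ns = total_ps / 1000
frame_interval_ps = nstxout_compressed * dt_ps # time between xtc frames
n_frames = nsteps // nstxout_compressed        # frames written after step 0
nstlist_fs = nstlist * dt_ps * 1000            # neighbor-list update interval

print(f"total length: {total_ps:.0f} ps ({total_ns:.0f} ns)")
print(f"xtc frame every {frame_interval_ps:.0f} ps, {n_frames} frames")
print(f"neighbor list updated every {nstlist_fs:.0f} fs")
```

With these values the run is 100000 ps (100 ns), with a compressed frame every 20 ps and a neighbor-list update every 10 fs.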
>> Looks fine. I encourage everybody to use the default file names, and
>> organize their projects into natural groups for the infrastructure, like
>> directories. Renaming them doesn't add value and makes your life more
>> complicated when you're doing restarts.
>> Mark
>>> This is my first time posting so please excuse anything that's
>>> unclear. I will try to clarify if needed. Any help is greatly appreciated!
>>> Regards,
>>> Peiyin Lee
