[gmx-users] excess backups, and minimization issue

Justin Lemkul jalemkul at vt.edu
Wed Aug 10 14:46:53 CEST 2016

On 8/9/16 8:36 AM, Samuel Flores wrote:
> Hi Justin,
> I think the only line that used "srun" rather than "srun -n 1" (and was not
> an mdrun call) was this one:
>> srun  gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p
>> topol.top -o ions.tpr
> I added the -n 1 flag for this command. This was indeed the one that was
> generating all the backups. However, following your suggestion, I cannot run
> grompp without the preceding gmx_mpi command on my install:
> [samuelf at aurora1 proteinA]$ grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
> -bash: grompp: command not found
> Anyway, I imagine with the -n 1 flag what I do is kosher enough? I don't
> really like doing things on the command line anyway as I find it is not
> repeatable enough for my taste.

I run everything via scripts, too, but my suggestion was merely to not run all 
the prep work via a cluster/queuing system.  If something messes up, that's a 
waste of time while you wait in the queue again, etc.  I keep bash scripts for all
workflows, but I do everything up through grompp on my local machine.  Personal
preference, I suppose, but this is all I was suggesting.  I find it more 
efficient to work out roadblocks without having to submit jobs and wait on them 
to succeed or fail :)
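A minimal sketch of that local workflow, assuming a serial GROMACS 5.x build on the workstation invoked as `gmx` (5.x tools are normally run as subcommands, `gmx grompp`, or `gmx_mpi grompp` for an MPI-suffixed build). The helper names `run` and `prep_local` and the `DRY_RUN`/`GMX` variables are hypothetical, not part of GROMACS; the file names and stdin answers are copied from the job file Sam posted.

```bash
#!/bin/bash
# Sketch of a local preparation helper (names are hypothetical, not GROMACS's).
# Assumes a serial GROMACS 5.x binary called `gmx` (override with GMX=...) and
# that the .pdb and .mdp files are in the current directory. No srun, no MPI.

# Execute a command, or only print it when DRY_RUN is set.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# The ( ) body runs in a subshell, so `set -eu` stops at the first failing
# step without changing the options of the calling shell.
prep_local() (
    set -eu
    gmx=${GMX:-gmx}
    base=threaded-truncated

    # Force field 6 and water model 1, as answered on stdin in the original job file.
    run "$gmx" pdb2gmx -f "$base.pdb" -o "${base}_processed.gro" -ignh <<< $'6\n1'
    run "$gmx" editconf -f "${base}_processed.gro" -o "${base}_newbox.gro" -c -d 1.0 -bt cubic
    run "$gmx" solvate -cp "${base}_newbox.gro" -cs spc216.gro -o "${base}_solv.gro" -p topol.top
    run "$gmx" grompp -f ions.mdp -c "${base}_solv.gro" -p topol.top -o ions.tpr
    # Group 13 is what the original script fed to genion; confirm it is SOL here.
    run "$gmx" genion -s ions.tpr -o "${base}_solv_ions.gro" -p topol.top -pname NA -neutral <<< 13
    run "$gmx" grompp -f minim.mdp -c "${base}_solv_ions.gro" -p topol.top -o em.tpr
)
```

Source the file, preview the steps with `DRY_RUN=1 prep_local`, then run `prep_local` (or `GMX=gmx_mpi prep_local` if that is the only binary available). The resulting em.tpr carries everything mdrun needs, so the cluster job can shrink to the single `srun gmx_mpi mdrun -v -deffnm em` line.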

> I am now having a problem with energy minimization:
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file: /local/easybuild/build/GROMACS/5.1.2/intel-2016a-hybrid/gromacs-5.1.2/src/gromacs/mdlib/constr.cpp, line: 555
>
> Fatal error:
> step 23: Water molecule starting at atom 4636051 can not be settled. Check
> for bad contacts and/or reduce the timestep if appropriate.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
> The offending command is: srun  gmx_mpi mdrun -v -deffnm em
> This is a protein which I have generated by homology modeling. I provided no
> water; all solvation is done by GROMACS. I suspect the problem is a mild
> clash in the protein which affected a neighboring water molecule. Perhaps it
> is best to minimize the protein alone prior to solvation. Perhaps prior to
> the editconf step? Or perhaps "minim.mdp" in my existing workflow should have
> a -DPOSRES statement? Is there a protocol for this, or a better way to
> diagnose the problem? I append my updated job file.

The mdrun -v output should show you which atoms are experiencing large forces as 
the run proceeds.  That's your first clue.  Based on what you find there, that 
should point you towards how to deal with the problem.  Likely just an 
artificial clash that needs to be resolved.  Wholesale position restraints are 
actually the opposite of what you want; you need mdrun to be able to resolve the 
clash, not hold things in place.  Now, if you have some bad side chain
somewhere, maybe restraining the backbone in a first step is appropriate; if
that EM works, then minimize the whole thing.  But everything is predicated
upon identifying where things start to go wrong.
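As a sketch of that first diagnostic step: with `-v`, steepest-descent EM prints a progress line per step ending in the current maximum force and the atom it acts on. The helpers below assume that line looks like `Step=   23, Dmax= 1.2e-02 nm, Epot= -1.23456e+07 Fmax= 4.56789e+05, atom= 4636051` (check this against your own output) and that mdrun's stderr was saved to a file. `max_force_atom` and `gro_atom_line` are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helpers for finding where EM first goes wrong.
# Assumed progress-line format (check it against your own `mdrun -v` output):
#   Step=   23, Dmax= 1.2e-02 nm, Epot= -1.23456e+07 Fmax= 4.56789e+05, atom= 4636051

# Print "<atom> <Fmax>" for the largest Fmax in a file of captured mdrun stderr.
max_force_atom() {
    # mdrun redraws some progress lines with carriage returns; make them separate lines.
    tr '\r' '\n' < "$1" | awk '
        BEGIN { best = -1 }
        /Fmax=/ && /atom=/ {
            f = $0; sub(/.*Fmax= */, "", f); sub(/,.*/, "", f)
            a = $0; sub(/.*atom= */, "", a); sub(/[^0-9].*/, "", a)
            if (f + 0 > best) { best = f + 0; fstr = f; atom = a }
        }
        END { if (atom != "") print atom, fstr }'
}

# Print the line for 1-based atom number $2 from .gro file $1. The first two
# lines of a .gro file are the title and the atom count, so atom N is line N+2.
gro_atom_line() {
    sed -n "$(( $2 + 2 ))p" "$1"
}
```

For example, capture the run with `srun gmx_mpi mdrun -v -deffnm em 2> em.err`, then call `max_force_atom em.err` and pass the reported atom to `gro_atom_line threaded-truncated_solv_ions.gro <atom>` to see its residue. The lookup goes by line number because the numeric columns of a .gro file are only five characters wide, so they wrap for a system as large as this one (atom 4636051).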


> Many thanks
> Sam
> #!/bin/bash -l
> #SBATCH -J mine
> #SBATCH -N  6
> #SBATCH --tasks-per-node=20
> #SBATCH --exclusive
> #SBATCH -A snic2015-16-49
> #SBATCH -t 168:00:00
> # tried -N12. timed out. trying 6 now.
> # Disable backups. These have been causing problems for unclear reasons
> export GMX_MAXBACKUP=-1
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> # End part that is written in /home/samuelf/svn/breeder/src/Breed.cpp:87
> # From here down, This file is generated in /home/samuelf/svn/breeder/src/MysqlConnection.cpp:97
> cd /lunarc/nobackup/users/samuelf/proteinA-mine/
> echo " now working in : /lunarc/nobackup/users/samuelf/proteinA-mine/"
> cp /home/samuelf/svn/breeder//singleMutantFiles/ions.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/md.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> #cp /home/samuelf/svn/breeder//singleMutantFiles/mdout.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/minim.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/npt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/nvt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> # Write cluster-specific configuration and module load commands..
> # The following portion is being read from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
> #export OMP_NUM_THREADS=1
> # Comment in original file: /home/samuelf/projects/1FC2.domainZ/gromacs-commands.txt
> #module load intel/2016a
> module load       icc/2016.1.150-GCC-4.9.3-2.25  impi/
> module load GROMACS/5.1.2-hybrid
> # End portion from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
> echo 6 > temp.txt
> echo 1 >> temp.txt
> echo " Check 1"
> cat temp.txt | srun -n 1 gmx_mpi pdb2gmx -f threaded-truncated.pdb -o threaded-truncated_processed.gro -ignh
> echo " Check 2"
> #6: Amber sb99 , 3 point TIP3P water model: force field was selected based on this benchmark article: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905107/
> srun  -n 1 gmx_mpi editconf -f threaded-truncated_processed.gro -o threaded-truncated_newbox.gro -c -d 1.0 -bt cubic
> echo " Check 3"
> srun  -n 1 gmx_mpi solvate -cp threaded-truncated_newbox.gro -cs spc216.gro -o threaded-truncated_solv.gro -p topol.top
> # threaded-truncated_solv.gro has full length SpA:
> echo " Check 4"
> srun  -n 1 gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
> # threaded-truncated_solv_ions.gro has the full length SpA:
> echo " Check 5"
> echo 13 | srun -n 1 gmx_mpi genion -s ions.tpr -o threaded-truncated_solv_ions.gro -p topol.top -pname NA -neutral
> # previously had erroneous srun  -n 1gmx_mpi ... :
> echo " Check 6"
> srun  -n 1 gmx_mpi grompp -f minim.mdp -c threaded-truncated_solv_ions.gro -p topol.top -o em.tpr
> echo " Check 7"
> srun  gmx_mpi mdrun -v -deffnm em
> # em.gro has only one domain for some reason
> echo " Check 8"
> srun  -n 1 gmx_mpi grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
> echo " Check 9"
> srun  gmx_mpi mdrun -deffnm nvt
> echo " Check 10"
> srun  -n 1 gmx_mpi grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
> echo " Check 11"
> srun  gmx_mpi mdrun -deffnm npt
> echo " Check 12"
> srun  -n 1 gmx_mpi grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
> echo " Check 13"
> srun  gmx_mpi mdrun -deffnm md_0_1
> # insert DSSP commands here
> # End of GROMACS command generator./home/samuelf/svn/breeder/src/MysqlConnection.cpp:161
> # Returning from /home/samuelf/svn/breeder/src/MysqlConnection.cpp:163
>> On Aug 4, 2016, at 00:13, Justin Lemkul <jalemkul at vt.edu> wrote:
>> On 8/3/16 6:02 PM, Samuel Flores wrote:
>>> sorry wrong subject line!
>>> Guys,
>>> I have been plagued with an odd backup issue that I don't understand.
>>> Gromacs seems to be making scads of successive backups of certain files,
>>> even in a single run. This often leads to death due to an excess of
>>> backups. Here is the first error:
>>> Program gmx grompp, VERSION 5.1.2
>>> Source code file: /local/easybuild/build/GROMACS/5.1.2/intel-2016a-hybrid/gromacs-5.1.2/src/gromacs/utility/futil.cpp, line: 409
>>>
>>> Fatal error:
>>> Won't make more than 99 backups of ions.tpr for you. The env.var.
>>> GMX_MAXBACKUP controls this maximum, -1 disables backups.
>>> For more information and tips for troubleshooting, please check the GROMACS
>>> website at http://www.gromacs.org/Documentation/Errors
>>> -------------------------------------------------------
>>>
>>> Halting parallel program gmx grompp on rank 67 out of 120
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 67
>>> Calculating fourier grid dimensions for X Y Z
>>> .. which I believe is due to this command:
>>> gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o
>>> ions.tpr
>>> For now I set export GMX_MAXBACKUP=-1. But it would be nice to know what
>>> is actually happening. Can anyone help? I append my SLURM job file
>>> below.
>>> I would also like it if someone told me what is up with these ranks ..
>>> Not having worked much with MPI, I have the vague impression that these
>>> are processes or threads. In any case why are these processes being
>>> killed? Is this normal?
>> The only GROMACS program that benefits from MPI is mdrun.  You're
>> effectively launching 120 instances of every command before that, which
>> serves no purpose and actively leads to the fatal error.  Run preparation
>> steps locally, then ship the .tpr off to the cluster for the actual
>> calculation.
>> -Justin
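To make Justin's point concrete, here is a simplified model (not GROMACS source code) of the backup-before-write behaviour named in the error message: an existing output file is renamed to `#name.k#` using the first free k, and the tool refuses once k would exceed the limit (99 here, per the message; -1 disables backups). `backup_then_write` and `simulate_copies` are hypothetical helpers. Real srun copies run concurrently rather than one after another, so the exact count can differ, but the limit is reached the same way.

```bash
#!/bin/bash
# Simplified model of a backup-before-write step (not the real GROMACS code).

# Write NAME, first renaming an existing NAME to "#NAME.k#" with the first
# free k. A negative limit means "no backups": just overwrite.
backup_then_write() {
    local name=$1 max=${2:-99} k=1
    if (( max < 0 )); then
        : > "$name"
        return 0
    fi
    if [[ -e $name ]]; then
        while [[ -e "#$name.$k#" ]]; do
            k=$((k + 1))
        done
        if (( k > max )); then
            echo "Won't make more than $max backups of $name" >&2
            return 1
        fi
        mv -- "$name" "#$name.$k#"
    fi
    : > "$name"   # stand-in for writing the new output file
}

# Let COPIES "ranks" write NAME one after another; print how many succeeded.
simulate_copies() {
    local name=$1 copies=$2 max=${3:-99} i
    for (( i = 1; i <= copies; i++ )); do
        backup_then_write "$name" "$max" 2>/dev/null || break
    done
    echo $((i - 1))
}
```

In a scratch directory, `simulate_copies ions.tpr 120` prints 100: the 101st of the 120 copies is the first to hit the limit, which is the kind of failure reported above. With `-1` as the limit all 120 succeed, which is why `GMX_MAXBACKUP=-1` hides the symptom without removing the 119 redundant grompp runs.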
>>> Many thanks,
>>> Sam
>>> [samuelf at aurora1 proteinA-mine]$ cat job.proteinA-mine
>>> #!/bin/bash -l
>>> #SBATCH -J mine
>>> #SBATCH -N  6
>>> #SBATCH --tasks-per-node=20
>>> #SBATCH --exclusive
>>> #SBATCH -A snic2015-16-49
>>> #SBATCH -t 168:00:00
>>> # tried -N12. timed out. trying 6 now.
>>> # Disable backups. These have been causing problems for unclear reasons
>>> export GMX_MAXBACKUP=-1
>>> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
>>> # End part that is written in /home/samuelf/svn/breeder/src/Breed.cpp:87
>>> # From here down, This file is generated in /home/samuelf/svn/breeder/src/MysqlConnection.cpp:97
>>> cd /lunarc/nobackup/users/samuelf/proteinA-mine/
>>> echo " now working in : /lunarc/nobackup/users/samuelf/proteinA-mine/"
>>> cp /home/samuelf/svn/breeder//singleMutantFiles/ions.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> cp /home/samuelf/svn/breeder//singleMutantFiles/md.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> #cp /home/samuelf/svn/breeder//singleMutantFiles/mdout.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> cp /home/samuelf/svn/breeder//singleMutantFiles/minim.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> cp /home/samuelf/svn/breeder//singleMutantFiles/npt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> cp /home/samuelf/svn/breeder//singleMutantFiles/nvt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
>>> # Write cluster-specific configuration and module load commands..
>>> # The following portion is being read from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
>>> #export OMP_NUM_THREADS=1
>>> # Comment in original file: /home/samuelf/projects/1FC2.domainZ/gromacs-commands.txt
>>> #module load intel/2016a
>>> module load       icc/2016.1.150-GCC-4.9.3-2.25  impi/
>>> module load GROMACS/5.1.2-hybrid
>>> # End portion from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
>>> echo 6 > temp.txt
>>> echo 1 >> temp.txt
>>> echo " Check 1"
>>> cat temp.txt | srun -n 1 gmx_mpi pdb2gmx -f threaded-truncated.pdb -o threaded-truncated_processed.gro -ignh
>>> echo " Check 2"
>>> #6: Amber sb99 , 3 point TIP3P water model: force field was selected based on this benchmark article: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905107/
>>> srun  gmx_mpi editconf -f threaded-truncated_processed.gro -o threaded-truncated_newbox.gro -c -d 1.0 -bt cubic
>>> echo " Check 3"
>>> srun  -n 1 gmx_mpi solvate -cp threaded-truncated_newbox.gro -cs spc216.gro -o threaded-truncated_solv.gro -p topol.top
>>> # threaded-truncated_solv.gro has full length SpA:
>>> echo " Check 4"
>>> srun  gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
>>> # threaded-truncated_solv_ions.gro has the full length SpA:
>>> echo " Check 5"
>>> echo 13 | srun -n 1 gmx_mpi genion -s ions.tpr -o threaded-truncated_solv_ions.gro -p topol.top -pname NA -neutral
>>> # previously had erroneous srun  -n 1gmx_mpi ... :
>>> echo " Check 6"
>>> srun  -n 1 gmx_mpi grompp -f minim.mdp -c threaded-truncated_solv_ions.gro -p topol.top -o em.tpr
>>> echo " Check 7"
>>> srun  gmx_mpi mdrun -v -deffnm em
>>> # em.gro has only one domain for some reason
>>> echo " Check 8"
>>> srun  -n 1 gmx_mpi grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
>>> echo " Check 9"
>>> srun  gmx_mpi mdrun -deffnm nvt
>>> echo " Check 10"
>>> srun  -n 1 gmx_mpi grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
>>> echo " Check 11"
>>> srun  gmx_mpi mdrun -deffnm npt
>>> echo " Check 12"
>>> srun  -n 1 gmx_mpi grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
>>> echo " Check 13"
>>> srun  gmx_mpi mdrun -deffnm md_0_1
>>> # insert DSSP commands here
>>> # End of GROMACS command generator./home/samuelf/svn/breeder/src/MysqlConnection.cpp:161
>>> # Returning from /home/samuelf/svn/breeder/src/MysqlConnection.cpp:163
>>> Samuel Coulbourn Flores Computational and Systems Biology Program
>>> Department of Cell and Molecular Biology Uppsala University
>> --
>> ==================================================
>> Justin A. Lemkul, Ph.D.
>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>>
>> Department of Pharmaceutical Sciences
>> School of Pharmacy
>> Health Sciences Facility II, Room 629
>> University of Maryland, Baltimore
>> 20 Penn St.
>> Baltimore, MD 21201
>>
>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>> http://mackerell.umaryland.edu/~jalemkul
>>
>> ==================================================
> Samuel Coulbourn Flores Computational and Systems Biology Program Department
> of Cell and Molecular Biology Uppsala University
> Cell: +46 706.000.464 Phone: +46 (0) 18-471 45 36 Skype: samuelfloresc
> Office: BMC C8:217a Deliveries: BMC Box 596, Uppsala 75124


Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441

