[gmx-users] excess backups
Justin Lemkul
jalemkul at vt.edu
Thu Aug 4 00:13:34 CEST 2016
On 8/3/16 6:02 PM, Samuel Flores wrote:
>
>
> Sorry, wrong subject line!
>
> Guys,
>
> I have been plagued with an odd backup issue that I don't understand. GROMACS seems to be making scads of successive backups of certain files, even in a single run. This often kills the run because the number of backups exceeds the limit. Here is the first error:
>
> Program gmx grompp, VERSION 5.1.2
> Source code file: /local/easybuild/build/GROMACS/5.1.2/intel-2016a-hybrid/gromacs-5.1.2/src/gromacs/utility/futil.cpp, line: 409
>
> Fatal error:
> Won't make more than 99 backups of ions.tpr for you.
> The env.var. GMX_MAXBACKUP controls this maximum, -1 disables backups.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> Halting parallel program gmx grompp on rank 67 out of 120
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 67
> Calculating fourier grid dimensions for X Y Z
>
> .. which I believe is due to this command:
>
> gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
>
>
> For now I have exported GMX_MAXBACKUP=-1. But it would be nice to know what is actually happening. Can anyone help? I have appended my SLURM job file below.
>
> I would also like someone to explain what is up with these ranks. Not having worked much with MPI, I have the vague impression that they are processes or threads. In any case, why are these processes being killed? Is this normal?
>
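The only GROMACS program that benefits from MPI is mdrun. Your job requests
6 nodes x 20 tasks per node = 120 tasks, and wherever you call srun without
-n 1 (in your script, editconf and the ions.mdp grompp) you launch 120
independent copies of that command. This serves no purpose and actively leads
to the fatal error: every copy writes ions.tpr in the same directory, and each
one backs up the file written by the copy before it (#ions.tpr.1#,
#ions.tpr.2#, ...), so 120 writers run through the default limit of 99
backups. A rank is simply one of those MPI processes; when rank 67 hits the
fatal error it calls MPI_Abort, which terminates all of them. Run preparation
steps locally, then ship the .tpr off to the cluster for the actual calculation.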
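To see why the count climbs past the limit, here is a hypothetical shell simulation (not GROMACS code) of the backup naming seen in the error above. It assumes that an existing output file is renamed to #name.N# with the lowest free N before being overwritten, and that the tool gives up once N would exceed the limit, taken here as the default of 99. The 120 copies run one after another; on the cluster they race, so which rank fails first varies.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the GROMACS output-backup scheme (not GROMACS
# code): before FILE is written, an existing FILE is renamed to #FILE.N#,
# using the lowest free N, and the write is refused once N would exceed
# the backup limit.
set -euo pipefail

maxbackup=99   # assumed GROMACS default when GMX_MAXBACKUP is unset

backup_and_write() {
  local file=$1 n=1
  if [[ -e $file ]]; then
    # Find the lowest N for which #FILE.N# does not exist yet.
    while [[ -e "#${file}.${n}#" ]]; do
      n=$((n + 1))
      if (( n > maxbackup )); then
        echo "Won't make more than ${maxbackup} backups of ${file} for you." >&2
        return 1
      fi
    done
    mv "$file" "#${file}.${n}#"
  fi
  echo "written" > "$file"
}

cd "$(mktemp -d)"

# A bare "srun" on 6 nodes x 20 tasks starts 120 copies of grompp, and every
# copy writes ions.tpr into the same directory. Here they run sequentially.
failed_copy=""
for copy in $(seq 1 120); do
  if ! backup_and_write ions.tpr; then
    failed_copy=$copy
    break
  fi
done

backups=( "#ions.tpr."*"#" )
echo "failed on copy ${failed_copy}; ${#backups[@]} backups on disk"
```

Run sequentially, the first copy writes ions.tpr, copies 2 to 100 each add one backup, and copy 101 finds #ions.tpr.1# through #ions.tpr.99# all taken and refuses to continue. With one writer per command (srun -n 1), each run adds at most one backup.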
-Justin
>
> Many thanks,
>
> Sam
>
>
>
> [samuelf at aurora1 proteinA-mine]$ cat job.proteinA-mine
> #!/bin/bash -l
> #SBATCH -J mine
> #SBATCH -N 6
> #SBATCH --tasks-per-node=20
> #SBATCH --exclusive
> #SBATCH -A snic2015-16-49
> #SBATCH -t 168:00:00
> # tried -N12. timed out. trying 6 now.
>
> # Disable backups. These have been causing problems for unclear reasons
> export GMX_MAXBACKUP=-1
>
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> # End part that is written in /home/samuelf/svn/breeder/src/Breed.cpp:87
>
> # From here down, This file is generated in /home/samuelf/svn/breeder/src/MysqlConnection.cpp:97
> cd /lunarc/nobackup/users/samuelf/proteinA-mine/
> echo " now working in : /lunarc/nobackup/users/samuelf/proteinA-mine/"
> cp /home/samuelf/svn/breeder//singleMutantFiles/ions.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/md.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> #cp /home/samuelf/svn/breeder//singleMutantFiles/mdout.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/minim.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/npt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> cp /home/samuelf/svn/breeder//singleMutantFiles/nvt.mdp /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> # Write cluster-specific configuration and module load commands..
> # The following portion is being read from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
>
> #export OMP_NUM_THREADS=1
> # Comment in original file: /home/samuelf/projects/1FC2.domainZ/gromacs-commands.txt
> #module load intel/2016a
> module load icc/2016.1.150-GCC-4.9.3-2.25 impi/5.1.2.150
> module load GROMACS/5.1.2-hybrid
>
>
> # End portion from >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
> echo 6 > temp.txt
> echo 1 >> temp.txt
> echo " Check 1"
> cat temp.txt | srun -n 1 gmx_mpi pdb2gmx -f threaded-truncated.pdb -o threaded-truncated_processed.gro -ignh
> echo " Check 2"
> #6: Amber sb99 , 3 point TIP3P water model: force field was selected based on this benchmark article: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905107/
> srun gmx_mpi editconf -f threaded-truncated_processed.gro -o threaded-truncated_newbox.gro -c -d 1.0 -bt cubic
> echo " Check 3"
> srun -n 1 gmx_mpi solvate -cp threaded-truncated_newbox.gro -cs spc216.gro -o threaded-truncated_solv.gro -p topol.top
> # threaded-truncated_solv.gro has full length SpA:
> echo " Check 4"
> srun gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
> # threaded-truncated_solv_ions.gro has the full length SpA:
> echo " Check 5"
> echo 13 | srun -n 1 gmx_mpi genion -s ions.tpr -o threaded-truncated_solv_ions.gro -p topol.top -pname NA -neutral
> # previously had erroneous srun -n 1gmx_mpi ... :
> echo " Check 6"
> srun -n 1 gmx_mpi grompp -f minim.mdp -c threaded-truncated_solv_ions.gro -p topol.top -o em.tpr
> echo " Check 7"
> srun gmx_mpi mdrun -v -deffnm em
> # em.gro has only one domain for some reason
> echo " Check 8"
> srun -n 1 gmx_mpi grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
> echo " Check 9"
> srun gmx_mpi mdrun -deffnm nvt
> echo " Check 10"
> srun -n 1 gmx_mpi grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
> echo " Check 11"
> srun gmx_mpi mdrun -deffnm npt
> echo " Check 12"
> srun -n 1 gmx_mpi grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
> echo " Check 13"
> srun gmx_mpi mdrun -deffnm md_0_1
> # insert DSSP commands here
> # End of GROMACS command generator. /home/samuelf/svn/breeder/src/MysqlConnection.cpp:161
> # Returning from /home/samuelf/svn/breeder/src/MysqlConnection.cpp:163
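Below is a minimal sketch of the script above with the reply's advice applied inside the same job. Every preparation tool runs as a single task via srun -n 1, and only mdrun uses all 120 tasks. The prep helper is a hypothetical name introduced here. The module names, paths, interactive selections (6, 1, 13) and file names are copied from the original, and the cp lines for the .mdp files are omitted. Preparing everything locally with the serial gmx binary, as the reply suggests, keeps these steps out of the cluster job entirely.

```bash
#!/bin/bash -l
#SBATCH -J mine
#SBATCH -N 6
#SBATCH --tasks-per-node=20
#SBATCH --exclusive
#SBATCH -A snic2015-16-49
#SBATCH -t 168:00:00
# Sketch only: same workflow as the original script, but every preparation
# tool runs as one task and only mdrun is spread over all 120 tasks.
# GMX_MAXBACKUP is left at its default so a genuine name clash still shows up.
set -eo pipefail   # stop at the first failing step instead of carrying on

module load icc/2016.1.150-GCC-4.9.3-2.25 impi/5.1.2.150
module load GROMACS/5.1.2-hybrid
cd /lunarc/nobackup/users/samuelf/proteinA-mine/

# Hypothetical helper: run a serial preparation tool as a single MPI task.
prep() { srun -n 1 gmx_mpi "$@"; }

printf '6\n1\n' | prep pdb2gmx -f threaded-truncated.pdb -o threaded-truncated_processed.gro -ignh
prep editconf -f threaded-truncated_processed.gro -o threaded-truncated_newbox.gro -c -d 1.0 -bt cubic
prep solvate -cp threaded-truncated_newbox.gro -cs spc216.gro -o threaded-truncated_solv.gro -p topol.top
prep grompp -f ions.mdp -c threaded-truncated_solv.gro -p topol.top -o ions.tpr
echo 13 | prep genion -s ions.tpr -o threaded-truncated_solv_ions.gro -p topol.top -pname NA -neutral
prep grompp -f minim.mdp -c threaded-truncated_solv_ions.gro -p topol.top -o em.tpr

# Only mdrun benefits from MPI, so only mdrun gets all the tasks.
srun gmx_mpi mdrun -v -deffnm em
prep grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
srun gmx_mpi mdrun -deffnm nvt
prep grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
srun gmx_mpi mdrun -deffnm npt
prep grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
srun gmx_mpi mdrun -deffnm md_0_1
```

With this layout each .tpr and .gro file has exactly one writer, so GMX_MAXBACKUP no longer needs to be disabled to keep the job alive.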
>
>
> Samuel Coulbourn Flores
> Computational and Systems Biology Program
> Department of Cell and Molecular Biology
> Uppsala University
>
--
==================================================
Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow
Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201
jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul
==================================================