[gmx-users] excess backups

Mark Abraham mark.j.abraham at gmail.com
Thu Aug 4 01:25:55 CEST 2016


Hi,

This is what the "srun -n 1" instances in your script are there to manage -
these preparation stages gain nothing from running in parallel, and when
they do run on many ranks, several otherwise useful GROMACS features (here,
the automatic file backups) collide in an unfortunate way.

There are ways we could detect and block the redundant instances of grompp
from doing actual work, but there are probably also ways to make a sensible
workflow out of something conceptually like

mpirun -cwd $dir_for_this_rank gmx_mpi grompp
mpirun gmx_mpi mdrun -multidir $all_the_dirs
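
With srun on a SLURM cluster that could look roughly like the sketch below
(untested; the directory and file names are placeholders, and -multidir
wants the total number of MPI ranks to be a multiple of the number of
directories):

# one single-rank grompp per simulation directory
for d in sim1 sim2 sim3 sim4; do
    (cd "$d" && srun -n 1 gmx_mpi grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr)
done
# one parallel mdrun driving all the directories at once
srun gmx_mpi mdrun -multidir sim1 sim2 sim3 sim4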

Mark

On Thu, Aug 4, 2016 at 12:13 AM Justin Lemkul <jalemkul at vt.edu> wrote:

>
>
> On 8/3/16 6:02 PM, Samuel Flores wrote:
> >
> >
> > sorry wrong subject line!
> >
> > Guys,
> >
> > I have been plagued with an odd backup issue that I don't understand.
> Gromacs seems to be making scads of successive backups of certain files,
> even in a single run. This often kills the job once the backup limit is
> reached. Here is the first error:
> >
> > Program gmx grompp, VERSION 5.1.2
> > Source code file:
> /local/easybuild/build/GROMACS/5.1.2/intel-2016a-hybrid/gromacs-5.1.2/src/gromacs/utility/futil.cpp,
> line: 409
> >
> > Fatal error:
> > Won't make more than 99 backups of ions.tpr for you.
> > The env.var. GMX_MAXBACKUP controls this maximum, -1 disables backups.
> > For more information and tips for troubleshooting, please check the
> GROMACS
> > website at http://www.gromacs.org/Documentation/Errors
> > -------------------------------------------------------
> >
> > Halting parallel program gmx grompp on rank 67 out of 120
> > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 67
> > Calculating fourier grid dimensions for X Y Z
> >
> > .. which I believe is due to this command:
> >
> >    gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p
> topol.top -o ions.tpr
> >
> >
> > For now I set export GMX_MAXBACKUP=-1. But it would be nice to know what
> is actually happening. Can anyone help? I append my SLURM job file below.
> >
> > I would also like it if someone told me what is up with these ranks ..
> Not having worked much with MPI, I have the vague impression that these are
> processes or threads. In any case why are these processes being killed? Is
> this normal?
> >
>
> The only GROMACS program that benefits from MPI is mdrun.  You're
> effectively
> launching 120 instances of every command before that, which serves no
> purpose
> and actively leads to the fatal error.  Run preparation steps locally,
> then ship
> the .tpr off to the cluster for the actual calculation.
>
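> A bare-bones version of that split might look like this (only a sketch;
> it assumes a non-MPI gmx build on the workstation, guesses the login
> host name, and reuses the file names from the script below):
>
> # on the workstation: build the run input serially
> gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
> # copy only the .tpr to the cluster
> scp md_0_1.tpr aurora1:/lunarc/nobackup/users/samuelf/proteinA-mine/
>
> # in the SLURM job, only mdrun needs srun/MPI
> srun gmx_mpi mdrun -deffnm md_0_1
>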
> -Justin
>
> >
> > Many thanks,
> >
> > Sam
> >
> >
> >
> > [samuelf at aurora1 proteinA-mine]$ cat job.proteinA-mine
> > #!/bin/bash -l
> > #SBATCH -J mine
> > #SBATCH -N  6
> > #SBATCH --tasks-per-node=20
> > #SBATCH --exclusive
> > #SBATCH -A snic2015-16-49
> > #SBATCH -t 168:00:00
> > # tried -N12. timed out. trying 6 now.
> >
> > # Disable backups. These have been causing problems for unclear reasons
> > export GMX_MAXBACKUP=-1
> >
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
> > # End part that is written in /home/samuelf/svn/breeder/src/Breed.cpp:87
> >
> > # From here down, this file is generated in
> /home/samuelf/svn/breeder/src/MysqlConnection.cpp:97
> > cd /lunarc/nobackup/users/samuelf/proteinA-mine/
> > echo " now working in : /lunarc/nobackup/users/samuelf/proteinA-mine/"
> > cp /home/samuelf/svn/breeder//singleMutantFiles/ions.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > cp /home/samuelf/svn/breeder//singleMutantFiles/md.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > #cp /home/samuelf/svn/breeder//singleMutantFiles/mdout.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > cp /home/samuelf/svn/breeder//singleMutantFiles/minim.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > cp /home/samuelf/svn/breeder//singleMutantFiles/npt.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > cp /home/samuelf/svn/breeder//singleMutantFiles/nvt.mdp
> /lunarc/nobackup/users/samuelf/proteinA-mine/ ;
> > # Write cluster-specific configuration and module load commands..
> > # The following portion is being read from
> >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
> >
> > #export OMP_NUM_THREADS=1
> > # Comment in original file:
> /home/samuelf/projects/1FC2.domainZ/gromacs-commands.txt
> > #module load intel/2016a
> > module load       icc/2016.1.150-GCC-4.9.3-2.25  impi/5.1.2.150
> > module load GROMACS/5.1.2-hybrid
> >
> >
> > # End portion from
> >/home/samuelf/svn/breeder//singleMutantFiles/gromacs-commands.txt<
> > echo 6 > temp.txt
> > echo 1 >> temp.txt
> > echo " Check 1"
> > cat temp.txt | srun -n 1 gmx_mpi pdb2gmx -f threaded-truncated.pdb -o
> threaded-truncated_processed.gro -ignh
> > echo " Check 2"
> > #6: Amber sb99, 3-point TIP3P water model: force field was selected
> based on this benchmark article:
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905107/
> > srun  gmx_mpi editconf -f threaded-truncated_processed.gro -o
> threaded-truncated_newbox.gro -c -d 1.0 -bt cubic
> > echo " Check 3"
> > srun  -n 1 gmx_mpi solvate -cp threaded-truncated_newbox.gro -cs
> spc216.gro -o threaded-truncated_solv.gro -p topol.top
> > # threaded-truncated_solv.gro has full length SpA:
> > echo " Check 4"
> > srun  gmx_mpi grompp -f ions.mdp -c threaded-truncated_solv.gro -p
> topol.top -o ions.tpr
> > # threaded-truncated_solv_ions.gro has the full length SpA:
> > echo " Check 5"
> > echo 13 | srun -n 1 gmx_mpi genion -s ions.tpr -o
> threaded-truncated_solv_ions.gro -p topol.top -pname NA -neutral
> > # previously had erroneous srun  -n 1gmx_mpi ... :
> > echo " Check 6"
> > srun  -n 1 gmx_mpi grompp -f minim.mdp -c
> threaded-truncated_solv_ions.gro -p topol.top -o em.tpr
> > echo " Check 7"
> > srun  gmx_mpi mdrun -v -deffnm em
> > # em.gro has only one domain for some reason
> > echo " Check 8"
> > srun  -n 1 gmx_mpi grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
> > echo " Check 9"
> > srun  gmx_mpi mdrun -deffnm nvt
> > echo " Check 10"
> > srun  -n 1 gmx_mpi grompp -f npt.mdp -c nvt.gro -t nvt.cpt -p topol.top
> -o npt.tpr
> > echo " Check 11"
> > srun  gmx_mpi mdrun -deffnm npt
> > echo " Check 12"
> > srun  -n 1 gmx_mpi grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top
> -o md_0_1.tpr
> > echo " Check 13"
> > srun  gmx_mpi mdrun -deffnm md_0_1
> > # insert DSSP commands here
> > # End of GROMACS command generator. /home/samuelf/svn/breeder/src/MysqlConnection.cpp:161
> > # Returning from /home/samuelf/svn/breeder/src/MysqlConnection.cpp:163
> >
> >
> > Samuel Coulbourn Flores
> > Computational and Systems Biology Program
> > Department of Cell and Molecular Biology
> > Uppsala University
> >
>
> --
> ==================================================
>
> Justin A. Lemkul, Ph.D.
> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>
> Department of Pharmaceutical Sciences
> School of Pharmacy
> Health Sciences Facility II, Room 629
> University of Maryland, Baltimore
> 20 Penn St.
> Baltimore, MD 21201
>
> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> http://mackerell.umaryland.edu/~jalemkul
>
> ==================================================
>

