[gmx-users] Two machines, same job, one fails
Justin A. Lemkul
jalemkul at vt.edu
Wed Jan 26 00:53:25 CET 2011
TJ Mustard wrote:
<snip>
> > 1. Do the systems in question crash immediately (i.e., step zero) or do they
> > run for some time?
> >
>
> Step 0, every time.
>
>
>
> > 2. If they give you even a little bit of output, you can analyze which energy
> > terms, etc. go haywire with the tips listed here:
> >
>
> All I have seen on these is LINCS Errors and Water molecules unable to
> be settled.
>
>
>
> But I will check this out right now, and email if I smell trouble.
>
>
>
> >
> > http://www.gromacs.org/Documentation/Terminology/Blowing_Up#Diagnosing_an_Unstable_System
> >
> > That would help in tracking down any potential bug or error.
> >
> > 3. Is it just the production runs that are crashing, or everything? If EM
> > isn't even working, that smells even buggier.
>
> Awesome question here, we have seen some weird stuff. Sometimes the
> cluster will give us segmentation faults, then it will fail on our
> machines or sometimes not on our iMacs. I know weird! If EM starts on
> the cluster it will finish. Where we have issues is in positional
> restraint (PR) and MD and MD/FEP. It doesn't matter if FEP is on or off
> in a MD (although we are using SD for these MD/FEP runs).
>
>
Does "sometimes" refer to different simulations, or multiple invocations of the
same simulation system? If you're referencing the fact that system A works
while system B doesn't, we're talking apples and oranges and it's irrelevant to
the diagnosis (and perhaps some systems simply require greater finesse or a
different protocol). If one system continually fails on one machine and works on
another, that's what we need to be discussing. Sorry if I've missed something,
I'm just getting confused.
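Before comparing machines, it is worth confirming that both are really being fed identical input. Below is a minimal sketch, assuming the input file names from the command sequence quoted further down. `fingerprint_inputs` is a made-up helper name, not a GROMACS tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): print one checksum line per
# input file so the output can be saved on each machine and compared.
# cksum is specified by POSIX, so it should be present on Mac OS X and Linux.
fingerprint_inputs() {
    for f in "$@"; do
        # Reading from stdin makes cksum print only "CRC size", without the
        # path, so differing directory layouts do not show up as differences.
        sum=$(cksum < "$f") || return 1
        printf '%s %s\n' "$sum" "$(basename "$f")"
    done
}

# Example (run in the job directory on each machine):
#   fingerprint_inputs md.mdp RNAP-C_after_pr.gro RNAP-C.top > inputs.txt
```

Run it on the iMac and on the cluster, then diff the two output files. If they match and only one machine fails, the inputs themselves can be ruled out. gmxcheck can also compare two run input files directly with its -s1 and -s2 options.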
>
> >
> > 4. Are the compilers the same on the iMac vs. AMD cluster?
>
> No. I am using GCC 4.4.4 (x86_64-apple-darwin10) and the cluster is using
> GCC 4.1.2 (x86_64-redhat-linux).
>
Well, I know that for years weird behavior has been attributed to the gcc-4.1.x
series, including the famous warning on the downloads page:
"WARNING: do not use the gcc 4.1.x set of compilers. They are broken. These
compilers come with recent Linux distributions like Fedora 5/6 etc."
I don't know if those issues were ever resolved (some error in Gromacs that
wasn't playing nice with gcc, or vice versa).
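A quick way to see whether a given node has one of the affected compilers is to look at what `gcc -dumpversion` reports. This is only a sketch: `is_gcc_41x` is a name made up for illustration, and it only pattern-matches the version string. It says nothing about whether a particular build is actually miscompiled.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): succeeds if the version string
# belongs to the gcc 4.1.x series named in the downloads-page warning.
is_gcc_41x() {
    case "$1" in
        4.1|4.1.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Check the compiler found on the PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if is_gcc_41x "$ver"; then
        echo "gcc $ver: 4.1.x series, consider building with a different compiler"
    else
        echo "gcc $ver: not in the 4.1.x series"
    fi
fi
```

Keep in mind that what matters is the compiler that built the installed binaries, which may differ from the gcc currently on the PATH.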
> I just did a quick yum search and there doesn't seem to be a newer GCC.
> We know you are going to cmake but we have yet to get it implemented on
> our cluster successfully.
>
The build system is irrelevant. You still need a reliable C compiler, whether
using autoconf or cmake.
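To illustrate: either build system can be pointed at a specific compiler when it is configured, through the `CC` environment variable for `./configure` or the `CMAKE_C_COMPILER` variable for cmake. The sketch below only prints the two commands and executes nothing. `compiler_cmd` is a made-up name, and the compiler path and `<source-dir>` are placeholders, not real locations on your cluster.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the configure-time command that selects
# a given C compiler for either build system. Nothing is executed.
compiler_cmd() {
    build_system=$1
    cc=$2
    case "$build_system" in
        autoconf) echo "CC=$cc ./configure" ;;
        cmake)    echo "cmake -DCMAKE_C_COMPILER=$cc <source-dir>" ;;
        *)        echo "unknown build system: $build_system" >&2; return 1 ;;
    esac
}

# /opt/gcc-4.4/bin/gcc is a placeholder path used only for illustration.
compiler_cmd autoconf /opt/gcc-4.4/bin/gcc
compiler_cmd cmake /opt/gcc-4.4/bin/gcc
```

Either way the compiler is the same; only how you name it to the build system changes.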
-Justin
>
>
> Thank you,
>
> TJ Mustard
>
>
>
> >
> > -Justin
> >
> > >
> > >
> > > Now I understand that my iMac works, but it only has 2 cpus and the
> > > cluster has 320. Since we are running our jobs via Bennett's Acceptance
> > > Ratio FEP with 21 lambda windows, using just one 2 cpu machine would
> > > take too long. Especially since we wish to start pseudo high throughput
> > > drug testing.
> > >
> > >
> > >
> > >
> > >
> > > In my .mdp files now, the only changes are:
> > >
> > > (the default setting is on the right of the ";")
> > >
> > >
> > >
> > >
> > >
> > > define = ; =
> > >
> > > ; RUN CONTROL PARAMETERS
> > > integrator = sd ; = md
> > > ; Start time and timestep in ps
> > > tinit = 0 ; = 0
> > > dt = 0.004 ; = 0.001
> > > nsteps = 750000 ; = 0 (this one depends on the window and particular part of our job)
> > >
> > > ; OUTPUT CONTROL OPTIONS
> > > ; Output frequency for coords (x), velocities (v) and forces (f)
> > > nstxout = 10000 ; = 100 (to save on disk space)
> > > nstvout = 10000 ; = 100
> > >
> > >
> > >
> > > ; OPTIONS FOR ELECTROSTATICS AND VDW
> > > ; Method for doing electrostatics
> > > coulombtype = PME ; = Cutoff
> > > rcoulomb-switch = 0 ; = 0
> > > rcoulomb = 1 ; = 1
> > > ; Relative dielectric constant for the medium and the reaction field
> > > epsilon_r = 1 ; = 1
> > > epsilon_rf = 1 ; = 1
> > > ; Method for doing Van der Waals
> > > vdw-type = Cut-off ; = Cut-off
> > > ; cut-off lengths
> > > rvdw-switch = 0 ; = 0
> > > rvdw = 1 ; = 1
> > > ; Spacing for the PME/PPPM FFT grid
> > > fourierspacing = 0.12 ; = 0.12
> > > ; EWALD/PME/PPPM parameters
> > > pme_order = 4 ; = 4
> > > ewald_rtol = 1e-05 ; = 1e-05
> > > ewald_geometry = 3d ; = 3d
> > > epsilon_surface = 0 ; = 0
> > > optimize_fft = yes ; = no
> > >
> > >
> > >
> > > ; OPTIONS FOR WEAK COUPLING ALGORITHMS
> > > ; Temperature coupling
> > > tcoupl = v-rescale ; = No
> > > nsttcouple = -1 ; = -1
> > > nh-chain-length = 10 ; = 10
> > > ; Groups to couple separately
> > > tc-grps = System ; =
> > > ; Time constant (ps) and reference temperature (K)
> > > tau-t = 0.1 ; =
> > > ref-t = 300 ; =
> > > ; Pressure coupling
> > > Pcoupl = Parrinello-Rahman ; = No
> > > Pcoupltype = Isotropic
> > > nstpcouple = -1 ; = -1
> > > ; Time constant (ps), compressibility (1/bar) and reference P (bar)
> > > tau-p = 1 ; = 1
> > > compressibility = 4.5e-5 ; =
> > > ref-p = 1.0 ; =
> > >
> > >
> > >
> > > ; OPTIONS FOR BONDS
> > > constraints = all-bonds ; = none
> > > ; Type of constraint algorithm
> > > constraint-algorithm = Lincs ; = Lincs
> > >
> > >
> > >
> > > ; Free energy control stuff
> > > free-energy = yes ; = no
> > > init-lambda = 0.00 ; = 0
> > > delta-lambda = 0 ; = 0
> > > foreign_lambda = 0.05 ; =
> > > sc-alpha = 0.5 ; = 0
> > > sc-power = 1.0 ; = 0
> > > sc-sigma = 0.3 ; = 0.3
> > > nstdhdl = 1 ; = 10
> > > separate-dhdl-file = yes ; = yes
> > > dhdl-derivatives = yes ; = yes
> > > dh_hist_size = 0 ; = 0
> > > dh_hist_spacing = 0.1 ; = 0.1
> > > couple-moltype = LGD ; =
> > > couple-lambda0 = vdw-q ; = vdw-q
> > > couple-lambda1 = none ; = vdw-q
> > > couple-intramol = no ; = no
> > >
> > >
> > >
> > >
> > >
> > > Some of these change due to positional restraint md and energy
> > > minimization.
> > >
> > >
> > >
> > > All of these settings have come from either tutorials, papers or people's
> > > advice.
> > >
> > >
> > >
> > > If it would be advantageous I can post my entire energy minimization,
> > > positional restraint, md, and FEP mdp files.
> > >
> > >
> > >
> > > Thank you,
> > >
> > > TJ Mustard
> > >
> > >
> > >
> > >
> > >
> > >>>
> > >>>
> > >>> Below is my command sequence:
> > >>>
> > >>>
> > >>>
> > >>> echo ==============================================================================================================================
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-grompp -f em.mdp -c RNAP-C_b4em.gro -p RNAP-C.top -o
> > >>> RNAP-C_em.tpr
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f em.mdp -c
> > >>> RNAP-C_b4em.gro -p RNAP-C.top -o RNAP-C_em.tpr
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-mdrun -v -s RNAP-C_em.tpr -c RNAP-C_after_em.gro -g
> > >>> emlog.log -cpo state_em.cpt -nt 2
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_em.tpr
> > >>> -c RNAP-C_after_em.gro -g emlog.log -cpo state_em.cpt -nt 2
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-grompp -f pr.mdp -c RNAP-C_after_em.gro -p RNAP-C.top -o
> > >>> RNAP-C_pr.tpr
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f pr.mdp -c
> > >>> RNAP-C_after_em.gro -p RNAP-C.top -o RNAP-C_pr.tpr
> > >>> echo g453s-mdrun -v -s RNAP-C_pr.tpr -e pr.edr -c RNAP-C_after_pr.gro
> > >>> -g prlog.log -cpo state_pr.cpt -nt 2 -dhdl dhdl-pr.xvg
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_pr.tpr
> > >>> -e pr.edr -c RNAP-C_after_pr.gro -g prlog.log -cpo state_pr.cpt -nt 2
> > >>> -dhdl dhdl-pr.xvg
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-grompp -f md.mdp -c RNAP-C_after_pr.gro -p RNAP-C.top -o
> > >>> RNAP-C_md.tpr
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f md.mdp -c
> > >>> RNAP-C_after_pr.gro -p RNAP-C.top -o RNAP-C_md.tpr
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-mdrun -v -s RNAP-C_md.tpr -o RNAP-C_md.trr -c
> > >>> RNAP-C_after_md.gro -g md.log -e md.edr -cpo state_md.cpt -nt 2 -dhdl
> > >>> dhdl-md.xvg
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_md.tpr
> > >>> -o RNAP-C_md.trr -c RNAP-C_after_md.gro -g md.log -e md.edr -cpo
> > >>> state_md.cpt -nt 2 -dhdl dhdl-md.xvg
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-grompp -f FEP.mdp -c RNAP-C_after_md.gro -p RNAP-C.top -o
> > >>> RNAP-C_fep.tpr
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-grompp -f FEP.mdp -c
> > >>> RNAP-C_after_md.gro -p RNAP-C.top -o RNAP-C_fep.tpr
> > >>> date >>RNAP-C.joblog
> > >>> echo g453s-mdrun -v -s RNAP-C_fep.tpr -o RNAP-C_fep.trr -c
> > >>> RNAP-C_after_fep.gro -g fep.log -e fep.edr -cpo state_fep.cpt -nt 2
> > >>> -dhdl dhdl-fep.xvg
> > >>> /share/apps/gromacs-4.5.3-single/bin/g453s-mdrun -v -s RNAP-C_fep.tpr
> > >>> -o RNAP-C_fep.trr -c RNAP-C_after_fep.gro -g fep.log -e fep.edr -cpo
> > >>> state_fep.cpt -nt 2 -dhdl dhdl-fep.xvg
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> I can add my .mdps but I do not think they are the problem since I
> > >>> know it works on my personal iMac.
> > >>>
> > >>>
> > >>>
> > >>> Thank you,
> > >>>
> > >>> TJ Mustard
> > >>> Email: mustardt at onid.orst.edu
> > >>>
> > >>
> > >
> > >
> > > TJ Mustard
> > > Email: mustardt at onid.orst.edu
> > >
> >
> > --
> > ========================================
> >
> > Justin A. Lemkul
> > Ph.D. Candidate
> > ICTAS Doctoral Scholar
> > MILES-IGERT Trainee
> > Department of Biochemistry
> > Virginia Tech
> > Blacksburg, VA
> > jalemkul[at]vt.edu | (540) 231-9080
> > http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
> >
> > ========================================
> >
>
>
>
> TJ Mustard
> Email: mustardt at onid.orst.edu
>
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================