[gmx-users] mdrun initialises, fails to run, no error message
Natalie Tatum
nataliejtatum at gmail.com
Tue Jan 10 11:35:39 CET 2017
Hi Mark,
Excellent, thanks for the helpful guidance!
Natalie
On 10 January 2017 at 10:12, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> Hi,
>
> Yes. Or if you are running two quite similar simulations of the same
> length, use gmx_mpi mdrun -multidir first second and leave the details to
> mdrun (it'll use everything it can, and the default is what you want).
> Check the performance against the above run for sanity.
>
> Mark
>
> On Tue, Jan 10, 2017 at 10:59 AM Natalie Tatum <nataliejtatum at gmail.com>
> wrote:
>
> > Hi Mark,
> >
> > So using one GPU, with say 6 of 12 logical cores, something like this would
> > be more appropriate?
> >
> > gmx mdrun -gpu_id 0 -nt 6 -pin on
> >
> > Adding an offset for any second process?
> >
> >
> > Natalie
> >
> > On 9 January 2017 at 15:18, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > That's still likely disastrous for performance. Mdrun uses all the cores of
> > > the CPU that you permit, as well as the GPU, and running two mdrun on the
> > > same cores risks a super-linear slowdown. See suggested examples at
> > > http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
> > >
> > > Mark
> > >
> > > On Mon, 9 Jan 2017 16:12 Natalie Tatum <nataliejtatum at gmail.com> wrote:
> > >
> > > > Dear Justin,
> > > >
> > > > Thanks for the advice - after a clean up, a reboot, and some careful
> > > > application of commands, everything seems to be running nicely again.
> > > > Switching the call to below (instead of using -deffnm) is working.
> > > >
> > > > gmx mdrun -s md.tpr -gpu_id 1 &
> > > >
> > > > Many thanks,
> > > >
> > > > Natalie
> > > >
> > > >
> > > >
> > > >
> > > > On 4 January 2017 at 01:02, Justin Lemkul <jalemkul at vt.edu> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 1/3/17 10:43 AM, Natalie Tatum wrote:
> > > > >
> > > > >> Dear all,
> > > > >>
> > > > > >> I'm hoping you can shed light on (a) what my mdrun problem is and (b) where
> > > > > >> to start fixing it.
> > > > >>
> > > > > >> I'm simulating different mutants of a protein dimer on DNA, for 10 ns apiece.
> > > > > >> I have successfully run this protocol on the wild-type protein, on two
> > > > > >> single-residue mutants, and on a double mutant. I came to run the same on a
> > > > > >> fourth, single-site mutant. I have followed the same protocols and utilised
> > > > > >> the same MDP settings throughout. All were subject to 5000 steps of
> > > > > >> steepest-descent energy minimisation, then 200 ps of equilibration in the
> > > > > >> NVT ensemble, then the same in the NPT. For this particular mutant there
> > > > > >> were no issues apparent going into production MD. Therefore, I don't think
> > > > > >> it's an issue of my MDP setup or system...
> > > > >>
> > > > > >> So I have two compatible (OpenCL 1.2) AMD Radeon HD Firepro D300 GPUs, and
> > > > > >> I have one mutant (run/process) assigned to each.
> > > > >>
> > > > >> For this mutant I call mdrun with:
> > > > >>
> > > > >> gmx mdrun -deffnm md -gpu_id 1 &
> > > > >>
> > > > > >> Whereas the other is on -gpu_id 0, and walk away. This worked successfully
> > > > > >> in the week prior for two other systems. It's New Year, then I come back to
> > > > > >> what should be completed simulations this morning to get my hands dirty in
> > > > > >> analysis.
> > > > >>
> > > > >> Run on gpu 0 has completed successfully, all is grand.
> > > > >>
> > > > > >> Mutant on gpu 1 has not. Attempts to resume/restart fail (on either GPU, or
> > > > > >> both, or calling neither explicitly). All output looks like this:
> > > > >>
> > > > >> GROMACS: gmx mdrun, VERSION 5.1.3
> > > > >>
> > > > >> Executable: /usr/local/gromacs/bin/gmx
> > > > >>
> > > > >> Data prefix: /usr/local/gromacs
> > > > >>
> > > > >> Command line:
> > > > >>
> > > > >>
> > > > >>
> > > > >> gmx mdrun -deffnm md
> > > > >>
> > > > >>
> > > > > > From the .log, it appears your command was not what you think it was. Is
> > > > > > it possible that the job failed because mdrun tried to consume all
> > > > > > available hardware and got hung up?
> > > > >
> > > > >
> > > > >>
> > > > >> GROMACS version: VERSION 5.1.3
> > > > >>
> > > > >> Precision: single
> > > > >>
> > > > >> Memory model: 64 bit
> > > > >>
> > > > >> MPI library: thread_mpi
> > > > >>
> > > > >> OpenMP support: disabled
> > > > >>
> > > > >> GPU support: enabled
> > > > >>
> > > > >> OpenCL support: enabled
> > > > >>
> > > > >> invsqrt routine: gmx_software_invsqrt(x)
> > > > >>
> > > > >> SIMD instructions: AVX_256
> > > > >>
> > > > >> FFT library: fftw-3.3.4-sse2
> > > > >>
> > > > >> RDTSCP usage: enabled
> > > > >>
> > > > >> C++11 compilation: disabled
> > > > >>
> > > > >> TNG support: enabled
> > > > >>
> > > > >> Tracing support: disabled
> > > > >>
> > > > >> Built on: Mon 1 Aug 2016 17:20:18 BST
> > > > >>
> > > > >> Built by: natalie at t <natalie at nicr00353.ncl.ac.uk>
> > > > >> hemachineIuse.here.there [CMAKE]
> > > > >>
> > > > >>
> > > > >> Build OS/arch: Darwin 15.5.0 x86_64
> > > > >>
> > > > >> Build CPU vendor: GenuineIntel
> > > > >>
> > > > >> Build CPU brand: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
> > > > >>
> > > > >> Build CPU family: 6 Model: 62 Stepping: 4
> > > > >>
> > > > > >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
> > > > > >> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> > > > > >> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > > > >>
> > > > > >> C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc Clang 7.3.0.7030031
> > > > >>
> > > > >> C compiler flags: -mavx -Wall -Wno-unused -Wunused-value
> > > > >> -Wunused-parameter -Wno-unknown-pragmas -O3 -DNDEBUG
> > > > >>
> > > > > >> C++ compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ Clang 7.3.0.7030031
> > > > >>
> > > > > >> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
> > > > > >> -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas -O3
> > > > > >> -DNDEBUG
> > > > >>
> > > > >> Boost version: 1.60.0 (external)
> > > > >>
> > > > >> OpenCL include dir: /System/Library/Frameworks/OpenCL.framework
> > > > >>
> > > > >> OpenCL library: /System/Library/Frameworks/OPENCL.framework
> > > > >>
> > > > >> OpenCL version: 1.2
> > > > >>
> > > > >>
> > > > > >> And there it ends. No files except the log shown above - and though this
> > > > > >> initial output looks identical in content to the beginnings of logs for
> > > > > >> successful simulations, mdrun does not then seem to engage with the
> > > > > >> GPU/CPUs available.
> > > > >>
> > > > > >> There are no error messages, no apparent indication as to where this has
> > > > > >> gone wrong... And now I can't run mdrun at all, for any system.
> > > > >>
> > > > >>
> > > > > > Test whether or not your GPU is still accessible and capable of running
> > > > > > test programs.
> > > > >
> > > > > -Justin
> > > > >
> > > > > >> I've checked my disk space (fine, >100 GB available), I'm able to call and
> > > > > >> execute other gmx commands, but mdrun does the above.
> > > > >>
> > > > > >> The closest error I can find with my google-fu is from three years ago, where
> > > > > >> this user (
> > > > > >> http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/FEdWd6gC/mdrun-no-error-but-hangs-no-results
> > > > > >> ) got no error but a killed process, but I don't even get as far as
> > > > > >> detection of CPUs/GPUs or domain decomposition.
> > > > >>
> > > > >> Any suggestions much appreciated,
> > > > >>
> > > > >> Natalie
> > > > >>
> > > > >>
> > > > > --
> > > > > ==================================================
> > > > >
> > > > > Justin A. Lemkul, Ph.D.
> > > > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > > > >
> > > > > Department of Pharmaceutical Sciences
> > > > > School of Pharmacy
> > > > > Health Sciences Facility II, Room 629
> > > > > University of Maryland, Baltimore
> > > > > 20 Penn St.
> > > > > Baltimore, MD 21201
> > > > >
> > > > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > > > > http://mackerell.umaryland.edu/~jalemkul
> > > > >
> > > > > ==================================================
> > > > > --
> > > > > Gromacs Users mailing list
> > > > >
> > > > > > * Please search the archive at
> > > > > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> > > > >
> > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > > >
> > > > > > * For (un)subscribe requests visit
> > > > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > > > > send a mail to gmx-users-request at gromacs.org.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *Dr. Natalie J. Tatum*
> > > > Post-doctoral Research Associate
> > > > Northern Institute for Cancer Research
> > > > Newcastle University
--
*Dr. Natalie J. Tatum*
Post-doctoral Research Associate
Northern Institute for Cancer Research
Newcastle University
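
For anyone finding this thread later: the setup discussed above (one mdrun
process per GPU, each restricted to its own half of the 12 logical cores) might
be sketched roughly as below. This is a dry run that only prints the command
lines. The core and GPU counts are assumptions taken from the thread, and
-pinoffset and -pinstride are mdrun options that were not spelled out in the
messages above, so check them against the mdrun performance guide for your
version. How -nt is split into ranks and threads depends on the build.

```shell
#!/bin/sh
# Dry run only: prints one mdrun command line per GPU and executes nothing.
# Assumed layout (adjust to your machine): 12 logical cores, 2 GPUs.
total_cores=12
n_runs=2
threads_per_run=$((total_cores / n_runs))

# Print the command for run number $1 (0-based). The run index doubles as
# the GPU id and selects which block of logical cores the run is pinned to.
mdrun_cmd() {
    offset=$(( $1 * threads_per_run ))
    echo "gmx mdrun -s md.tpr -gpu_id $1 -nt $threads_per_run -pin on -pinoffset $offset -pinstride 1"
}

mdrun_cmd 0
mdrun_cmd 1
```

Each printed command would be launched from that run's own directory so the
output files do not collide. Afterwards, the Command line entry and the
performance summary in each md.log are worth comparing against a single run
using the whole machine, as Mark suggests.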