[gmx-users] mdrun initialises, fails to run, no error message

Mark Abraham mark.j.abraham at gmail.com
Mon Jan 9 16:18:48 CET 2017


Hi,

That's still likely to be disastrous for performance. mdrun uses all the CPU
cores you permit it to use, as well as the GPU, and running two mdrun
processes on the same cores risks a super-linear slowdown. See the suggested
examples at
http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
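
As a rough sketch of what keeping two runs off each other's cores could look
like: the script below only prints two mdrun command lines unless RUN=1 is
set. The run names md_a and md_b, the six threads per run, and the offsets 0
and 6 are assumptions (they suppose 12 logical cores, e.g. six cores with
hyper-threading), not values from this thread. It also assumes an
OpenMP-enabled build; the build in the quoted log has OpenMP disabled, so the
thread settings would need adjusting there, and thread pinning may not be
available on every platform.

```shell
#!/bin/sh
# Sketch: two mdrun jobs on disjoint sets of logical cores, one GPU each.
# Assumptions (not from the thread): run names md_a / md_b, 6 threads per
# run, 12 logical cores in total, and an OpenMP-enabled GROMACS build.

THREADS_PER_RUN=6   # assumed: half of 12 logical cores

# Print the mdrun command line for one job.
# $1 = -deffnm prefix, $2 = GPU id, $3 = index of this job (0, 1, ...)
mdrun_cmd() {
    offset=$(( $3 * THREADS_PER_RUN ))
    echo "gmx mdrun -deffnm $1 -nt $THREADS_PER_RUN -pin on -pinoffset $offset -pinstride 1 -gpu_id $2"
}

# Dry run by default: only print the commands. Set RUN=1 to launch them.
for job in "md_a 0 0" "md_b 1 1"; do
    set -- $job
    cmd=$(mdrun_cmd "$1" "$2" "$3")
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        # Each job writes its terminal output to its own file.
        $cmd > "$1.out" 2>&1 &
    fi
done
wait
```

Giving each run its own -deffnm prefix keeps the two jobs from writing to the
same output files, and the explicit -pinoffset values are what keep their
threads on separate logical cores.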

Mark

On Mon, 9 Jan 2017 16:12 Natalie Tatum <nataliejtatum at gmail.com> wrote:

> Dear Justin,
>
> Thanks for the advice - after a clean up, a reboot, and some careful
> application of commands, everything seems to be running nicely again.
> Switching the call to the one below (instead of using -deffnm) is working.
>
> gmx mdrun -s md.tpr -gpu_id 1 &
>
> Many thanks,
>
> Natalie
>
>
>
>
> On 4 January 2017 at 01:02, Justin Lemkul <jalemkul at vt.edu> wrote:
>
> >
> >
> > On 1/3/17 10:43 AM, Natalie Tatum wrote:
> >
> >> Dear all,
> >>
> >> I'm hoping you can shed light on (a) what my mdrun problem is and (b) where
> >> to start fixing it.
> >>
> >> I'm simulating different mutants of a protein dimer on DNA, for 10 ns
> >> a-piece. I have successfully run this protocol on the wild-type protein, on
> >> two single residue mutants, and on a double mutant. I came to run the same
> >> on a fourth, single site mutant. I have followed the same protocols and
> >> utilised the same MDP settings throughout. All were subject to 5000 steps
> >> of steepest-descent energy minimisation, then 200 ps of equilibration in
> >> the NVT ensemble, then the same in the NPT. For this particular mutant
> >> there were no issues apparent going into production MD. Therefore, I don't
> >> think it's an issue of my MDP setup or system...
> >>
> >> So I have two compatible (OpenCL 1.2) AMD Radeon HD Firepro D300 GPUs, and
> >> I have one mutant (run/process) assigned to each.
> >>
> >> For this mutant I call mdrun with:
> >>
> >> gmx mdrun -deffnm md -gpu_id 1 &
> >>
> >> Whereas the other is on -gpu_id 0, and walk away. This worked successfully
> >> in the week prior for two other systems. It's New Year, then I come back to
> >> what should be completed simulations this morning to get my hands dirty in
> >> analysis.
> >>
> >> Run on gpu 0 has completed successfully, all is grand.
> >>
> >> Mutant on gpu 1 has not. Attempts to resume/restart fail (on either GPU, or
> >> both, or calling neither explicitly). All output looks like this:
> >>
> >> GROMACS:      gmx mdrun, VERSION 5.1.3
> >>
> >> Executable:   /usr/local/gromacs/bin/gmx
> >>
> >> Data prefix:  /usr/local/gromacs
> >>
> >> Command line:
> >>
> >>
> >>
> >>   gmx mdrun -deffnm md
> >>
> >>
> > From the .log, it appears your command was not what you think it was.  Is
> > it possible that the job failed because mdrun tried to consume all
> > available hardware and got hung up?
> >
> >
> >>
> >> GROMACS version:    VERSION 5.1.3
> >>
> >> Precision:          single
> >>
> >> Memory model:       64 bit
> >>
> >> MPI library:        thread_mpi
> >>
> >> OpenMP support:     disabled
> >>
> >> GPU support:        enabled
> >>
> >> OpenCL support:     enabled
> >>
> >> invsqrt routine:    gmx_software_invsqrt(x)
> >>
> >> SIMD instructions:  AVX_256
> >>
> >> FFT library:        fftw-3.3.4-sse2
> >>
> >> RDTSCP usage:       enabled
> >>
> >> C++11 compilation:  disabled
> >>
> >> TNG support:        enabled
> >>
> >> Tracing support:    disabled
> >>
> >> Built on:           Mon  1 Aug 2016 17:20:18 BST
> >>
> >> Built by:           natalie at themachineIuse.here.there <natalie at nicr00353.ncl.ac.uk> [CMAKE]
> >>
> >>
> >> Build OS/arch:      Darwin 15.5.0 x86_64
> >>
> >> Build CPU vendor:   GenuineIntel
> >>
> >> Build CPU brand:    Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
> >>
> >> Build CPU family:   6   Model: 62   Stepping: 4
> >>
> >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
> >> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> >> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> >>
> >> C compiler:         /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc Clang 7.3.0.7030031
> >>
> >> C compiler flags:    -mavx    -Wall -Wno-unused -Wunused-value
> >> -Wunused-parameter -Wno-unknown-pragmas  -O3 -DNDEBUG
> >>
> >> C++ compiler:       /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ Clang 7.3.0.7030031
> >>
> >> C++ compiler flags:  -mavx    -Wextra -Wno-missing-field-initializers
> >> -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas  -O3
> >> -DNDEBUG
> >>
> >> Boost version:      1.60.0 (external)
> >>
> >> OpenCL include dir: /System/Library/Frameworks/OpenCL.framework
> >>
> >> OpenCL library:     /System/Library/Frameworks/OPENCL.framework
> >>
> >> OpenCL version:     1.2
> >>
> >>
> >> And there it ends. No files except the log shown above - and though this
> >> initial output looks identical in content to the beginnings of logs for
> >> successful simulations, mdrun does not then seem to engage with the
> >> GPU/CPUs available.
> >>
> >> There are no error messages, no apparent indication as to where this has
> >> gone wrong... And now I can't run mdrun at all, for any system.
> >>
> >>
> > Test whether or not your GPU is still accessible and capable of running
> > test programs.
> >
> > -Justin
> >
> >> I've checked my disk space (fine, >100 GB available), I'm able to call and
> >> execute other gmx commands, but mdrun does the above.
> >>
> >> The closest error I can find with my google-fu is three years ago where
> >> this user (
> >> http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/FEdWd6gC/mdrun-no-error-but-hangs-no-results
> >> ) got no error but a killed process, but I don't even get as far as
> >> detection of CPUs/GPUs or domain decomposition.
> >>
> >> Any suggestions much appreciated,
> >>
> >> Natalie
> >>
> >>
> > --
> > ==================================================
> >
> > Justin A. Lemkul, Ph.D.
> > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> >
> > Department of Pharmaceutical Sciences
> > School of Pharmacy
> > Health Sciences Facility II, Room 629
> > University of Maryland, Baltimore
> > 20 Penn St.
> > Baltimore, MD 21201
> >
> > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > http://mackerell.umaryland.edu/~jalemkul
> >
> > ==================================================
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >
>
>
>
> --
> *Dr. Natalie J. Tatum*
> Post-doctoral Research Associate
> Northern Institute for Cancer Research
> Newcastle University

