[gmx-users] mdrun initialises, fails to run, no error message
Natalie Tatum
nataliejtatum at gmail.com
Tue Jan 10 10:58:54 CET 2017
Hi Mark,
So using one GPU with, say, 6 of the 12 logical cores, would something like
this be more appropriate?
gmx mdrun -gpu_id 0 -nt 6 -pin on
Adding an offset for any second process?
Natalie
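
A rough sketch of the offset idea asked about above, assuming 12 logical cores, two GPUs, and two independent runs. The options -nt, -pin, -pinoffset, -pinstride and -gpu_id are mdrun options; the helper name build_mdrun_cmd and the .tpr file names are made up for illustration. The helper only prints the command lines and does not start mdrun. The correct offset and stride depend on the GROMACS version and on how the machine numbers its logical cores, so check them against the manual page linked below, and note that the log quoted in this thread reports OpenMP support as disabled, which affects how a thread count can be used.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an mdrun command line for one
# process pinned to its own block of logical cores.
#   $1 = GPU id
#   $2 = number of threads for this process
#   $3 = index of the process on this node (0, 1, ...)
#   $4 = run-input file (placeholder name)
build_mdrun_cmd() {
    gpu=$1
    nthreads=$2
    index=$3
    tpr=$4
    # With a stride of 1, threads sit on consecutive logical cores, so each
    # process is offset by the threads used by the processes before it:
    # process 0 starts at core 0, process 1 at core 6 when nthreads is 6.
    offset=$((index * nthreads))
    echo "gmx mdrun -s $tpr -gpu_id $gpu -nt $nthreads -pin on -pinoffset $offset -pinstride 1"
}

# Assumed layout: two runs, six threads each, one GPU each.
build_mdrun_cmd 0 6 0 mutantA.tpr
build_mdrun_cmd 1 6 1 mutantB.tpr
```

Printing the commands first makes it easy to confirm that the two core ranges do not overlap before launching anything.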
On 9 January 2017 at 15:18, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> Hi,
>
> That's still likely disastrous for performance. mdrun uses all the CPU cores
> that you permit, as well as the GPU, and running two mdrun processes on the
> same cores risks a super-linear slowdown. See the suggested examples at
> http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
>
> Mark
>
> On Mon, 9 Jan 2017 16:12 Natalie Tatum <nataliejtatum at gmail.com> wrote:
>
> > Dear Justin,
> >
> > Thanks for the advice - after a clean-up, a reboot, and some careful
> > application of commands, everything seems to be running nicely again.
> > Switching the call to the one below (instead of using -deffnm) is working.
> >
> > gmx mdrun -s md.tpr -gpu_id 1 &
> >
> > Many thanks,
> >
> > Natalie
> >
> >
> >
> >
> > On 4 January 2017 at 01:02, Justin Lemkul <jalemkul at vt.edu> wrote:
> >
> > >
> > >
> > > On 1/3/17 10:43 AM, Natalie Tatum wrote:
> > >
> > >> Dear all,
> > >>
> > >> I'm hoping you can shed light on (a) what my mdrun problem is and (b)
> > >> where to start fixing it.
> > >>
> > >> I'm simulating different mutants of a protein dimer on DNA, for 10 ns
> > >> apiece. I have successfully run this protocol on the wild-type protein,
> > >> on two single-residue mutants, and on a double mutant. I came to run the
> > >> same on a fourth, single-site mutant. I have followed the same protocols
> > >> and utilised the same MDP settings throughout. All were subject to 5000
> > >> steps of steepest-descent energy minimisation, then 200 ps of
> > >> equilibration in the NVT ensemble, then the same in the NPT. For this
> > >> particular mutant there were no issues apparent going into production MD.
> > >> Therefore, I don't think it's an issue of my MDP setup or system...
> > >>
> > >> So I have two compatible (OpenCL 1.2) AMD Radeon HD Firepro D300 GPUs,
> > >> and I have one mutant (run/process) assigned to each.
> > >>
> > >> For this mutant I call mdrun with:
> > >>
> > >> gmx mdrun -deffnm md -gpu_id 1 &
> > >>
> > >> The other is on -gpu_id 0, and I walk away. This worked successfully
> > >> in the week prior for two other systems. It's New Year, then I come back
> > >> this morning to what should be completed simulations, to get my hands
> > >> dirty in analysis.
> > >>
> > >> Run on gpu 0 has completed successfully, all is grand.
> > >>
> > >> Mutant on gpu 1 has not. Attempts to resume/restart fail (on either GPU,
> > >> or both, or calling neither explicitly). All output looks like this:
> > >>
> > >> GROMACS: gmx mdrun, VERSION 5.1.3
> > >>
> > >> Executable: /usr/local/gromacs/bin/gmx
> > >>
> > >> Data prefix: /usr/local/gromacs
> > >>
> > >> Command line:
> > >>
> > >>
> > >>
> > >> gmx mdrun -deffnm md
> > >>
> > >>
> > > From the .log, it appears your command was not what you think it was.
> > > Is it possible that the job failed because mdrun tried to consume all
> > > available hardware and got hung up?
> > >
> > >
> > >>
> > >> GROMACS version: VERSION 5.1.3
> > >>
> > >> Precision: single
> > >>
> > >> Memory model: 64 bit
> > >>
> > >> MPI library: thread_mpi
> > >>
> > >> OpenMP support: disabled
> > >>
> > >> GPU support: enabled
> > >>
> > >> OpenCL support: enabled
> > >>
> > >> invsqrt routine: gmx_software_invsqrt(x)
> > >>
> > >> SIMD instructions: AVX_256
> > >>
> > >> FFT library: fftw-3.3.4-sse2
> > >>
> > >> RDTSCP usage: enabled
> > >>
> > >> C++11 compilation: disabled
> > >>
> > >> TNG support: enabled
> > >>
> > >> Tracing support: disabled
> > >>
> > >> Built on: Mon 1 Aug 2016 17:20:18 BST
> > >>
> > >> Built by: natalie at themachineIuse.here.there [CMAKE]
> > >>
> > >>
> > >> Build OS/arch: Darwin 15.5.0 x86_64
> > >>
> > >> Build CPU vendor: GenuineIntel
> > >>
> > >> Build CPU brand: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
> > >>
> > >> Build CPU family: 6 Model: 62 Stepping: 4
> > >>
> > >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
> > >> mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> > >> sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > >>
> > >> C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc Clang 7.3.0.7030031
> > >>
> > >> C compiler flags: -mavx -Wall -Wno-unused -Wunused-value
> > >> -Wunused-parameter -Wno-unknown-pragmas -O3 -DNDEBUG
> > >>
> > >> C++ compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ Clang 7.3.0.7030031
> > >>
> > >> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
> > >> -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas -O3
> > >> -DNDEBUG
> > >>
> > >> Boost version: 1.60.0 (external)
> > >>
> > >> OpenCL include dir: /System/Library/Frameworks/OpenCL.framework
> > >>
> > >> OpenCL library: /System/Library/Frameworks/OPENCL.framework
> > >>
> > >> OpenCL version: 1.2
> > >>
> > >>
> > >> And there it ends. No files except the log shown above - and though this
> > >> initial output looks identical in content to the beginnings of logs for
> > >> successful simulations, mdrun does not then seem to engage with the
> > >> GPU/CPUs available.
> > >>
> > >> There are no error messages, no apparent indication as to where this has
> > >> gone wrong... And now I can't run mdrun at all, for any system.
> > >>
> > >>
> > > Test whether or not your GPU is still accessible and capable of running
> > > test programs.
> > >
> > > -Justin
> > >
> > >> I've checked my disk space (fine, >100 GB available), and I'm able to
> > >> call and execute other gmx commands, but mdrun does the above.
> > >>
> > >> The closest error I can find with my google-fu is from three years ago,
> > >> where this user (
> > >> http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/FEdWd6gC/mdrun-no-error-but-hangs-no-results
> > >> ) got no error but a killed process, but I don't even get as far as
> > >> detection of CPUs/GPUs or domain decomposition.
> > >>
> > >> Any suggestions much appreciated,
> > >>
> > >> Natalie
> > >>
> > >>
> > > --
> > > ==================================================
> > >
> > > Justin A. Lemkul, Ph.D.
> > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > >
> > > Department of Pharmaceutical Sciences
> > > School of Pharmacy
> > > Health Sciences Facility II, Room 629
> > > University of Maryland, Baltimore
> > > 20 Penn St.
> > > Baltimore, MD 21201
> > >
> > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > > http://mackerell.umaryland.edu/~jalemkul
> > >
> > > ==================================================
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at
> > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > send a mail to gmx-users-request at gromacs.org.
> > >
> >
> >
> >
> > --
> > *Dr. Natalie J. Tatum*
> > Post-doctoral Research Associate
> > Northern Institute for Cancer Research
> > Newcastle University
--
*Dr. Natalie J. Tatum*
Post-doctoral Research Associate
Northern Institute for Cancer Research
Newcastle University