[gmx-users] mdrun initialises, fails to run, no error message

Mark Abraham mark.j.abraham at gmail.com
Tue Jan 10 11:12:50 CET 2017


Hi,

Yes. Or if you are running two quite similar simulations of the same
length, use gmx_mpi mdrun -multidir first second and leave the details to
mdrun (it'll use everything it can, and the default is what you want).
Check the performance against the above run for sanity.
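
For the two-separate-processes layout asked about below, here is a minimal
sketch, assuming 12 logical cores split evenly between two runs with one GPU
each. It only prints the commands (a dry run) and does not launch anything.
The function and variable names are illustrative, and the input/output file
options are left out, so add your own and start each run from its own
directory.

```shell
#!/bin/sh
# Dry run: print (do not execute) one pinned mdrun command per simulation.
# Assumptions, not taken from the thread: 12 logical cores shared evenly
# by 2 runs, and GPU ids that match the run index (0 and 1).

LOGICAL_CORES=12
RUNS=2
THREADS_PER_RUN=$((LOGICAL_CORES / RUNS))

# Print the mdrun command line for run number $1 (0-based).
pinned_mdrun_cmd() {
    run=$1
    # First logical core for this run, so the two runs do not overlap.
    offset=$((run * THREADS_PER_RUN))
    echo "gmx mdrun -gpu_id $run -nt $THREADS_PER_RUN -pin on -pinoffset $offset"
}

run=0
while [ "$run" -lt "$RUNS" ]; do
    pinned_mdrun_cmd "$run"
    run=$((run + 1))
done
```

Whether offsets 0 and 6 land on disjoint physical cores depends on how the
logical cores are numbered on your machine, and on whether your platform
lets mdrun set thread affinity at all, so check what md.log reports about
pinning before trusting the layout.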

Mark

On Tue, Jan 10, 2017 at 10:59 AM Natalie Tatum <nataliejtatum at gmail.com>
wrote:

> Hi Mark,
>
> So using one GPU, with say 6 of 12 logical cores, something like this would
> be more appropriate?
>
> gmx mdrun -gpu_id 0 -nt 6 -pin on
>
> Adding an offset for any second process?
>
>
> Natalie
>
> On 9 January 2017 at 15:18, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
> > Hi,
> >
> > That's still likely disastrous for performance. Mdrun uses all the cores of
> > the CPU that you permit, as well as the GPU, and running two mdrun on the
> > same cores risks a super-linear slowdown. See suggested examples at
> > http://manual.gromacs.org/documentation/2016.1/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
> >
> > Mark
> >
> > On Mon, 9 Jan 2017 16:12 Natalie Tatum <nataliejtatum at gmail.com> wrote:
> >
> > > Dear Justin,
> > >
> > > Thanks for the advice - after a clean up, a reboot, and some careful
> > > application of commands, everything seems to be running nicely again.
> > > Switching the call to below (instead of using -deffnm) is working.
> > >
> > > gmx mdrun -s md.tpr -gpu_id 1 &
> > >
> > > Many thanks,
> > >
> > > Natalie
> > >
> > >
> > >
> > >
> > > On 4 January 2017 at 01:02, Justin Lemkul <jalemkul at vt.edu> wrote:
> > >
> > > >
> > > >
> > > > On 1/3/17 10:43 AM, Natalie Tatum wrote:
> > > >
> > > >> Dear all,
> > > >>
> > > > >> I'm hoping you can shed light on (a) what my mdrun problem is and (b)
> > > > >> where to start fixing it.
> > > >>
> > > > >> I'm simulating different mutants of a protein dimer on DNA, for 10 ns
> > > > >> a-piece. I have successfully run this protocol on the wild-type protein,
> > > > >> on two single residue mutants, and on a double mutant. I came to run the
> > > > >> same on a fourth, single site mutant. I have followed the same protocols
> > > > >> and utilised the same MDP settings throughout. All were subject to 5000
> > > > >> steps of steepest-descent energy minimisation, then 200 ps of
> > > > >> equilibration in the NVT ensemble, then the same in the NPT. For this
> > > > >> particular mutant there were no issues apparent going into production MD.
> > > > >> Therefore, I don't think it's an issue of my MDP setup or system...
> > > >>
> > > > >> So I have two compatible (OpenCL 1.2) AMD Radeon HD Firepro D300 GPUs,
> > > > >> and I have one mutant (run/process) assigned to each.
> > > >>
> > > >> For this mutant I call mdrun with:
> > > >>
> > > >> gmx mdrun -deffnm md -gpu_id 1 &
> > > >>
> > > > >> Whereas the other is on -gpu_id 0, and walk away. This worked
> > > > >> successfully in the week prior for two other systems. It's New Year, then
> > > > >> I come back to what should be completed simulations this morning to get
> > > > >> my hands dirty in analysis.
> > > >>
> > > >> Run on gpu 0 has completed successfully, all is grand.
> > > >>
> > > > >> Mutant on gpu 1 has not. Attempts to resume/restart fail (on either GPU,
> > > > >> or both, or calling neither explicitly). All output looks like this:
> > > >>
> > > >> GROMACS:      gmx mdrun, VERSION 5.1.3
> > > >>
> > > >> Executable:   /usr/local/gromacs/bin/gmx
> > > >>
> > > >> Data prefix:  /usr/local/gromacs
> > > >>
> > > >> Command line:
> > > >>
> > > >>
> > > >>
> > > >>   gmx mdrun -deffnm md
> > > >>
> > > >>
> > > > > From the .log, it appears your command was not what you think it was. Is
> > > > > it possible that the job failed because mdrun tried to consume all
> > > > > available hardware and got hung up?
> > > >
> > > >
> > > >>
> > > >> GROMACS version:    VERSION 5.1.3
> > > >>
> > > >> Precision:          single
> > > >>
> > > >> Memory model:       64 bit
> > > >>
> > > >> MPI library:        thread_mpi
> > > >>
> > > >> OpenMP support:     disabled
> > > >>
> > > >> GPU support:        enabled
> > > >>
> > > >> OpenCL support:     enabled
> > > >>
> > > >> invsqrt routine:    gmx_software_invsqrt(x)
> > > >>
> > > >> SIMD instructions:  AVX_256
> > > >>
> > > >> FFT library:        fftw-3.3.4-sse2
> > > >>
> > > >> RDTSCP usage:       enabled
> > > >>
> > > >> C++11 compilation:  disabled
> > > >>
> > > >> TNG support:        enabled
> > > >>
> > > >> Tracing support:    disabled
> > > >>
> > > >> Built on:           Mon  1 Aug 2016 17:20:18 BST
> > > >>
> > > >> Built by:           natalie at t <natalie at nicr00353.ncl.ac.uk>
> > > >> hemachineIuse.here.there [CMAKE]
> > > >>
> > > >>
> > > >> Build OS/arch:      Darwin 15.5.0 x86_64
> > > >>
> > > >> Build CPU vendor:   GenuineIntel
> > > >>
> > > >> Build CPU brand:    Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
> > > >>
> > > >> Build CPU family:   6   Model: 62   Stepping: 4
> > > >>
> > > > >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx
> > > > >> msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> > > > >> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > > >>
> > > > >> C compiler:         /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc Clang 7.3.0.7030031
> > > >>
> > > >> C compiler flags:    -mavx    -Wall -Wno-unused -Wunused-value
> > > >> -Wunused-parameter -Wno-unknown-pragmas  -O3 -DNDEBUG
> > > >>
> > > > >> C++ compiler:       /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ Clang 7.3.0.7030031
> > > >>
> > > > >> C++ compiler flags:  -mavx    -Wextra -Wno-missing-field-initializers
> > > > >> -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas  -O3
> > > > >> -DNDEBUG
> > > >>
> > > >> Boost version:      1.60.0 (external)
> > > >>
> > > >> OpenCL include dir: /System/Library/Frameworks/OpenCL.framework
> > > >>
> > > >> OpenCL library:     /System/Library/Frameworks/OPENCL.framework
> > > >>
> > > >> OpenCL version:     1.2
> > > >>
> > > >>
> > > > >> And there it ends. No files except the log shown above - and though this
> > > > >> initial output looks identical in content to the beginnings of logs for
> > > > >> successful simulations, mdrun does not then seem to engage with the
> > > > >> GPU/CPUs available.
> > > >>
> > > > >> There are no error messages, no apparent indication as to where this has
> > > > >> gone wrong... And now I can't run mdrun at all, for any system.
> > > >>
> > > >>
> > > > > Test whether or not your GPU is still accessible and capable of running
> > > > > test programs.
> > > >
> > > > -Justin
> > > >
> > > > >> I've checked my disk space (fine, >100 GB available), I'm able to call
> > > > >> and execute other gmx commands, but mdrun does the above.
> > > >>
> > > > >> The closest error I can find with my google-fu is three years ago where
> > > > >> this user (
> > > > >> http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/FEdWd6gC/mdrun-no-error-but-hangs-no-results
> > > > >> ) got no error but a killed process, but I don't even get as far as
> > > > >> detection of CPUs/GPUs or domain decomposition.
> > > >>
> > > >> Any suggestions much appreciated,
> > > >>
> > > >> Natalie
> > > >>
> > > >>
> > > > --
> > > > ==================================================
> > > >
> > > > Justin A. Lemkul, Ph.D.
> > > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > > >
> > > > Department of Pharmaceutical Sciences
> > > > School of Pharmacy
> > > > Health Sciences Facility II, Room 629
> > > > University of Maryland, Baltimore
> > > > 20 Penn St.
> > > > Baltimore, MD 21201
> > > >
> > > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > > > http://mackerell.umaryland.edu/~jalemkul
> > > >
> > > > ==================================================
> > > > --
> > > > Gromacs Users mailing list
> > > >
> > > > > * Please search the archive at
> > > > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> > > > >
> > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > > >
> > > > > * For (un)subscribe requests visit
> > > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > > > send a mail to gmx-users-request at gromacs.org.
> > > >
> > >
> > >
> > >
> > > --
> > > *Dr. Natalie J. Tatum*
> > > Post-doctoral Research Associate
> > > Northern Institute for Cancer Research
> > > Newcastle University
> > > --
> > > Gromacs Users mailing list
> > >
> > > * Please search the archive at
> > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > > posting!
> > >
> > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >
> > > * For (un)subscribe requests visit
> > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > > send a mail to gmx-users-request at gromacs.org.
> > >
> >
>
>
>
> --
> *Dr. Natalie J. Tatum*
> Post-doctoral Research Associate
> Northern Institute for Cancer Research
> Newcastle University
>

