[gmx-users] Long trajectory split

Mark Abraham mark.j.abraham at gmail.com
Thu Feb 27 15:51:33 CET 2014


Run scripts and log files would be a good start!
On Feb 27, 2014 1:39 PM, "Marcelo Depólo" <marcelodepolo at gmail.com> wrote:

> Dear Dr. Chaban,
>
> Which details or files do you need? I would be very happy to post any
> files you request to get this solved.
>
>
>
> 2014-02-23 22:21 GMT+01:00 Dr. Vitaly Chaban <vvchaban at gmail.com>:
>
> > You do not provide all the details. As was pointed out at the very
> > beginning, most likely you have incorrect parallelism in this case.
> > Can you post all the files you obtain for people to inspect?
> >
> >
> > Dr. Vitaly V. Chaban
> >
> >
> > On Sun, Feb 23, 2014 at 9:04 PM, Marcelo Depólo <marcelodepolo at gmail.com> wrote:
> > > Justin, as far as I can tell, the next log file starts at 0 ps, which
> > > would mean that it is re-starting for some reason. At first, I imagined
> > > that it was only splitting the data among files due to some kind of
> > > size limit, as you said, but when I tried to concatenate the
> > > trajectories, it gave me nonsensical output, with a lot of 'beginnings'.
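> > >
> > > For what it's worth, the concatenation was along these lines (trjcat
> > > from 4.6; actual file names differ):
> > >
> > >   trjcat -f part1.trr part2.trr -o full.trr
> > >
> > > Since every piece starts again at 0 ps, the frames overlap in time;
> > > trjcat -settime would let me offset each part by hand, but that should
> > > not be needed for what is supposed to be one continuous run.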
> > >
> > > I will check with the cluster experts whether there is some kind of
> > > size limit. It seems to me the most likely source of the problem.
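> > >
> > > (A quick first check on a compute node, assuming a bash shell, would be
> > >
> > >   ulimit -f    # per-process maximum file size; 'unlimited' if no cap
> > >
> > > just to rule out a per-process file-size limit before bothering them.)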
> > >
> > > Mark, the only difference this time is the time scale, set since the
> > > beginning. Apart from the protein itself, even the .mdp files were
> > > copied from a successful folder.
> > >
> > > But thank you both for the support.
> > >
> > >
> > > 2014-02-23 20:20 GMT+01:00 Mark Abraham <mark.j.abraham at gmail.com>:
> > >
> > >> On Sun, Feb 23, 2014 at 6:48 PM, Marcelo Depólo <marcelodepolo at gmail.com> wrote:
> > >>
> > >> > Justin, the other runs with the very same binary do not produce the
> > >> > same problem.
> > >> >
> > >> > Mark, I just omitted the _mpi from the line here, but it was
> > >> > compiled as _mpi.
> > >> >
> > >>
> > >> OK, that rules that problem out, but please don't simplify and
> > >> approximate. Computers are exact, and troubleshooting problems with
> > >> them requires all the information. If we all understood perfectly, we
> > >> wouldn't be having problems ;-)
> > >>
> > >> Those files do get closed at checkpoint intervals, so that they can be
> > >> hashed and the hash value saved in the checkpoint. It is conceivable
> > >> that some file system would not close and re-open them properly. The
> > >> .log files would comment on at least some such conditions.
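> > >>
> > >> For comparison, a deliberate continuation would look something like
> > >> this (mdrun 4.6; file names illustrative):
> > >>
> > >>   mpirun -np 24 mdrun_mpi -s prt.tpr -cpi prt.cpt -append
> > >>
> > >> which appends to the existing .trr/.edr/.log rather than backing them
> > >> up and starting new ones.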
> > >>
> > >> But the real question is what you are doing differently from the
> > >> times when you have observed normal behaviour!
> > >>
> > >> Mark
> > >>
> > >>
> > >> > My log file top:
> > >> >
> > >> > Gromacs version:    VERSION 4.6.1
> > >> > Precision:          single
> > >> > Memory model:       64 bit
> > >> > MPI library:        MPI
> > >> > OpenMP support:     disabled
> > >> > GPU support:        disabled
> > >> > invsqrt routine:    gmx_software_invsqrt(x)
> > >> > CPU acceleration:   SSE4.1
> > >> > FFT library:        fftw-3.3.2-sse2
> > >> > Large file support: enabled
> > >> > RDTSCP usage:       enabled
> > >> > Built on:           Sex Nov 29 16:08:45 BRST 2013
> > >> > Built by:           root at jupiter [CMAKE]
> > >> > Build OS/arch:      Linux 2.6.32.13-0.4-default x86_64
> > >> > Build CPU vendor:   GenuineIntel
> > >> > Build CPU brand:    Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
> > >> > Build CPU family:   6   Model: 44   Stepping: 2
> > >> > Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr
> > >> >   nonstop_tsc pcid pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
> > >> >   sse4.2 ssse3
> > >> > (...)
> > >> >
> > >> > Initializing Domain Decomposition on 24 nodes
> > >> > Dynamic load balancing: auto
> > >> > Will sort the charge groups at every domain (re)decomposition
> > >> > Initial maximum inter charge-group distances:
> > >> >     two-body bonded interactions: 0.621 nm, LJ-14, atoms 3801 3812
> > >> >   multi-body bonded interactions: 0.621 nm, G96Angle, atoms 3802 3812
> > >> > Minimum cell size due to bonded interactions: 0.683 nm
> > >> > Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
> > >> > Estimated maximum distance required for P-LINCS: 0.820 nm
> > >> > This distance will limit the DD cell size, you can override this with -rcon
> > >> > Guess for relative PME load: 0.26
> > >> > Will use 18 particle-particle and 6 PME only nodes
> > >> > This is a guess, check the performance at the end of the log file
> > >> > Using 6 separate PME nodes
> > >> > Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> > >> > Optimizing the DD grid for 18 cells with a minimum initial size of 1.025 nm
> > >> > The maximum allowed number of cells is: X 8 Y 8 Z 8
> > >> > Domain decomposition grid 3 x 2 x 3, separate PME nodes 6
> > >> > PME domain decomposition: 3 x 2 x 1
> > >> > Interleaving PP and PME nodes
> > >> > This is a particle-particle only node
> > >> > Domain decomposition nodeid 0, coordinates 0 0 0
> > >> >
> > >> >
> > >> >
> > >> > 2014-02-23 18:08 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
> > >> >
> > >> > >
> > >> > >
> > >> > > On 2/23/14, 11:32 AM, Marcelo Depólo wrote:
> > >> > >
> > >> > >> Maybe I should explain it better.
> > >> > >>
> > >> > >> I am using "mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr",
> > >> > >> pretty much a standard line. This job runs in a batch system and
> > >> > >> creates the outputs; after some (random) amount of time, a backup
> > >> > >> is made and new files are written, but the job itself does not
> > >> > >> finish.
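> > >> > >>
> > >> > >> Concretely, the working directory ends up looking something like
> > >> > >> this (file names from my command line; the pattern is what matters):
> > >> > >>
> > >> > >>   prt.trr   #prt.trr.1#   prt.edr   #prt.edr.1#
> > >> > >>
> > >> > >> i.e. the earlier output gets renamed with the usual #...# backup
> > >> > >> pattern and a fresh file is started.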
> > >> > >>
> > >> > >>
> > >> > > It would help if you can post the .log file from one of the runs
> > >> > > to see the information regarding mdrun's parallel capabilities.
> > >> > > This still sounds like a case of an incorrectly compiled binary.
> > >> > > Do other runs with the same binary produce the same problem?
> > >> > >
> > >> > > -Justin
> > >> > >
> > >> > >
> > >> > >
> > >> > >> 2014-02-23 17:12 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
> > >> > >>
> > >> > >>
> > >> > >>>
> > >> > >>> On 2/23/14, 11:00 AM, Marcelo Depólo wrote:
> > >> > >>>
> > >> > >>>> But it is not quite happening simultaneously, Justin.
> > >> > >>>>
> > >> > >>>> It is producing one after another and, consequently, backing up
> > >> > >>>> the files.
> > >> > >>>>
> > >> > >>> You'll have to provide the exact commands you're issuing. Likely
> > >> > >>> you're leaving the output names to the default, which causes them
> > >> > >>> to be backed up rather than overwritten.
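> > >> > >>>
> > >> > >>> For example, with the default names, if traj.trr already exists
> > >> > >>> when mdrun starts, it is renamed #traj.trr.1# (then #traj.trr.2#,
> > >> > >>> and so on) before a new traj.trr is written; giving each run its
> > >> > >>> own names, e.g. with -deffnm, avoids the shuffle.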
> > >> > >>>
> > >> > >>>
> > >> > >>> -Justin
> > >> > >>>
> > >> > >>> --
> > >> > >>> ==================================================
> > >> > >>>
> > >> > >>> Justin A. Lemkul, Ph.D.
> > >> > >>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
> > >> > >>>
> > >> > >>> Department of Pharmaceutical Sciences
> > >> > >>> School of Pharmacy
> > >> > >>> Health Sciences Facility II, Room 601
> > >> > >>> University of Maryland, Baltimore
> > >> > >>> 20 Penn St.
> > >> > >>> Baltimore, MD 21201
> > >> > >>>
> > >> > >>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
> > >> > >>> http://mackerell.umaryland.edu/~jalemkul
> > >> > >>>
> > >> > >>> ==================================================
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >> >
> > >
> > >
> > >
>
>
>
> --
> Marcelo Depólo Polêto
> Uppsala Universitet - Sweden
> Science without Borders - CAPES
> Phone: +46 76 581 67 49


More information about the gromacs.org_gmx-users mailing list