[gmx-users] Long trajectory split

Dr. Vitaly Chaban vvchaban at gmail.com
Thu Feb 27 16:54:49 CET 2014


The only real way to troubleshoot this kind of problem is for someone
here to run your system on their own machine and see the problem with
their own eyes.

Since no one has confirmed the same issue as yours, the cause most
likely lies outside the GROMACS code. Either something is wrong with
your operating environment, or you are interpreting your observations
incorrectly.
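
One quick way to rule out misinterpretation is to check the frame times
stored in each trajectory piece before trying to concatenate anything. A
minimal sketch using GROMACS 4.6-era tool names; the file names follow the
command quoted later in this thread and the usual GROMACS backup naming, so
treat them as placeholders:

    # Print the frame times in each piece; a genuine restart will show
    # the newer piece beginning again at t = 0.
    gmxcheck -f '#prt.trr.1#'    # backed-up (older) piece, if named this way
    gmxcheck -f prt.trr          # current piece

    # If the pieces really belong to one continuous run, trjcat can join
    # them; -settime lets you fix the start time of each file interactively.
    trjcat -f '#prt.trr.1#' prt.trr -o prt_whole.trr -settime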


Dr. Vitaly V. Chaban


On Thu, Feb 27, 2014 at 1:33 PM, Marcelo Depólo <marcelodepolo at gmail.com> wrote:
> Dear Dr,
>
> Which details or files do you need? I would be very happy to help solve
> this by posting any files you request.
>
>
>
> 2014-02-23 22:21 GMT+01:00 Dr. Vitaly Chaban <vvchaban at gmail.com>:
>>
>> You have not provided all the details. As was pointed out at the very
>> beginning, most likely you have incorrect parallelism in this case.
>> Can you post all the files you obtain for people to inspect?
>>
>>
>> Dr. Vitaly V. Chaban
>>
>>
>> On Sun, Feb 23, 2014 at 9:04 PM, Marcelo Depólo <marcelodepolo at gmail.com>
>> wrote:
>> > Justin, as far as I can tell, the next log file starts at 0 ps, which
>> > would mean that it is restarting for some reason. At first I imagined
>> > that it was only splitting the data among files due to some kind of
>> > size limit, as you said, but when I tried to concatenate the
>> > trajectories, the output was nonsense, with a lot of 'beginnings'.
>> >
>> > I will check with the cluster experts whether there is some kind of
>> > size limit. It seems to be the most logical source of the problem to me.
>> >
>> > Mark, the only difference this time is the time scale, set from the
>> > beginning. Apart from the protein itself, even the .mdp files were
>> > copied from a successful folder.
>> >
>> > But thank you both for the support.
>> >
>> >
>> > 2014-02-23 20:20 GMT+01:00 Mark Abraham <mark.j.abraham at gmail.com>:
>> >
>> >> On Sun, Feb 23, 2014 at 6:48 PM, Marcelo Depólo
>> >> <marcelodepolo at gmail.com> wrote:
>> >>
>> >> > Justin, the other runs with the very same binary do not produce the
>> >> > same problem.
>> >> >
>> >> > Mark, I just omitted the _mpi from the command line here, but it was
>> >> > compiled as _mpi.
>> >> >
>> >>
>> >> OK, that rules that problem out, but please don't simplify and
>> >> approximate. Computers are exact, and troubleshooting problems with
>> >> them requires all the information. If we all understood perfectly, we
>> >> wouldn't be having problems ;-)
>> >>
>> >> Those files do get closed at checkpoint intervals, so that they can be
>> >> hashed and the hash values saved in the checkpoint. It is conceivable
>> >> that some file system would not close and re-open them properly. The
>> >> .log files would report at least some such conditions.
>> >>
>> >> But the real question is what you are doing differently from the
>> >> times when you have observed normal behaviour!
>> >>
>> >> Mark
>> >>
>> >>
>> >> > My log file top:
>> >> >
>> >> > Gromacs version:    VERSION 4.6.1
>> >> > Precision:          single
>> >> > Memory model:       64 bit
>> >> > MPI library:        MPI
>> >> > OpenMP support:     disabled
>> >> > GPU support:        disabled
>> >> > invsqrt routine:    gmx_software_invsqrt(x)
>> >> > CPU acceleration:   SSE4.1
>> >> > FFT library:        fftw-3.3.2-sse2
>> >> > Large file support: enabled
>> >> > RDTSCP usage:       enabled
>> >> > Built on:           Sex Nov 29 16:08:45 BRST 2013
>> >> > Built by:           root at jupiter [CMAKE]
>> >> > Build OS/arch:      Linux 2.6.32.13-0.4-default x86_64
>> >> > Build CPU vendor:   GenuineIntel
>> >> > Build CPU brand:    Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
>> >> > Build CPU family:   6   Model: 44   Stepping: 2
>> >> > Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr
>> >> >   nonstop_tsc pcid pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
>> >> >   sse4.2 ssse3
>> >> > (...)
>> >> >
>> >> > Initializing Domain Decomposition on 24 nodes
>> >> > Dynamic load balancing: auto
>> >> > Will sort the charge groups at every domain (re)decomposition
>> >> > Initial maximum inter charge-group distances:
>> >> >     two-body bonded interactions: 0.621 nm, LJ-14, atoms 3801 3812
>> >> >   multi-body bonded interactions: 0.621 nm, G96Angle, atoms 3802 3812
>> >> > Minimum cell size due to bonded interactions: 0.683 nm
>> >> > Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
>> >> > Estimated maximum distance required for P-LINCS: 0.820 nm
>> >> > This distance will limit the DD cell size, you can override this with -rcon
>> >> > Guess for relative PME load: 0.26
>> >> > Will use 18 particle-particle and 6 PME only nodes
>> >> > This is a guess, check the performance at the end of the log file
>> >> > Using 6 separate PME nodes
>> >> > Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
>> >> > Optimizing the DD grid for 18 cells with a minimum initial size of 1.025 nm
>> >> > The maximum allowed number of cells is: X 8 Y 8 Z 8
>> >> > Domain decomposition grid 3 x 2 x 3, separate PME nodes 6
>> >> > PME domain decomposition: 3 x 2 x 1
>> >> > Interleaving PP and PME nodes
>> >> > This is a particle-particle only node
>> >> > Domain decomposition nodeid 0, coordinates 0 0 0
>> >> >
>> >> >
>> >> > 2014-02-23 18:08 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
>> >> >
>> >> > >
>> >> > >
>> >> > > On 2/23/14, 11:32 AM, Marcelo Depólo wrote:
>> >> > >
>> >> > >> Maybe I should explain it better.
>> >> > >>
>> >> > >> I am using "mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr",
>> >> > >> pretty much a standard command line. This job, run in a batch queue,
>> >> > >> creates the outputs and, after some (random) time, a backup is made
>> >> > >> and new files are written, but the job itself does not finish.
>> >> > >>
>> >> > >>
>> >> > > It would help if you can post the .log file from one of the runs
>> >> > > to see the information regarding mdrun's parallel capabilities.
>> >> > > This still sounds like a case of an incorrectly compiled binary.
>> >> > > Do other runs with the same binary produce the same problem?
>> >> > >
>> >> > > -Justin
>> >> > >
>> >> > >
>> >> > >
>> >> > >> 2014-02-23 17:12 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
>> >> > >>
>> >> > >>
>> >> > >>>
>> >> > >>> On 2/23/14, 11:00 AM, Marcelo Depólo wrote:
>> >> > >>>
>> >> > >>>> But it is not quite happening simultaneously, Justin.
>> >> > >>>>
>> >> > >>>> It is producing one after another and, consequently, backing up
>> >> > >>>> the files.
>> >> > >>>
>> >> > >>> You'll have to provide the exact commands you're issuing. Likely
>> >> > >>> you're leaving the output names to the default, which causes them
>> >> > >>> to be backed up rather than overwritten.
>> >> > >>>
>> >> > >>>
>> >> > >>> -Justin
>> >> > >>>
>> >> > >>> --
>> >> > >>> ==================================================
>> >> > >>>
>> >> > >>> Justin A. Lemkul, Ph.D.
>> >> > >>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>> >> > >>>
>> >> > >>> Department of Pharmaceutical Sciences
>> >> > >>> School of Pharmacy
>> >> > >>> Health Sciences Facility II, Room 601
>> >> > >>> University of Maryland, Baltimore
>> >> > >>> 20 Penn St.
>> >> > >>> Baltimore, MD 21201
>> >> > >>>
>> >> > >>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>> >> > >>> http://mackerell.umaryland.edu/~jalemkul
>> >> > >>>
>> >> > >>> ==================================================
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > > --
>> >> > > ==================================================
>> >> > >
>> >> > > Justin A. Lemkul, Ph.D.
>> >> > > Ruth L. Kirschstein NRSA Postdoctoral Fellow
>> >> > >
>> >> > > Department of Pharmaceutical Sciences
>> >> > > School of Pharmacy
>> >> > > Health Sciences Facility II, Room 601
>> >> > > University of Maryland, Baltimore
>> >> > > 20 Penn St.
>> >> > > Baltimore, MD 21201
>> >> > >
>> >> > > jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>> >> > > http://mackerell.umaryland.edu/~jalemkul
>> >> > >
>> >> > > ==================================================
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Marcelo Depólo Polêto
>> >> > Uppsala Universitet - Sweden
>> >> > Science without Borders - CAPES
>> >> > Phone: +46 76 581 67 49
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Marcelo Depólo Polêto
>> > Uppsala Universitet - Sweden
>> > Science without Borders - CAPES
>> > Phone: +46 76 581 67 49
>
>
>
>
> --
> Marcelo Depólo Polêto
> Uppsala Universitet - Sweden
> Science without Borders - CAPES
> Phone: +46 76 581 67 49
>

