[gmx-users] Long trajectory split
Dr. Vitaly Chaban
vvchaban at gmail.com
Sun Feb 23 22:21:48 CET 2014
You do not provide all the details. As was pointed at the very
beginning, most likely you have incorrect parallelism in this case.
Can you post all the files you obtain for people to inspect?
Dr. Vitaly V. Chaban
On Sun, Feb 23, 2014 at 9:04 PM, Marcelo Depólo <marcelodepolo at gmail.com> wrote:
> Justin, as far as I can tell, the next log file starts at 0 ps, which would
> mean that it is re-starting for some reason. At first, I imagined that it
> was only splitting the data among files due to some kind of size limit, as
> you said, but when I tried to concatenate the trajectories, I got a
> nonsensical output with a lot of 'beginnings' (see the sketch below).
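>
> A minimal sketch of the kind of concatenation attempted (file names are
> illustrative; trjcat is the GROMACS 4.6 tool for joining trajectory parts):
>
>   trjcat -f prt.trr '#prt.trr.1#' '#prt.trr.2#' -o prt_whole.trr -settime
>
> With -settime the start time of each part can be set interactively, which
> avoids the overlapping 'beginnings' when every part claims to start at 0 ps.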
>
> I will check with the cluster experts whether there is some kind of size
> limit. That seems to be the most logical source of the problem to me.
>
> Mark, the only difference this time is the time scale set from the
> beginning. Apart from the protein itself, even the .mdp files were copied
> from a successful run's folder.
>
> But thank you both for the support.
>
>
> 2014-02-23 20:20 GMT+01:00 Mark Abraham <mark.j.abraham at gmail.com>:
>
>> On Sun, Feb 23, 2014 at 6:48 PM, Marcelo Depólo
>> <marcelodepolo at gmail.com> wrote:
>>
>> > Justin, the other runs with the very same binary do not produce the same
>> > problem.
>> >
>> > Mark, I just omitted the _mpi suffix from the line here, but it was
>> > compiled as _mpi.
>> >
>>
>> OK, that rules that problem out, but please don't simplify or approximate.
>> Computers are exact, and troubleshooting problems with them requires all
>> the information. If we all understood perfectly, we wouldn't be having
>> problems ;-)
>>
>> Those files do get closed at checkpoint intervals, so that they can be
>> hashed and the hash value saved in the checkpoint. It is conceivable that
>> some file system would not close and re-open them properly. The .log files
>> would comment on at least some such conditions.
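>>
>> A quick way to check what the .log files say about this is to search them
>> for checkpoint/restart messages (file names are illustrative):
>>
>>   grep -i -E "checkpoint|restart" prt.log '#prt.log.1#'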
>>
>> But the real question is what you are doing differently from the times when
>> you have observed normal behaviour!
>>
>> Mark
>>
>>
>> > My log file top:
>> >
>> > Gromacs version:    VERSION 4.6.1
>> > Precision:          single
>> > Memory model:       64 bit
>> > MPI library:        MPI
>> > OpenMP support:     disabled
>> > GPU support:        disabled
>> > invsqrt routine:    gmx_software_invsqrt(x)
>> > CPU acceleration:   SSE4.1
>> > FFT library:        fftw-3.3.2-sse2
>> > Large file support: enabled
>> > RDTSCP usage:       enabled
>> > Built on:           Sex Nov 29 16:08:45 BRST 2013
>> > Built by:           root at jupiter [CMAKE]
>> > Build OS/arch:      Linux 2.6.32.13-0.4-default x86_64
>> > Build CPU vendor:   GenuineIntel
>> > Build CPU brand:    Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
>> > Build CPU family:   6   Model: 44   Stepping: 2
>> > Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
>> > (...)
>> >
>> > Initializing Domain Decomposition on 24 nodes
>> > Dynamic load balancing: auto
>> > Will sort the charge groups at every domain (re)decomposition
>> > Initial maximum inter charge-group distances:
>> >     two-body bonded interactions: 0.621 nm, LJ-14, atoms 3801 3812
>> >   multi-body bonded interactions: 0.621 nm, G96Angle, atoms 3802 3812
>> > Minimum cell size due to bonded interactions: 0.683 nm
>> > Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
>> > Estimated maximum distance required for P-LINCS: 0.820 nm
>> > This distance will limit the DD cell size, you can override this with -rcon
>> > Guess for relative PME load: 0.26
>> > Will use 18 particle-particle and 6 PME only nodes
>> > This is a guess, check the performance at the end of the log file
>> > Using 6 separate PME nodes
>> > Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
>> > Optimizing the DD grid for 18 cells with a minimum initial size of 1.025 nm
>> > The maximum allowed number of cells is: X 8 Y 8 Z 8
>> > Domain decomposition grid 3 x 2 x 3, separate PME nodes 6
>> > PME domain decomposition: 3 x 2 x 1
>> > Interleaving PP and PME nodes
>> > This is a particle-particle only node
>> > Domain decomposition nodeid 0, coordinates 0 0 0
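>> >
>> > For reference, a sketch of how this DD/PME split could be requested
>> > explicitly instead of letting mdrun guess it (assuming the mdrun_mpi
>> > binary from this build; -npme and -dd are standard mdrun options):
>> >
>> >   mpirun -np 24 mdrun_mpi -deffnm prt -npme 6 -dd 3 2 3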
>> >
>> >
>> >
>> > 2014-02-23 18:08 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
>> >
>> > >
>> > >
>> > > On 2/23/14, 11:32 AM, Marcelo Depólo wrote:
>> > >
>> > >> Maybe I should explain it better.
>> > >>
>> > >> I am using "mpirun -np 24 mdrun -s prt.tpr -e prt.edr -o prt.trr",
>> > >> pretty much a standard line. This job in a batch creates the outputs
>> > >> and, after some (random) time, a backup is made and new files are
>> > >> written, but the job itself does not finish.
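>> > >>
>> > >> A sketch of a continuation run that appends to the existing files
>> > >> instead of backing them up (assuming a checkpoint named prt.cpt; the
>> > >> -deffnm, -cpi and -append options are standard mdrun flags):
>> > >>
>> > >>   mpirun -np 24 mdrun_mpi -deffnm prt -cpi prt.cpt -append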
>> > >>
>> > >>
>> > > It would help if you could post the .log file from one of the runs to
>> > > see the information regarding mdrun's parallel capabilities. This still
>> > > sounds like a case of an incorrectly compiled binary. Do other runs with
>> > > the same binary produce the same problem?
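>> > >
>> > > A quick sanity check of how the binary was built (binary name is
>> > > whatever is installed on the cluster):
>> > >
>> > >   mdrun_mpi -version 2>&1 | grep -i "MPI library"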
>> > >
>> > > -Justin
>> > >
>> > >
>> > >
>> > >> 2014-02-23 17:12 GMT+01:00 Justin Lemkul <jalemkul at vt.edu>:
>> > >>
>> > >>
>> > >>>
>> > >>> On 2/23/14, 11:00 AM, Marcelo Depólo wrote:
>> > >>>
>> > >>>> But it is not quite happening simultaneously, Justin.
>> > >>>>
>> > >>>> It is producing one after another and, consequently, backing up the
>> > >>>> files.
>> > >>>
>> > >>> You'll have to provide the exact commands you're issuing. Likely you're
>> > >>> leaving the output names to the default, which causes them to be backed
>> > >>> up rather than overwritten.
>> > >>>
>> > >>>
>> > >>> -Justin
>> > >>>
>> > >>> --
>> > >>> ==================================================
>> > >>>
>> > >>> Justin A. Lemkul, Ph.D.
>> > >>> Ruth L. Kirschstein NRSA Postdoctoral Fellow
>> > >>>
>> > >>> Department of Pharmaceutical Sciences
>> > >>> School of Pharmacy
>> > >>> Health Sciences Facility II, Room 601
>> > >>> University of Maryland, Baltimore
>> > >>> 20 Penn St.
>> > >>> Baltimore, MD 21201
>> > >>>
>> > >>> jalemkul at outerbanks.umaryland.edu | (410) 706-7441
>> > >>> http://mackerell.umaryland.edu/~jalemkul
>> > >>>
>> > >>> ==================================================
>> > >>
>> > >>
>> > >>
>> >
>> >
>> >
>
>
>
> --
> Marcelo Depólo Polêto
> Uppsala Universitet - Sweden
> Science without Borders - CAPES
> Phone: +46 76 581 67 49