[gmx-developers] gromacs.org_gmx-developers Digest, Vol 188, Issue 10

Adrian Tobiszewski a.tobiszewski at gmail.com
Fri Dec 27 19:20:22 CET 2019


Please unsubscribe my e-mail address from your mailing list.

On Fri, 27 Dec 2019, 12:35 , <
gromacs.org_gmx-developers-request at maillist.sys.kth.se> wrote:

> Send gromacs.org_gmx-developers mailing list submissions to
>         gromacs.org_gmx-developers at maillist.sys.kth.se
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>
> or, via email, send a message with subject or body 'help' to
>         gromacs.org_gmx-developers-request at maillist.sys.kth.se
>
> You can reach the person managing the list at
>         gromacs.org_gmx-developers-owner at maillist.sys.kth.se
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of gromacs.org_gmx-developers digest..."
>
>
> Today's Topics:
>
>    1. Re: Native endianess in TPR body (Paul bauer)
>    2. Re: Native endianess in TPR body (Len Kimms)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 27 Dec 2019 12:14:48 +0100
> From: Paul bauer <paul.bauer.q at gmail.com>
> To: gromacs.org_gmx-developers at maillist.sys.kth.se
> Subject: Re: [gmx-developers] Native endianess in TPR body
> Message-ID: <fad58ce1-7ed1-bddc-fd43-e0ea213fe02a at gmail.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hello,
>
> A fix has been uploaded here: https://gerrit.gromacs.org/c/gromacs/+/15059
>
> Cheers
>
> Paul
>
> On 27/12/2019 11:18, Paul bauer wrote:
> > Hello,
> >
> > I opened https://redmine.gromacs.org/issues/3269 for this and should
> > have a fix for it soon.
> >
> > Cheers
> >
> > Paul
> >
> > On 27/12/2019 10:12, Erik Lindahl wrote:
> >> Hi Len & Jonathan,
> >>
> >> Paul found an issue related to different-endianness-reading that has
> >> apparently slipped through the Debian tests (since they didn't run
> >> the regression tests by default). We'll get a fix in for that before
> >> the release.
> >>
> >> The reason for the change is that the XDR I/O layer is becoming very
> >> outdated. First, while it made a lot of sense to stick to the
> >> standard (big) "network endian" in the late 90s, today the problem is
> >> that virtually every single architecture is little endian, so you
> >> incur all the overhead of swapping both on writing and reading.
> >> Second, the way this is implemented in XDR means it's very slow -
> >> we're basically doing byte-by-byte reading.
> >>
> >> This change will instead allow all architectures to use highly
> >> efficient buffered I/O in their default endian, and then we only have
> >> to bother about swapping endianness in the rare cases an actual
> >> big-endian machine is involved.
> >>
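A minimal sketch of the swap-only-when-needed idea described above, in Python rather than the GROMACS C++ code. The helper `read_floats` and its `file_is_little_endian` flag are illustrative assumptions, not part of GROMACS or the TPR format.

```python
import struct
import sys

def read_floats(buf: bytes, count: int, file_is_little_endian: bool):
    """Decode `count` 32-bit floats, reporting whether a byte swap was needed.

    Hypothetical helper for illustration; not a GROMACS or MDAnalysis API.
    """
    native_is_little = sys.byteorder == "little"
    needs_swap = file_is_little_endian != native_is_little
    order = "<" if file_is_little_endian else ">"
    values = struct.unpack(f"{order}{count}f", buf[: 4 * count])
    return values, needs_swap

values = (1.0, -2.5, 3.25)              # exactly representable as float32
little = struct.pack("<3f", *values)    # what a little-endian writer would emit
big = struct.pack(">3f", *values)       # what a big-endian (XDR-style) writer would emit

decoded_little, _ = read_floats(little, 3, file_is_little_endian=True)
decoded_big, _ = read_floats(big, 3, file_is_little_endian=False)
```

Both buffers decode to the same values; only the reader whose native order differs from the file's has to pay for the swap.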
> >> We'll also look into the one-padding; for Gromacs it doesn't matter,
> >> but avoiding that might indeed make the life of other codes easier.
> >>
> >> Cheers,
> >>
> >> Erik
> >>
> >> On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud
> >> <jonathan at barnoud.net <mailto:jonathan at barnoud.net>> wrote:
> >>
> >>     Hello everyone,
> >>
> >>     I upgraded the code of MDAnalysis to read the latest TPR version.
> >>     To add to Len's comments, it appears indeed that the new TPR body
> >>     is 4 times as big as it used to be for the same content, and is
> >>     not portable between architectures. gmx dump does fail at reading
> >>     a file with a different byte order than native, and there is no
> >>     obvious way to determine the endianness of the body. While the
> >>     TPR format is not meant to really be portable, it seemed commonly
> >>     agreed that it was a good file to share
> >>     (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is
> >>     for sure a good input file in MDAnalysis. TPR files are commonly
> >>     produced on a local machine before being actually run on a
> >>     cluster that may use a different byte order.
> >>
> >>     > Second the individual bytes of a value are padded to 4 bytes
> >>     per original bytes (each byte is packed as `char`).
> >>
> >>     Note that the in-file XDR decoder in gromacs (used for the
> >>     header and prior to gromacs 2020) uses 4 bytes for "char", hence
> >>     the padding. The in-memory one reads 1 padded byte (1 byte of
> >>     information, 4 bytes in the file).
> >>
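To illustrate the padding described above, here is a small Python sketch that encodes every byte as its own 4-byte big-endian unit, which is how XDR treats a "char". The function names are made up for this example and this is not the GROMACS serializer itself.

```python
import struct

def xdr_pack_chars(data: bytes) -> bytes:
    """Encode each byte as a separate 4-byte big-endian unsigned integer."""
    return b"".join(struct.pack(">I", b) for b in data)

def xdr_unpack_chars(encoded: bytes) -> bytes:
    """Recover one byte of information from every 4-byte unit."""
    return bytes(
        struct.unpack(">I", encoded[i:i + 4])[0] & 0xFF
        for i in range(0, len(encoded), 4)
    )

payload = struct.pack("<f", 1.0)        # 4 bytes of information: 00 00 80 3f
encoded = xdr_pack_chars(payload)       # 16 bytes in the file
decoded = xdr_unpack_chars(encoded)
```

Four bytes of payload become sixteen bytes on disk, which matches the observed factor of 4 in body size.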
> >>     As my use case for noticing these differences is fairly niche, I
> >>     may be missing the reason for them. In such case, I would be
> >>     curious to read about them.
> >>
> >>     Best regards,
> >>     Jonathan
> >>
> >>
> >>     On 12/26/19 7:39 PM, Len Kimms wrote:
> >>>     Hello everyone,
> >>>
> >>>     while fooling around with the new (i.e. version 2020 rc1) TPR file
> >>>     format I noticed some strange behaviors that I don't understand. As
> >>>     far as I understand the body of the new format is written by the
> >>>     `gmx::InMemorySerializer`. My following questions are basically
> >>>     about this module.
> >>>
> >>>     First it seems that the memory serializer writes the values in
> >>>     native byte order. This means that the body of TPR files differs
> >>>     between big- and little-endian systems. The XDR standard used
> >>>     before requires big-endian data. For me, a novice user, the new
> >>>     implementation seems to be less portable and robust. Endian
> >>>     swapping seems to be implemented but not currently used for TPR
> >>>     files.
> >>>     Is this intentional, if so, why?
> >>>
> >>>     Second the individual bytes of a value are padded to 4 bytes per
> >>>     original byte (each byte is packed as `char`). Therefore the size
> >>>     increases accordingly.
> >>>     Do those padding bytes serve a special purpose?
> >>>     Also regarding the padding bytes: Some bytes are not, like most
> >>>     others, padded with zeros. In some places they are padded with
> >>>     ones. At first glance this seems to happen to the second byte
> >>>     (big-endian) of a float. From some initial testing my best guess
> >>>     is that this is caused by the union conversion in `CharBuffer`.
> >>>     With an `unsigned char` in the private union `u` those values
> >>>     would be zero padded.
> >>>
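The one-padding guess above is consistent with ordinary sign extension: a byte with its top bit set is negative when read as a signed char, and widening it to 32 bits fills the upper bytes with ones. A Python sketch of that effect follows. `widen_char` is a hypothetical helper, and the sketch only mimics the arithmetic; it does not reproduce the `CharBuffer` code.

```python
import struct

def widen_char(byte_value: int, signed: bool) -> bytes:
    """Widen one byte to a 4-byte big-endian integer, as signed or unsigned char."""
    if signed:
        # Reinterpret the raw byte as a signed char (0x80 becomes -128) ...
        as_char = struct.unpack("b", bytes([byte_value]))[0]
        # ... and widen it; negative values are sign-extended with ff bytes.
        return struct.pack(">i", as_char)
    return struct.pack(">I", byte_value)

one_be = struct.pack(">f", 1.0)         # big-endian bytes of 1.0f: 3f 80 00 00
signed_units = [widen_char(b, True).hex() for b in one_be]
unsigned_units = [widen_char(b, False).hex() for b in one_be]
```

For 1.0f the second big-endian byte is 0x80, so the signed path yields `ffffff80` while the unsigned path yields `00000080`. That would explain why the second byte of many floats appears padded with ones.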
> >>>     In the attachment one can find example files from a big- and
> >>>     little-endian system as well as a file created with GROMACS 2019.
> >>>     I also brought this to the attention of the MDAnalysis devs here:
> >>>     https://github.com/MDAnalysis/mdanalysis/issues/2428
> >>>
> >>>     Best regards,
> >>>         Len
> >>>
> >>
> >>     --
> >>     Gromacs Developers mailing list
> >>
> >>     * Please search the archive at
> >>     http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
> >>     before posting!
> >>
> >>     * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>
> >>     * For (un)subscribe requests visit
> >>     https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> >>     or send a mail to gmx-developers-request at gromacs.org
> >>     <mailto:gmx-developers-request at gromacs.org>.
> >>
> >>
> >>
> >> --
> >> Erik Lindahl <erik.lindahl at dbb.su.se <mailto:erik.lindahl at dbb.su.se>>
> >> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
> >> University
> >> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
> >>
> >
> > --
> > Paul Bauer, PhD
> > GROMACS Release Manager
> > KTH Stockholm, SciLifeLab
> > 0046737308594
>
>
> --
> Paul Bauer, PhD
> GROMACS Release Manager
> KTH Stockholm, SciLifeLab
> 0046737308594
>
> ------------------------------
>
> Message: 2
> Date: Fri, 27 Dec 2019 12:35:22 +0100 (CET)
> From: Len Kimms <len.kimms at uni-muenster.de>
> To: <gmx-developers at gromacs.org>
> Subject: Re: [gmx-developers] Native endianess in TPR body
> Message-ID: <permail-20191227113522e490208900005005-l_kimm02 at message-id.uni-muenster.de>
> Content-Type: text/plain; charset=utf-8
>
> Hello everyone,
>
> thank you all for your explanations. I really appreciate the insight that
> I got.
>
> It makes sense to use native endianness and it was indeed not easy to set
> up a big-endian test system because they are rare nowadays. The most
> important thing for me is having a clear indication of what endianness a given
> file has. IMHO the proposed fix does a good job with this.
>
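One generic way for a file to indicate its own byte order is a known sentinel value written ahead of the data. The Python sketch below uses that technique. The `MARKER` value, the layout and the function names are purely hypothetical and are not a description of the actual TPR header or of the fix uploaded to Gerrit.

```python
import struct

MARKER = 0x01020304  # hypothetical sentinel, NOT the real TPR layout

def write_body(values, order: str) -> bytes:
    """Write marker, element count and float data in the given byte order."""
    return struct.pack(f"{order}II{len(values)}f", MARKER, len(values), *values)

def detect_byte_order(blob: bytes) -> str:
    """Return '<' or '>' depending on how the sentinel was written."""
    if struct.unpack("<I", blob[:4])[0] == MARKER:
        return "<"
    if struct.unpack(">I", blob[:4])[0] == MARKER:
        return ">"
    raise ValueError("unrecognised byte-order marker")

def read_body(blob: bytes) -> tuple:
    """Decode the floats using whatever byte order the marker reveals."""
    order = detect_byte_order(blob)
    (count,) = struct.unpack(order + "I", blob[4:8])
    return struct.unpack(f"{order}{count}f", blob[8:8 + 4 * count])

from_little = read_body(write_body((1.0, 2.0), "<"))
from_big = read_body(write_body((1.0, 2.0), ">"))
```

With such a marker a reader never has to guess: both files decode to the same values regardless of the machine that wrote them.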
> Regarding the padding: Writing the buffer as opaque data that is not
> padded feels less unsettling, but the file size is not much of an issue for
> me. With the given hint of the endianness the padding is irrelevant for me
> and does no harm.
>
> Thank you again for the work you put into this!
>
> Best wishes,
>    Len
>
>
> ------------------------------
>
> End of gromacs.org_gmx-developers Digest, Vol 188, Issue 10
> ***********************************************************
>