[gmx-developers] Native endianness in TPR body
Erik Lindahl
erik.lindahl at gmail.com
Mon Dec 30 11:15:13 CET 2019
Hi,
I just wanted to follow up on this since we had some hectic activity over
the weekend :-)
First, we have solved the padding issue by using xdr opaque datatypes
instead, so the TPR size should no longer be larger - thanks for spotting
that!
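For anyone reading the raw format, the size difference can be illustrated roughly as follows. This is a minimal Python sketch and not GROMACS code: `xdr_chars` and `xdr_opaque` are hypothetical helper names, and the sketch only assumes the general XDR rules (every item occupies a multiple of four bytes; fixed-length opaque data is stored as raw bytes, zero-padded at the end).

```python
import struct

def xdr_chars(data: bytes) -> bytes:
    """Encode every byte as its own 4-byte big-endian XDR integer."""
    return b"".join(struct.pack(">i", b) for b in data)

def xdr_opaque(data: bytes) -> bytes:
    """Encode as fixed-length XDR opaque data: raw bytes, zero-padded to a multiple of 4."""
    pad = (-len(data)) % 4
    return data + b"\x00" * pad

# Six bytes of example payload: one big-endian float plus two extra bytes.
payload = struct.pack(">f", 1.5) + b"\x01\x02"

print(len(xdr_chars(payload)))   # 24: four bytes on disk per byte of data
print(len(xdr_opaque(payload)))  # 8: six data bytes plus two padding bytes
```

With the per-byte encoding the body grows by a factor of four, while the opaque encoding adds at most three padding bytes in total.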
Second, we did some deep thinking about the endianness, and in the end we
decided that the most important short-term aspect is to not complicate life
too much for other libraries/packages that need to read the raw format.
Since the TPR *header* is still stored with normal XDR routines that means
it's big endian, and for this reason we also eventually decided to keep the
TPR *body* (which is processed in memory, and then stored to disk) as big
endian too. This should hopefully make the changes since release-2019
minimal, and we'll wait with little-endian until we make a major change in
the formats.
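For third-party readers this means one decoding rule applies to both parts of the file. The following is a minimal Python sketch under stated assumptions: the three fields are made up for illustration and do NOT reflect the real TPR layout. It only shows that an explicit big-endian format string decodes the same bytes identically on any host.

```python
import struct
import sys

# Hypothetical record (not the real TPR layout): an int, a float and a
# double, all written big endian.
raw = struct.pack(">ifd", 2020, 0.5, 300.0)

# ">" forces big-endian decoding regardless of the host byte order.
version, real_value, temperature = struct.unpack(">ifd", raw)
print(sys.byteorder, version, real_value, temperature)

# Decoding the first field as little endian yields a different integer,
# which is what a reader assuming the wrong byte order would see.
wrong_version = struct.unpack("<i", raw[:4])[0]
print(wrong_version)
```

A reader written this way does not need to know the byte order of the machine that produced the file, only that the file is big endian.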
There's one important caveat with this, though: Any TPR files you generated
during the beta stage will likely *not* be compatible with the final
released version. Sorry about that!
Cheers,
Erik
On Fri, Dec 27, 2019 at 12:35 PM Len Kimms <len.kimms at uni-muenster.de>
wrote:
> Hello everyone,
>
> thank you all for your explanations. I really appreciate the insight that
> I got.
>
> It makes sense to use native endianness and it was indeed not easy to set
> up a big-endian test system because they are rare nowadays. The most
> important thing for me is having a clear indication of which endianness a given
> file has. IMHO the proposed fix does a good job with this.
>
> Regarding the padding: Writing the buffer as opaque data that is not
> padded feels less unsettling, but the file size is not much of an issue for
> me. With the given hint of the endianness the padding is irrelevant for me
> and does no harm.
>
> Thank you again for the work you put into this!
>
> Best wishes,
> Len
>
>
> Paul Bauer wrote on 2019-12-27:
> > Hello,
>
> > a fix has been uploaded here: https://gerrit.gromacs.org/c/gromacs/+/15059
>
> > Cheers
>
> > Paul
>
> > On 27/12/2019 11:18, Paul bauer wrote:
> > >Hello,
> > >
> > >I opened https://redmine.gromacs.org/issues/3269 for this and should
> have a fix for it soon.
> > >
> > >Cheers
> > >
> > >Paul
> > >
> > >On 27/12/2019 10:12, Erik Lindahl wrote:
> > >>Hi Len & Jonathan,
> > >>
> > >>Paul found an issue related to different-endianness-reading that has
> apparently slipped through the Debian tests (since they didn't run the
> regression tests by default). We'll get a fix in for that before the
> release.
> > >>
> > >>The reason for the change is that the XDR I/O layer is becoming very
> outdated. First, while it made a lot of sense to stick to the standard
> (big) "network endian" in the late 90s, today the problem is that virtually
> every single architecture is little endian, so you incur all the overhead
> of swapping both on writing and reading. Second, the way this is
> implemented in XDR means it's very slow - we're basically doing
> byte-by-byte reading.
> > >>
> > >>This change will instead allow all architectures to use highly
> efficient buffered I/O in their default endian, and then we only have to
> bother about swapping endianness in the rare cases an actual big-endian
> machine is involved.
> > >>
> > >>We'll also look into the one-padding; for Gromacs it doesn't matter,
> but avoiding that might indeed make the life of other codes easier.
> > >>
> > >>Cheers,
> > >>
> > >>Erik
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud <jonathan at barnoud.net> wrote:
> > >>
> > >> Hello everyone,
> > >>
> > >> I upgraded the code of MDAnalysis to read the latest TPR version.
> > >> To add to Len's comments, it appears indeed that the new TPR body
> > >> is 4 times as big as it used to be for the same content, and is
> > >> not portable between architectures. gmx dump does fail at reading
> > >> a file with a different byte order than native, and there is no
> > >> obvious way to determine the endianness of the body. While the
> > >> TPR format is not meant to really be portable, it seemed commonly
> > >> agreed that it was a good file to share
> > >> (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), it is
> > >> for sure a good input file in MDAnalysis. TPR files are commonly
> > >> produced on a local machine before being actually run on a
> > >> cluster, that may use a different byte order.
> > >>
> > >> > Second the individual bytes of a value are padded to 4 bytes
> > >> per original bytes (each byte is packed as `char`).
> > >>
> > >> To be noted that the in-file XDR decoder in gromacs (used for the
> > >> header and prior to gromacs 2020) uses 4 bytes for "char", hence
> > >> the padding. The in-memory one reads 1 padded byte (1 byte of
> > >> information, 4 bytes in the file).
> > >>
> > >> As my use case for noticing these differences is fairly niche, I
> > >> may be missing the reason for them. In such case, I would be
> > >> curious to read about them.
> > >>
> > >> Best regards,
> > >> Jonathan
> > >>
> > >>
> > >> On 12/26/19 7:39 PM, Len Kimms wrote:
> > >>> Hello everyone,
> > >>>
> > >>> while fooling around with the new (i.e. version 2020 rc1) TPR
> file format I noticed some strange behaviors that I don’t understand. As
> far as I understand the body of the new format is written by the
> `gmx::InMemorySerializer`. My following questions are basically about this
> module.
> > >>>
> > >>> First it seems that the memory serializer writes the values in
> native byte order. This means that the body of TPR files differ between
> big- and little-endian systems. The XDR standard used before requires
> big-endian data. For me, a novice user, the new implementation seems to be
> less portable and robust. Endian swapping seems to be implemented but not
> currently used for TPR files.
> > >>> Is this intentional, if so, why?
> > >>>
> > >>> Second the individual bytes of a value are padded to 4 bytes per
> original bytes (each byte is packed as `char`). Therefore the size
> increases accordingly.
> > >>> Do those padding bytes serve a special purpose?
> > >>> Also regarding the padding bytes: Some bytes are not, like most
> others, padded with zeros. In some places they are padded with ones. At
> first glance this seems to happen to the second byte (big-endian) of a
> float. From some initial testing my best guess is, that this is caused by
> the union conversion in `CharBuffer`. With an `unsigned char` in the
> private union `u` those values would be zero padded.
> > >>>
> > >>> In the attachment one could find example files from a big- and
> little-endian system as well as a file created with GROMACS 2019.
> > >>> I also brought this to the attention of the MDAnalysis devs here:
> > >>> https://github.com/MDAnalysis/mdanalysis/issues/2428
> > >>>
> > >>> Best regards,
> > >>> Len
> > >>>
> > >>
> > >> -- Gromacs Developers mailing list
> > >>
> > >> * Please search the archive at
> > >> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
> > >> before posting!
> > >>
> > >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >>
> > >> * For (un)subscribe requests visit
> > >>
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> > >> or send a mail to gmx-developers-request at gromacs.org.
> > >>
> > >>
> > >>
> > >>--
> > >>Erik Lindahl <erik.lindahl at dbb.su.se>
> > >>Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
> University
> > >>Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
> > >>
> > >
> > >--
> > >Paul Bauer, PhD
> > >GROMACS Release Manager
> > >KTH Stockholm, SciLifeLab
> > >0046737308594
>
>
>
--
Erik Lindahl <erik.lindahl at dbb.su.se>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden