[gmx-developers] Native endianness in TPR body
Paul Bauer
paul.bauer.q at gmail.com
Fri Dec 27 11:16:25 CET 2019
Hello,
I opened https://redmine.gromacs.org/issues/3269 for this and should
have a fix for it soon.
Cheers
Paul
On 27/12/2019 10:12, Erik Lindahl wrote:
> Hi Len & Jonathan,
>
> Paul found an issue with reading files of a different endianness that has
> apparently slipped through the Debian tests (since they didn't run the
> regression tests by default). We'll get a fix in for that before the
> release.
>
> The reason for the change is that the XDR I/O layer is becoming very
> outdated. First, while it made a lot of sense to stick to the standard
> (big) "network endian" in the late 90s, today the problem is that
> virtually every single architecture is little endian, so you incur all
> the overhead of swapping both on writing and reading. Second, the way
> this is implemented in XDR means it's very slow - we're basically
> doing byte-by-byte reading.
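To make Erik's point concrete, here is a minimal Python sketch. It is not GROMACS code: the function names are hypothetical, and Python's `struct` and `array` modules stand in for the C++ I/O layer. It contrasts XDR-style decoding, where every big-endian value is converted with its own call, with reinterpreting a whole buffer that is already in host byte order in a single call.

```python
import array
import struct
import sys
from typing import List


def decode_xdr_ints_per_value(buf: bytes) -> List[int]:
    """XDR-style: convert each big-endian int32 with a separate call."""
    values = []
    for offset in range(0, len(buf), 4):
        (v,) = struct.unpack_from(">i", buf, offset)  # byte order fixed up per value
        values.append(v)
    return values


def decode_native_ints_bulk(buf: bytes) -> List[int]:
    """Buffered style: reinterpret the whole host-order buffer in one call."""
    arr = array.array("i")
    if arr.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    arr.frombytes(buf)
    return arr.tolist()


values = [1, -2, 300000, 7]
xdr_buf = struct.pack(">%di" % len(values), *values)     # big-endian, as XDR requires
native_buf = struct.pack("=%di" % len(values), *values)  # host byte order, 4-byte ints
assert decode_xdr_ints_per_value(xdr_buf) == values
assert decode_native_ints_bulk(native_buf) == values
if sys.byteorder == "little":
    # Same values, but each 4-byte group is reversed relative to the XDR layout.
    assert native_buf[:4] == xdr_buf[:4][::-1]
```

On a little-endian host both buffers encode the same values with every 4-byte group reversed, which is the swapping overhead Erik describes on both the write and the read side.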
>
> This change will instead allow all architectures to use highly
> efficient buffered I/O in their default endian, and then we only have
> to bother about swapping endianness in the rare cases an actual
> big-endian machine is involved.
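The approach Erik outlines can be sketched as follows, assuming the reader already knows the byte order the file was written in. `file_byteorder` is a hypothetical parameter; as Jonathan points out, there is no obvious way to determine the endianness of the 2020 body. The common case is a single bulk reinterpretation, and `array.byteswap()` is only called when file and host disagree.

```python
import array
import struct
import sys
from typing import List


def read_int32_body(buf: bytes, file_byteorder: str) -> List[int]:
    """Decode int32 values written in `file_byteorder` ("little" or "big").

    The whole buffer is reinterpreted in one call; a byte swap is applied
    only when the file came from a machine of the other endianness.
    """
    arr = array.array("i")
    if arr.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    arr.frombytes(buf)
    if file_byteorder != sys.byteorder:
        arr.byteswap()  # the rare case: a foreign-endian file
    return arr.tolist()


values = [1, 2, -3]
little_file = struct.pack("<3i", *values)
big_file = struct.pack(">3i", *values)
assert read_int32_body(little_file, "little") == values
assert read_int32_body(big_file, "big") == values
```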
>
> We'll also look into the padding with ones; for Gromacs it doesn't matter,
> but avoiding that might indeed make the life of other codes easier.
>
> Cheers,
>
> Erik
>
> On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud
> <jonathan at barnoud.net> wrote:
>
> Hello everyone,
>
> I upgraded the code of MDAnalysis to read the latest TPR version.
> To add to Len's comments, it does indeed appear that the new TPR body
> is 4 times as big as it used to be for the same content, and that it is
> not portable between architectures. gmx dump fails to read a file
> whose byte order differs from the native one, and there is no
> obvious way to determine the endianness of the body. While the TPR
> format is not really meant to be portable, it seemed commonly
> agreed that it was a good file to share
> (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is
> certainly a good input file for MDAnalysis. TPR files are commonly
> produced on a local machine before actually being run on a
> cluster, which may use a different byte order.
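One general technique that lets a reader determine a body's byte order is a known marker written in the writer's native order. The sketch below assumes a hypothetical 4-byte marker `MAGIC` at the start of the body; this marker is not part of the TPR format discussed in this thread, and the helper name is likewise hypothetical.

```python
import struct

MAGIC = 0x01020304  # hypothetical marker, not part of the actual TPR format


def detect_body_byteorder(buf: bytes) -> str:
    """Guess the writer's byte order from a leading 4-byte marker."""
    if buf[:4] == struct.pack(">I", MAGIC):
        return "big"
    if buf[:4] == struct.pack("<I", MAGIC):
        return "little"
    raise ValueError("marker not found; byte order unknown")


# Simulated bodies written natively on little- and big-endian machines.
le_body = struct.pack("<I", MAGIC) + struct.pack("<f", 1.5)
be_body = struct.pack(">I", MAGIC) + struct.pack(">f", 1.5)
assert detect_body_byteorder(le_body) == "little"
assert detect_body_byteorder(be_body) == "big"

# Once the order is known, the rest of the body decodes correctly.
order = detect_body_byteorder(be_body)
fmt = ("<" if order == "little" else ">") + "f"
(value,) = struct.unpack_from(fmt, be_body, 4)
assert value == 1.5
```

The marker must not read the same in both orders (0x01020304 does not), otherwise the check cannot tell them apart.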
>
> > Second, the individual bytes of a value are padded to 4 bytes per
> > original byte (each byte is packed as `char`).
>
> Note that the in-file XDR decoder in GROMACS (used for the
> header, and for the whole file before GROMACS 2020) uses 4 bytes
> per "char", hence the padding. The in-memory one reads each byte
> as one padded unit (1 byte of information, 4 bytes in the file).
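The factor of four follows from storing every byte in its own 4-byte unit, the way the classic XDR library's `xdr_char` encodes a `char` as a full integer. The helpers below are hypothetical, not GROMACS functions, and mimic that layout with zero extension (each byte treated as unsigned).

```python
import struct


def xdr_pack_chars(data: bytes) -> bytes:
    """Store each byte as its own 4-byte big-endian unit (zero-extended)."""
    return b"".join(struct.pack(">I", b) for b in data)


def xdr_unpack_chars(buf: bytes) -> bytes:
    """Recover the payload by keeping only the low-order byte of each unit."""
    return bytes(buf[i + 3] for i in range(0, len(buf), 4))


payload = struct.pack("<f", 1.0)        # 4 bytes of information
packed = xdr_pack_chars(payload)
assert len(packed) == 4 * len(payload)  # 16 bytes in the file
assert xdr_unpack_chars(packed) == payload
```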
>
> As my use case for noticing these differences is fairly niche, I
> may be missing the reason for them. In that case, I would be
> curious to read about them.
>
> Best regards,
> Jonathan
>
>
> On 12/26/19 7:39 PM, Len Kimms wrote:
>> Hello everyone,
>>
>> while fooling around with the new (i.e. version 2020 rc1) TPR file format I noticed some strange behaviors that I don’t understand. As far as I understand, the body of the new format is written by the `gmx::InMemorySerializer`. My questions below are mostly about this module.
>>
>> First, it seems that the memory serializer writes the values in native byte order. This means that the body of a TPR file differs between big- and little-endian systems. The XDR standard used before requires big-endian data. To me, as a novice user, the new implementation seems less portable and robust. Endian swapping seems to be implemented but is not currently used for TPR files.
>> Is this intentional? If so, why?
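The portability issue can be simulated without a big-endian machine: Python's `struct` prefixes `<` and `>` force little- and big-endian layout, standing in for what each kind of host would write natively. A reader that assumes its own byte order then decodes the foreign value as garbage, which is consistent with the `gmx dump` failure reported in this thread.

```python
import struct

value = 3.0
little_host_bytes = struct.pack("<f", value)  # what a little-endian host writes natively
big_host_bytes = struct.pack(">f", value)     # what a big-endian host writes natively

# Same value, mirrored byte layout.
assert little_host_bytes == big_host_bytes[::-1]

# A little-endian reader that assumes its own order misreads the foreign bytes.
(misread,) = struct.unpack("<f", big_host_bytes)
assert misread != value
```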
>>
>> Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as a `char`). Therefore the file size increases accordingly.
>> Do those padding bytes serve a special purpose?
>> Also regarding the padding bytes: unlike most others, some bytes are not padded with zeros; in some places they are padded with ones. At first glance this seems to happen to the second byte (big-endian) of a float. From some initial testing, my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u`, those values would be zero-padded.
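Len's guess can be illustrated in a few lines. Assuming each byte is widened into a 4-byte slot through a plain `char` (whose signedness is implementation-defined; it is signed on x86-64, for example), any byte of 0x80 or above is sign-extended, so its padding becomes 0xFF bytes, whereas an `unsigned char` is zero-extended. The Python below mimics both conversions with hypothetical helpers; it is not the actual `CharBuffer` code. The second big-endian byte of a float holds the lowest exponent bit plus the top mantissa bits, so it is often 0x80 or above, as for 1.0f.

```python
import struct


def widen_signed(byte: int) -> bytes:
    """Widen a byte to a 4-byte big-endian int via a signed char (sign extension)."""
    signed = byte - 256 if byte >= 0x80 else byte  # reinterpret as signed char
    return struct.pack(">i", signed)


def widen_unsigned(byte: int) -> bytes:
    """Widen the same byte via an unsigned char (zero extension)."""
    return struct.pack(">I", byte)


# 1.0f is 0x3F800000 in IEEE 754; its second big-endian byte is 0x80.
second_byte = struct.pack(">f", 1.0)[1]
assert second_byte == 0x80
assert widen_signed(second_byte) == b"\xff\xff\xff\x80"    # padded with ones
assert widen_unsigned(second_byte) == b"\x00\x00\x00\x80"  # padded with zeros
assert widen_signed(0x3F) == widen_unsigned(0x3F)          # small bytes agree
```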
>>
>> The attachment contains example files from a big- and a little-endian system, as well as a file created with GROMACS 2019.
>> I also brought this to the attention of the MDAnalysis devs here:
>> https://github.com/MDAnalysis/mdanalysis/issues/2428
>>
>> Best regards,
>> Len
>>
>
> --
> Erik Lindahl <erik.lindahl at dbb.su.se>
> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
> University
> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>
--
Paul Bauer, PhD
GROMACS Release Manager
KTH Stockholm, SciLifeLab
0046737308594