[gmx-developers] Native endianness in TPR body
paul.bauer.q at gmail.com
Fri Dec 27 12:13:12 CET 2019
Fix has been uploaded here: https://gerrit.gromacs.org/c/gromacs/+/15059
On 27/12/2019 11:18, Paul Bauer wrote:
> I opened https://redmine.gromacs.org/issues/3269 for this and should
> have a fix for it soon.
> On 27/12/2019 10:12, Erik Lindahl wrote:
>> Hi Len & Jonathan,
>> Paul found an issue with reading files of non-native endianness that
>> apparently slipped through the Debian tests (since they don't run
>> the regression tests by default). We'll get a fix in for that before
>> the release.
>> The reason for the change is that the XDR I/O layer is becoming very
>> outdated. First, while it made a lot of sense to stick to the
>> standard big-endian "network byte order" in the late 90s, today the
>> problem is that virtually every single architecture is little-endian,
>> so you incur the overhead of byte swapping on both writing and reading.
>> Second, the way this is implemented in XDR makes it very slow:
>> we're basically doing byte-by-byte reading.
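To make the swapping cost concrete: XDR (RFC 4506) stores a 32-bit integer most significant byte first, whatever the host. The sketch below is a minimal illustration, not GROMACS code; the function names are hypothetical and the output is assumed to be a plain byte vector. The encoder shifts out one byte at a time, which on a little-endian host amounts to a byte swap on every write, and the decoder mirrors it on every read.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 32-bit value in XDR (big-endian) order.
// The shifts give the same bytes on any host; on a little-endian machine
// they reverse the in-memory order, so every write is effectively a swap.
void xdrPutUint32(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    out.push_back(static_cast<std::uint8_t>(value >> 24));
    out.push_back(static_cast<std::uint8_t>(value >> 16));
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value));
}

// Hypothetical helper: reassemble a big-endian 32-bit value byte by byte.
std::uint32_t xdrGetUint32(const std::uint8_t* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24) | (static_cast<std::uint32_t>(in[1]) << 16)
           | (static_cast<std::uint32_t>(in[2]) << 8) | static_cast<std::uint32_t>(in[3]);
}
```

For example, `xdrPutUint32(buf, 0x01020304)` always appends the bytes `01 02 03 04`, even though a little-endian host holds that value in memory as `04 03 02 01`.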
>> This change will instead allow all architectures to use highly
>> efficient buffered I/O in their native byte order, and then we only
>> have to bother about swapping endianness in the rare cases where an
>> actual big-endian machine is involved.
>> We'll also look into the padding with ones; for GROMACS it doesn't
>> matter, but avoiding it might indeed make life easier for other codes.
>> On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud
>> <jonathan at barnoud.net> wrote:
>> Hello everyone,
>> I upgraded the code of MDAnalysis to read the latest TPR version.
>> To add to Len's comments, it indeed appears that the new TPR body
>> is 4 times as big as it used to be for the same content, and that it
>> is not portable between architectures. gmx dump fails to read a
>> file whose byte order differs from the native one, and there is no
>> obvious way to determine the endianness of the body. While the
>> TPR format is not really meant to be portable, it seemed to be commonly
>> agreed that it is a good file to share
>> (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is
>> certainly a good input file for MDAnalysis. TPR files are commonly
>> produced on a local machine before actually being run on a
>> cluster, which may use a different byte order.
>> > Second, the individual bytes of a value are padded to 4 bytes
>> > per original byte (each byte is packed as `char`).
>> Note that the in-file XDR decoder in GROMACS (used for the header,
>> and for everything prior to GROMACS 2020) uses 4 bytes for a "char",
>> hence the padding. The in-memory one reads each byte as padded
>> (1 byte of information, 4 bytes in the file).
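The 4x growth follows directly from this. The sketch below assumes each byte of the body passes through a routine that, like Sun RPC's `xdr_char()`, widens it to a full 4-byte XDR integer. The function name is hypothetical. Under that assumption a 4-byte `float` becomes 16 bytes, 12 of which are padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: serialize each byte of a value as its own
// 4-byte, big-endian XDR integer, as an xdr_char()-style routine would.
// Only the last byte of each 4-byte group carries information.
std::vector<std::uint8_t> encodeBytesAsXdrChars(const void* data, std::size_t size)
{
    const auto*               bytes = static_cast<const unsigned char*>(data);
    std::vector<std::uint8_t> out;
    out.reserve(4 * size);
    for (std::size_t i = 0; i < size; ++i)
    {
        // Zero-extended here because the byte is read as unsigned char.
        const std::uint32_t widened = bytes[i];
        out.push_back(static_cast<std::uint8_t>(widened >> 24));
        out.push_back(static_cast<std::uint8_t>(widened >> 16));
        out.push_back(static_cast<std::uint8_t>(widened >> 8));
        out.push_back(static_cast<std::uint8_t>(widened));
    }
    return out;
}
```

Encoding a `double` this way yields 32 bytes instead of 8. The padding is pure overhead, which matches the "4 times as big" observation.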
>> As my use case for noticing these differences is fairly niche, I
>> may be missing the reason for them. In that case, I would be
>> curious to read about them.
>> Best regards,
>> On 12/26/19 7:39 PM, Len Kimms wrote:
>>> Hello everyone,
>>> while fooling around with the new (i.e. version 2020-rc1) TPR file format, I noticed some strange behaviors that I don't understand. As far as I understand, the body of the new format is written by `gmx::InMemorySerializer`. My following questions are basically about this module.
>>> First, it seems that the memory serializer writes values in native byte order. This means that the body of a TPR file differs between big- and little-endian systems. The XDR standard used before requires big-endian data. To me, a novice user, the new implementation seems less portable and robust. Endian swapping seems to be implemented but is not currently used for TPR files.
>>> Is this intentional? If so, why?
>>> Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as `char`). Therefore, the file size increases accordingly.
>>> Do those padding bytes serve a special purpose?
>>> Also regarding the padding bytes: unlike most others, some bytes are not padded with zeros; in some places they are padded with ones. At first glance this seems to happen to the second byte (big-endian) of a float. From some initial testing, my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u`, those values would be zero-padded.
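Len's guess is consistent with ordinary sign extension. The sketch below assumes the padding is produced by widening a byte held in a plain `char` to a 32-bit integer. Plain `char` is signed on x86 and x86-64, though its signedness is implementation-defined and differs on some other ABIs. The helper names are hypothetical and do not come from the GROMACS sources. For `1.0f` (bit pattern `0x3F800000`) the second big-endian byte is `0x80`, so a signed widening fills the three padding bytes with ones while an unsigned one fills them with zeros.

```cpp
#include <cstdint>
#include <cstring>

// Widen a byte as if it had been stored in a signed char: values >= 0x80
// are sign-extended, so the upper three bytes become 0xFF (two's complement).
std::uint32_t widenSigned(unsigned char byte)
{
    const auto asSigned = static_cast<signed char>(byte);
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(asSigned));
}

// Widen a byte stored in an unsigned char: the upper bytes are always 0x00.
std::uint32_t widenUnsigned(unsigned char byte)
{
    return static_cast<std::uint32_t>(byte);
}

// Byte `index` (0 = most significant) of a float's bit pattern,
// independent of host byte order; assumes a 4-byte IEEE 754 float.
unsigned char floatByteBigEndian(float f, int index)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<unsigned char>(bits >> (8 * (3 - index)));
}
```

Whether the second byte of a given float has its high bit set depends on the value (it is the lowest exponent bit). That would explain why only some padding groups come out as ones.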
>>> The attachment contains example files from a big- and a little-endian system, as well as a file created with GROMACS 2019.
>>> I also brought this to the attention of the MDAnalysis devs here:
>>> Best regards,
>> Erik Lindahl <erik.lindahl at dbb.su.se>
>> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
>> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
> Paul Bauer, PhD
> GROMACS Release Manager
> KTH Stockholm, SciLifeLab