[gmx-developers] Native endianness in TPR body

Fri Dec 27 11:16:25 CET 2019

Hello,

I opened https://redmine.gromacs.org/issues/3269 for this and should 
have a fix for it soon.

Cheers

Paul

On 27/12/2019 10:12, Erik Lindahl wrote:
> Hi Len & Jonathan,
>
> Paul found an issue related to different-endianness-reading that has 
> apparently slipped through the Debian tests (since they didn't run the 
> regression tests by default). We'll get a fix in for that before the 
> release.
>
> The reason for the change is that the XDR I/O layer is becoming very 
> outdated. First, while it made a lot of sense to stick to the standard 
> (big) "network endian" in the late 90s, today the problem is that 
> virtually every single architecture is little endian, so you incur all 
> the overhead of swapping both on writing and reading. Second, the way 
> this is implemented in XDR means it's very slow - we're basically 
> doing byte-by-byte reading.
>
> This change will instead allow all architectures to use highly 
> efficient buffered I/O in their default endian, and then we only have 
> to bother about swapping endianness in the rare cases an actual 
> big-endian machine is involved.
>
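The "swap only when needed" idea above can be sketched in a few lines of Python. This is a minimal illustration, not GROMACS code: `read_floats` and its `file_order` parameter are hypothetical names, and the sketch simply assumes the reader already knows the byte order the data was written in.

```python
import struct
import sys
from array import array


def read_floats(data: bytes, file_order: str) -> list:
    """Decode 32-bit floats, swapping bytes only if file_order is not native.

    file_order is "little" or "big". How a real file would record this is
    outside the scope of this sketch (hypothetical parameter).
    """
    values = array("f")        # 4-byte IEEE floats on common platforms
    values.frombytes(data)     # bulk copy, interpreted in native byte order
    if file_order != sys.byteorder:
        values.byteswap()      # only the rare cross-endian case pays for a swap
    return values.tolist()


# Usage: the same values written in either byte order decode identically.
payload = (1.0, 2.5, -4.0)
little_endian_bytes = struct.pack("<3f", *payload)
big_endian_bytes = struct.pack(">3f", *payload)
print(read_floats(little_endian_bytes, "little"))
print(read_floats(big_endian_bytes, "big"))
```

On a little-endian host the first call never swaps, which is the common case the change is meant to make cheap.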
> We'll also look into the one-padding; for Gromacs it doesn't matter, 
> but avoiding that might indeed make the life of other codes easier.
>
> Cheers,
>
> Erik
>
> On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud 
> <jonathan at barnoud.net> wrote:
>
>     Hello everyone,
>
>     I upgraded the code of MDAnalysis to read the latest TPR version.
>     To add to Len's comments, it does appear that the new TPR body
>     is 4 times as big as it used to be for the same content, and is not
>     portable between architectures. gmx dump fails to read a file
>     whose byte order differs from the native one, and there is no
>     obvious way to determine the endianness of the body. While the TPR
>     format is not really meant to be portable, it seems commonly
>     agreed that it is a good file to share
>     (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is
>     certainly a good input file for MDAnalysis. TPR files are commonly
>     produced on a local machine before actually being run on a
>     cluster, which may use a different byte order.
>
>     > Second the individual bytes of a value are padded to 4 bytes per
>     > original byte (each byte is packed as `char`).
>
>     Note that the in-file XDR decoder in gromacs (used for the
>     header, and for everything prior to gromacs 2020) uses 4 bytes
>     for "char", hence the padding. The in-memory one reads 1 padded
>     byte (1 byte of information, 4 bytes in the file).
>
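To make the size factor concrete, here is a small Python sketch of per-byte padding. It assumes each data byte is stored in the low-order position of a 4-byte big-endian word, as an XDR-style "char" would be; the exact layout of the 2020 rc1 body is not confirmed here, and `pad_bytes` / `unpad_bytes` are hypothetical helper names.

```python
import struct


def pad_bytes(raw: bytes) -> bytes:
    """Store every byte of raw in its own 4-byte big-endian unsigned word."""
    return b"".join(struct.pack(">I", byte) for byte in raw)


def unpad_bytes(padded: bytes) -> bytes:
    """Recover the original bytes by keeping the low 8 bits of each word."""
    return bytes(word & 0xFF for (word,) in struct.iter_unpack(">I", padded))


# Usage: a 4-byte float occupies 16 bytes once every byte is padded.
raw = struct.pack(">f", 1.0)
padded = pad_bytes(raw)
print(len(raw), len(padded))
print(unpad_bytes(padded) == raw)
```

Masking with `0xFF` in `unpad_bytes` means a reader written this way would not care whether the three padding bytes hold zeros or ones.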
>     As my use case for noticing these differences is fairly niche, I
>     may be missing the reason for them. In that case, I would be
>     curious to read about it.
>
>     Best regards,
>     Jonathan
>
>
>     On 12/26/19 7:39 PM, Len Kimms wrote:
>>     Hello everyone,
>>
>>     while fooling around with the new (i.e. version 2020 rc1) TPR file format I noticed some strange behavior that I don’t understand. As far as I understand, the body of the new format is written by the `gmx::InMemorySerializer`. My questions below are basically about this module.
>>
>>     First, it seems that the memory serializer writes the values in native byte order. This means that the body of a TPR file differs between big- and little-endian systems. The XDR standard used before requires big-endian data. To me, a novice user, the new implementation seems to be less portable and robust. Endian swapping seems to be implemented but is not currently used for TPR files.
>>     Is this intentional, if so, why?
>>
>>     Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as `char`). Therefore the size increases accordingly.
>>     Do those padding bytes serve a special purpose?
>>     Also regarding the padding bytes: some bytes are not padded with zeros like most others. In some places they are padded with ones. At first glance this seems to happen to the second byte (big-endian) of a float. From some initial testing, my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u` those values would be zero padded.
>>
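The ones-padding described above is what sign extension produces when a signed `char` with its high bit set is widened to a 4-byte integer. The Python sketch below only mimics that widening; it assumes a signed `char` (as on x86) and a big-endian word, does not reproduce the actual `CharBuffer` code, and uses hypothetical helper names.

```python
import struct


def widen_signed(byte: int) -> bytes:
    """Widen one byte to 4 bytes the way a signed char is promoted to int."""
    signed = byte - 256 if byte >= 0x80 else byte
    return struct.pack(">i", signed)


def widen_unsigned(byte: int) -> bytes:
    """Widen one byte to 4 bytes the way an unsigned char is promoted."""
    return struct.pack(">I", byte)


# Usage: the big-endian bytes of the float 1.0 are 3f 80 00 00, so the
# second byte (0x80) has its high bit set and gets padded with ones.
for byte in struct.pack(">f", 1.0):
    print(widen_signed(byte).hex(), widen_unsigned(byte).hex())
```

For the byte 0x80 the signed path yields ffffff80 while the unsigned path yields 00000080, which is consistent with the observation that an `unsigned char` would give zero padding.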
>>     The attachment contains example files from a big- and a little-endian system, as well as a file created with GROMACS 2019.
>>     I also brought this to the attention of the MDAnalysis devs here:
>>     https://github.com/MDAnalysis/mdanalysis/issues/2428
>>
>>     Best regards,
>>         Len
>>
>
> -- 
> Erik Lindahl <erik.lindahl at dbb.su.se>
> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm 
> University
> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>

-- 
Paul Bauer, PhD
GROMACS Release Manager
KTH Stockholm, SciLifeLab
0046737308594
