[gmx-developers] Native endianness in TPR body

Fri Dec 27 12:13:12 CET 2019

Hello,

A fix has been uploaded here: https://gerrit.gromacs.org/c/gromacs/+/15059

Cheers

Paul

> On 27/12/2019 11:18, Paul Bauer wrote:
> Hello,
>
> I opened https://redmine.gromacs.org/issues/3269 for this and should 
> have a fix for it soon.
>
> Cheers
>
> Paul
>
> On 27/12/2019 10:12, Erik Lindahl wrote:
>> Hi Len & Jonathan,
>>
>> Paul found an issue with reading files of a different endianness 
>> that apparently slipped through the Debian tests (since they didn't 
>> run the regression tests by default). We'll get a fix in for that 
>> before the release.
>>
>> The reason for the change is that the XDR I/O layer is becoming very 
>> outdated. First, while it made a lot of sense to stick to the 
>> standard big-endian "network byte order" in the late 90s, today 
>> virtually every architecture in use is little-endian, so you incur 
>> the overhead of byte swapping both when writing and when reading. 
>> Second, the way this is implemented in XDR makes it very slow: 
>> we're basically doing byte-by-byte reading.
>>
>> This change will instead allow all architectures to use highly 
>> efficient buffered I/O in their native byte order, and then we only 
>> have to bother about swapping endianness in the rare cases where an 
>> actual big-endian machine is involved.
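A minimal C++ sketch of the scheme described above, assuming the reader already knows which byte order a file was written in (the thread notes there is currently no obvious way to tell from the TPR body itself). Values are copied out of the buffer with one bulk `memcpy`, and bytes are swapped only when file and host order differ. All function names here are hypothetical and are not part of the GROMACS serializer API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if the host stores the least-significant byte first.
inline bool hostIsLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8)
           | ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Convert a raw 32-bit value from file order to host order.
// Assumption: the caller knows whether the file is little-endian.
inline std::uint32_t fileToHost(std::uint32_t raw, bool fileIsLittleEndian)
{
    // Fast path: file and host agree, so the bytes are used as-is.
    if (fileIsLittleEndian == hostIsLittleEndian())
    {
        return raw;
    }
    // Slow path: only taken when the byte orders differ.
    return byteSwap32(raw);
}

// Read one 32-bit value from a byte buffer (hypothetical helper).
inline std::uint32_t readU32(const unsigned char* p, bool fileIsLittleEndian)
{
    std::uint32_t raw = 0;
    std::memcpy(&raw, p, sizeof raw); // one bulk copy, no per-byte decoding
    return fileToHost(raw, fileIsLittleEndian);
}
```

On a little-endian host reading a little-endian file, `readU32` reduces to a single copy; `byteSwap32` only runs on the rare mismatched path, which is the trade-off argued for above.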
>>
>> We'll also look into the padding with ones; for GROMACS it doesn't 
>> matter, but avoiding it might indeed make life easier for other codes.
>>
>> Cheers,
>>
>> Erik
>>
>> On Thu, Dec 26, 2019 at 11:04 PM Jonathan Barnoud 
>> <jonathan at barnoud.net> wrote:
>>
>>     Hello everyone,
>>
>>     I updated the MDAnalysis code to read the latest TPR version.
>>     To add to Len's comments, it indeed appears that the new TPR body
>>     is 4 times as big as it used to be for the same content, and is
>>     not portable between architectures. gmx dump fails to read a
>>     file whose byte order differs from the native one, and there is
>>     no obvious way to determine the endianness of the body. While the
>>     TPR format is not really meant to be portable, it seemed commonly
>>     agreed that it was a good file to share
>>     (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it
>>     is certainly a good input file for MDAnalysis. TPR files are
>>     commonly produced on a local machine before being run on a
>>     cluster that may use a different byte order.
>>
>>     > Second, the individual bytes of a value are padded to 4 bytes
>>     per original byte (each byte is packed as `char`).
>>
>>     Note that the file-based XDR decoder in GROMACS (used for the
>>     header, and for everything prior to GROMACS 2020) uses 4 bytes
>>     for a "char", hence the padding. The in-memory one reads 1 padded
>>     byte (1 byte of information, 4 bytes in the file).
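To illustrate where the fourfold size increase comes from, here is a sketch of an XDR-style encoding in which every byte is written as its own 4-byte big-endian unit, similar to how the classic Sun RPC `xdr_char` widens a `char` to an `int` before encoding it. The helper `encodeBytesAsXdrChars` is hypothetical and not taken from GROMACS; it zero-pads because it works on unsigned bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode each byte as one 4-byte big-endian XDR unit,
// i.e. three padding bytes followed by the data byte.
inline std::vector<std::uint8_t> encodeBytesAsXdrChars(const std::vector<std::uint8_t>& bytes)
{
    std::vector<std::uint8_t> out;
    out.reserve(bytes.size() * 4);
    for (std::uint8_t b : bytes)
    {
        // Padding is zero here because b is unsigned (no sign extension).
        out.push_back(0);
        out.push_back(0);
        out.push_back(0);
        out.push_back(b); // the single byte of actual information
    }
    return out;
}
```

Encoding the 4 bytes of a float this way yields 16 bytes, matching the "4 times as big" observation above.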
>>
>>     As my use case for noticing these differences is fairly niche, I
>>     may be missing the reasons behind them. In that case, I would be
>>     curious to read about them.
>>
>>     Best regards,
>>     Jonathan
>>
>>
>>     On 12/26/19 7:39 PM, Len Kimms wrote:
>>>     Hello everyone,
>>>
>>>     While fooling around with the new (i.e. version 2020-rc1) TPR file format, I noticed some strange behaviors that I don’t understand. As far as I understand, the body of the new format is written by the `gmx::InMemorySerializer`. My following questions are basically about this module.
>>>
>>>     First, it seems that the memory serializer writes values in native byte order. This means that the body of a TPR file differs between big- and little-endian systems, whereas the XDR standard used before requires big-endian data. For me, a novice user, the new implementation seems less portable and robust. Endian swapping seems to be implemented but is not currently used for TPR files.
>>>     Is this intentional, if so, why?
>>>
>>>     Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as `char`), so the size increases fourfold.
>>>     Do those padding bytes serve a special purpose?
>>>     Also regarding the padding bytes: unlike most others, some bytes are padded with ones instead of zeros. At first glance this seems to happen to the second byte (big-endian) of a float. From some initial testing, my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u`, those values would be zero-padded.
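A minimal sketch of the sign-extension effect suspected above: if a byte with its high bit set is held in a signed `char` and then widened to a 32-bit integer, the three padding bytes become 0xFF instead of 0x00. For 1.0f (bit pattern 0x3F800000) the second big-endian byte is 0x80, which fits the observation about floats. The helper names are hypothetical, and `signed char` is used explicitly because whether plain `char` is signed is platform-dependent (it is signed on x86).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Byte i (0 = most significant) of a float's bit pattern, host-order independent.
inline unsigned char floatByteBigEndian(float f, int i)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<unsigned char>((bits >> (8 * (3 - i))) & 0xFFu);
}

// Widen through signed char: bytes >= 0x80 are sign-extended,
// so the upper three bytes become 0xFF ("padded with ones").
inline std::uint32_t widenViaSignedChar(unsigned char byte)
{
    const signed char c = static_cast<signed char>(byte);
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(c));
}

// Widen through unsigned char: the upper three bytes are always zero.
inline std::uint32_t widenViaUnsignedChar(unsigned char byte)
{
    return static_cast<std::uint32_t>(byte);
}
```

Bytes below 0x80 (such as the leading 0x3F of 1.0f) are unaffected either way, which would explain why only some padding bytes come out as ones.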
>>>
>>>     The attachment contains example files from a big- and a little-endian system, as well as a file created with GROMACS 2019.
>>>     I also brought this to the attention of the MDAnalysis devs here:
>>>     https://github.com/MDAnalysis/mdanalysis/issues/2428
>>>
>>>     Best regards,
>>>         Len
>>>
>>
>> -- 
>> Erik Lindahl <erik.lindahl at dbb.su.se>
>> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm 
>> University
>> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>>
>
> -- 
> Paul Bauer, PhD
> GROMACS Release Manager
> KTH Stockholm, SciLifeLab
> 0046737308594
