[gmx-developers] Native endianess in TPR body

Fri Dec 27 08:36:18 CET 2019

Hello,

I'll check the issue today and have a fix in mind that should address the
portability issue.

Concerning the TPR size, I can check again for possible optimizations, but
the code is written to simply ensure that variables are stored in multiples
of <char>.

Could you please also open an issue at redmine.gromacs.org with the files
for the test cases and target it at 2020?

I'll see that there is a possible fix for you to try later today.

Cheers

Paul

On Thu, 26 Dec 2019, 23:04 Jonathan Barnoud, <jonathan at barnoud.net> wrote:

> Hello everyone,
>
> I upgraded the code of MDAnalysis to read the latest TPR version. To add
> to Len's comments, it does appear that the new TPR body is 4 times as
> big as it used to be for the same content, and is not portable between
> architectures. gmx dump fails to read a file with a byte order different
> from the native one, and there is no obvious way to determine the endianness
> of the body. While the TPR format is not really meant to be portable, it
> seems commonly agreed that it is a good file to share (
> https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is certainly a
> good input file for MDAnalysis. TPR files are commonly produced on a local
> machine before actually being run on a cluster that may use a different
> byte order.
>
> > Second the individual bytes of a value are padded to 4 bytes per
> > original byte (each byte is packed as `char`).
>
> Note that the in-file XDR decoder in gromacs (used for the header
> and in versions prior to gromacs 2020) uses 4 bytes for "char", hence the
> padding. The in-memory one reads 1 padded byte (1 byte of information,
> 4 bytes in the file).
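
A minimal Python sketch of what undoing this padding could look like in a reader. It assumes each padded unit is a 4-byte big-endian XDR integer whose last byte carries the information; `unpad_xdr_chars` is a hypothetical helper, not part of GROMACS or MDAnalysis, and the sample bytes are made up for illustration.

```python
import struct


def unpad_xdr_chars(padded: bytes) -> bytes:
    """Keep the low-order byte of every 4-byte unit (hypothetical helper).

    Assumption: each original byte was written as one 4-byte big-endian
    integer, so the byte carrying information is the last of each unit.
    """
    if len(padded) % 4 != 0:
        raise ValueError("padded data must be a multiple of 4 bytes")
    return padded[3::4]


# Illustrative input: 16 bytes in the file for one 4-byte float.
padded = bytes.fromhex("0000003fffffff800000000000000000")
raw = unpad_xdr_chars(padded)

# The recovered bytes still have to be read with the right byte order;
# big-endian is assumed here.
value = struct.unpack(">f", raw)[0]
print(raw.hex(), value)
```

The padding bytes are simply discarded, so it does not matter whether they are zeros or ones.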
>
> As my use case for noticing these differences is fairly niche, I may be
> missing the reason for them. In that case, I would be curious to read about
> them.
>
> Best regards,
> Jonathan
>
>
> On 12/26/19 7:39 PM, Len Kimms wrote:
>
> Hello everyone,
>
> while fooling around with the new (i.e. version 2020 rc1) TPR file format I noticed some strange behaviors that I don’t understand. As far as I understand, the body of the new format is written by the `gmx::InMemorySerializer`. My following questions are basically about this module.
>
> First, it seems that the memory serializer writes the values in native byte order. This means that the body of TPR files differs between big- and little-endian systems. The XDR standard used before requires big-endian data. For me, a novice user, the new implementation seems to be less portable and robust. Endian swapping seems to be implemented but not currently used for TPR files.
> Is this intentional, and if so, why?
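
To illustrate the first point, here is a minimal Python sketch (not GROMACS code) of how the same single-precision value is laid out in big-endian order, as XDR requires, and in little-endian order. It only uses the standard `struct` module; the value 1.0 is an arbitrary example.

```python
import struct
import sys

value = 1.0

big = struct.pack(">f", value)      # big-endian, the order XDR requires
little = struct.pack("<f", value)   # little-endian
native = struct.pack("=f", value)   # whatever this machine uses

print("big-endian   :", big.hex())
print("little-endian:", little.hex())
print("native order :", native.hex(), f"({sys.byteorder})")

# Reading big-endian bytes as little-endian does not fail,
# it silently yields a different number.
misread = struct.unpack("<f", big)[0]
print("misread value:", misread)
```

A file written in native order therefore needs either a byte-order marker or a fixed convention for a reader on another architecture to interpret it.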
>
> Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as `char`). Therefore the size increases accordingly.
> Do those padding bytes serve a special purpose?
> Also regarding the padding bytes: Some bytes are not, like most others, padded with zeros. In some places they are padded with ones. At first glance this seems to happen to the second byte (big-endian) of a float. From some initial testing my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u` those values would be zero padded.
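
The guess about signed versus unsigned `char` can be reproduced outside GROMACS. The sketch below is Python, not the actual serializer: it assumes each byte of a value is widened to a 4-byte big-endian XDR integer, and `pad_bytes_as_xdr_chars` is a hypothetical helper written only for this illustration.

```python
import struct


def pad_bytes_as_xdr_chars(raw: bytes, signed: bool) -> bytes:
    """Widen every byte to a 4-byte big-endian integer (hypothetical helper).

    With signed=True, bytes >= 0x80 are treated as negative values, so
    sign extension fills the three padding bytes with ones (0xff).
    """
    out = bytearray()
    for b in raw:
        value = b - 256 if (signed and b >= 0x80) else b
        out += struct.pack(">i", value)
    return bytes(out)


raw = struct.pack(">f", 1.0)  # bytes 3f 80 00 00
signed_padded = pad_bytes_as_xdr_chars(raw, signed=True)
unsigned_padded = pad_bytes_as_xdr_chars(raw, signed=False)

print("signed  :", signed_padded.hex(" ", 4))
print("unsigned:", unsigned_padded.hex(" ", 4))
```

Under these assumptions the second byte of 1.0 (0x80) is the only one padded with ones in the signed case, and the output is 4 times the size of the input in both cases.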
>
> In the attachment one can find example files from a big- and little-endian system as well as a file created with GROMACS 2019.
> I also brought this to the attention of the MDAnalysis devs here: https://github.com/MDAnalysis/mdanalysis/issues/2428
>
> Best regards,
>    Len