[gmx-developers] Native endianness in TPR body
paul.bauer.q at gmail.com
Fri Dec 27 08:36:18 CET 2019
I'll check the issue today and have a fix in mind that should address the
Concerning the TPR size, I can check again for possible optimizations, but
the code is simply written to ensure that variables are stored in
multiples of `char`.
Could you please also open an issue at redmine.gromacs.org with the files
for the test cases and target it at 2020?
I'll see that there is a possible fix for you to try later today.
On Thu, 26 Dec 2019, 23:04 Jonathan Barnoud, <jonathan at barnoud.net> wrote:
> Hello everyone,
> I upgraded the code of MDAnalysis to read the latest TPR version. To add
> to Len's comments, it indeed appears that the new TPR body is 4 times as
> big as it used to be for the same content, and is not portable between
> architectures. gmx dump fails to read a file whose byte order differs
> from the native one, and there is no obvious way to determine the
> endianness of the body. While the TPR format is not really meant to be
> portable, it seemed commonly agreed that it was a good file to share
> (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00665), and it is
> certainly a good input file for MDAnalysis. TPR files are commonly
> produced on a local machine before actually being run on a cluster,
> which may use a different byte order.
> > Second, the individual bytes of a value are padded to 4 bytes per
> > original byte (each byte is packed as `char`).
> Note that the in-file XDR decoder in GROMACS (used for the header, and
> used before GROMACS 2020) uses 4 bytes for `char`, hence the padding. The
> in-memory one reads 1 padded byte (1 byte of information, 4 bytes in the
> As my use case for noticing these differences is fairly niche, I may be
> missing the reason for them. In that case, I would be curious to read about
> Best regards,
> On 12/26/19 7:39 PM, Len Kimms wrote:
> Hello everyone,
> While fooling around with the new (i.e. version 2020 rc1) TPR file format, I noticed some strange behaviors that I don’t understand. As far as I understand, the body of the new format is written by `gmx::InMemorySerializer`. My following questions are basically about this module.
> First, it seems that the memory serializer writes values in native byte order. This means that the body of a TPR file differs between big- and little-endian systems. The XDR standard used before requires big-endian data. To me, as a novice user, the new implementation seems less portable and less robust. Endian swapping seems to be implemented but is not currently used for TPR files.
> Is this intentional, if so, why?
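The byte-order difference can be illustrated with Python's `struct` module (a minimal sketch, not GROMACS or MDAnalysis code). The `=` format prefix packs a value in the host's native byte order, much like copying it straight out of memory, whereas `>` always produces the big-endian layout that XDR mandates. Native little-endian bytes read as big-endian give a wrong value until they are swapped.

```python
import struct
import sys

value = 1.0
native = struct.pack("=f", value)  # host byte order, like a raw in-memory copy
big = struct.pack(">f", value)     # big-endian, as XDR requires
little = struct.pack("<f", value)  # what a little-endian host writes natively

print("host byte order:", sys.byteorder)
print("native:", native.hex(" "), "| big-endian:", big.hex(" "))

# A big-endian reader that assumes its own native order misreads
# little-endian bytes unless it swaps them first.
misread = struct.unpack(">f", little)[0]
swapped = struct.unpack(">f", little[::-1])[0]
print("without swap:", misread, "| after swap:", swapped)
```

Writing with an explicit byte order (`>` here) makes the output identical on every host, which is the property the XDR-encoded body had.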
> Second, the individual bytes of a value are padded to 4 bytes per original byte (each byte is packed as `char`). Therefore, the file size increases fourfold.
> Do those padding bytes serve a special purpose?
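The fourfold size follows from how XDR works: it encodes every item in multiples of 4 bytes and has no 1-byte scalar type, so a C XDR library's `xdr_char()` widens each `char` to a 4-byte `int`. Serializing data one `char` at a time therefore costs 4 bytes per byte of information. The sketch below models this with a hypothetical helper `encode_bytes_as_xdr_chars` (an illustration, not a GROMACS function) that zero-extends each byte into a big-endian 32-bit unit; the matching hypothetical decoder keeps the last byte of each unit.

```python
import struct

def encode_bytes_as_xdr_chars(data: bytes) -> bytes:
    """Hypothetical helper: widen each byte to a 4-byte big-endian XDR unit.

    The byte is zero-extended (treated as unsigned) in this sketch.
    """
    return b"".join(struct.pack(">I", b) for b in data)

def decode_xdr_chars(encoded: bytes) -> bytes:
    """Inverse of the helper above: keep the low-order byte of each 4-byte unit."""
    return bytes(encoded[i + 3] for i in range(0, len(encoded), 4))

payload = struct.pack(">f", 1.0)          # 4 bytes of information: 3f 80 00 00
encoded = encode_bytes_as_xdr_chars(payload)
print(len(payload), "->", len(encoded))   # 4 -> 16
print(encoded.hex(" "))                   # 00 00 00 3f 00 00 00 80 00 00 00 00 00 00 00 00
```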
> Also regarding the padding bytes: unlike most others, some bytes are not padded with zeros; in some places they are padded with ones. At first glance, this seems to happen to the second byte (big-endian) of a float. From some initial testing, my best guess is that this is caused by the union conversion in `CharBuffer`. With an `unsigned char` in the private union `u`, those values would be zero-padded.
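The sign-extension guess is consistent with how C and C++ widen a byte: when the stored byte type is signed (plain `char` is signed on common x86 toolchains, though its signedness is implementation-defined), any byte with its top bit set becomes a negative `int` whose upper three bytes are `0xFF`. In a big-endian IEEE 754 single-precision float, the top bit of the second byte is the lowest exponent bit, so it is set whenever the biased exponent is odd; `1.0f`, for example, is `3F 80 00 00`. The sketch below models both cases with a hypothetical `widen` helper; it illustrates the mechanism and is not the actual `CharBuffer` code.

```python
import struct

def widen(byte: int, signed: bool) -> bytes:
    """Hypothetical helper: widen one byte to a 4-byte big-endian int.

    signed=True models promoting a signed char (sign extension);
    signed=False models an unsigned char (zero extension).
    """
    value = byte - 256 if signed and byte >= 0x80 else byte
    return struct.pack(">i", value)

float_bytes = struct.pack(">f", 1.0)  # 3f 80 00 00: second byte has its top bit set
as_signed = b"".join(widen(b, signed=True) for b in float_bytes)
as_unsigned = b"".join(widen(b, signed=False) for b in float_bytes)
print(as_signed.hex(" "))    # 00 00 00 3f ff ff ff 80 00 00 00 00 00 00 00 00
print(as_unsigned.hex(" "))  # 00 00 00 3f 00 00 00 80 00 00 00 00 00 00 00 00
```

In both cases the information byte (the last byte of each unit) is the same, so a reader that keeps only the low byte recovers the data; only the padding differs.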
> The attachment contains example files from a big-endian and a little-endian system, as well as a file created with GROMACS 2019.
> I also brought this to the attention of the MDAnalysis devs here: https://github.com/MDAnalysis/mdanalysis/issues/2428
> Best regards,