[gmx-developers] How to deal with Clang-tidy/Clang static analyzer warnings in existing headers?

Ullmann, Thomas thomas.ullmann at mpibpc.mpg.de
Wed Feb 27 17:48:30 CET 2019


Hi,

> At least for the patches you link to, I cannot see any file using the SIMD module?

The SIMD headers are included in this patch https://gerrit.gromacs.org/#/c/6801/.
The others depend on this patch in turn. Strangely, the error messages are not issued
for this patch itself; not the only strange issue with the Clang analysis tools.

>When it comes to warnings, please link to specific warnings/output rather than describing them :-)

One more strange thing, the first category of warnings is only obtained in my local build
of the FMA patch with a Clang 7 subversion build from svn

1081 /home/tullman/GROMACS/gromacs.release-5-X.my/src/gromacs/simd/impl_x86_avx_256/impl_x86_avx_256_simd_double.h:60:9: error: constructor does not initialize these fields: simdInternal_ [cppcoreguidelines-pro-type-member-init,-warnings-as-errors]
1082         SimdDouble() {}
1083         ^

This error is reported for all the SimdXYZ types in the file. And its true, simdInternal_
is not initialized in these default constructors, but maybe intentionally(?).

The second category also appears in the Jenkins build of the FMA patch (https://gerrit.gromacs.org/#/c/3277/)
whose output can be found here:
http://jenkins.gromacs.org/job/Matrix_PreSubmit_master/8476/warnings13Result/package.-231232643/

There are two kinds of error messages repeated several times for different lines of scalar.h
concerning use and declaration of unions of an int and a float:

>scalar.h:335, Clang (LLVM based), Priority: High
>
>do not access members of unions; use (boost::)variant instead

and

>scalar.h:329, Clang (LLVM based), Priority: High
>
>uninitialized record type: 'conv2'

>----------------------------------------
I think there are at least two questions/issues to be aware of:

1) When it comes to using the SIMD module, one has to be aware it's very much a compromise. It leads to code that can be VERY difficult to adjust/maintain/understand, occasionally double implementations (with/without SIMD), and at least the problem that anybody editing it in the future will have to be aware of SIMD. This is a price that might be worth it for certain core parts of the code where we collectively spend millions of core-hours on large computers, but IMHO it should not be used for analysis tools, even it it might create a nice speedup.   When it comes to SIMD code an unions, I suspect you are hitting warnings related to conversions between floating-point and bitwise representations. This requires a round-trip to memory in C++, which is not strictly guaranteed by the standard (IEEE754 is not a requirement of C++). This is another reason for not touching the SIMD parts of the code unless you are working with very deep plumbing :-)
------------------------------------------------<

OK, I understand and I do not intend to touch the SIMD module itself. I wrote the SIMD code for the linear algebra on
my matrix classes for use in core gmx, that is in the lambda dynamics module, which (hopefully) will be run very frequently
and would then of course consume valuable resources on supercomputers. The functional mode analysis can just also profit
from this code ...

>To get the change you link to in shape to be commited, here are my general comments (just based in 5 minutes looking at it):
>
>1) Forget SIMD. Clean up the patch and get it working 100% first. If the most common complaint among GROMACS users a year from now is that functional mode analysis needs to be >faster, then we'll consider acceleration :-)

OK, thanks for your advice. Following your and Christian's suggestion, I will split the linear
algebra patch in a patch with the basic, scalar functions and one with the SIMD functions.
Is that what you intended with your suggestion? I'm CCing Luka for the following points.
Functionality-wise everything is very throroughly verified and tested and equipped with
the corresponding unit tests.

>6) For documentation, you will have to explain the method/algorithm in the code rather than referring people to a long (?) scientific paper.

OK, you suggested that before and I did follow your advice. I had added brief documentation about the
method to the header of the partial least-squares method, but moved it to the reference manual following
Carsten's advice and now only refer to it in this header. This approach seemed reasonable to us, because the
reference manual is accessible to users and developers, while users will perhaps seldom consult the source
code.  The FMA reference manual section was committed with the FMA patch here:

https://gerrit.gromacs.org/#/c/3277/50/docs/reference-manual/analysis/functional-mode-analysis.rst

Do you think this information should rather be moved back to the Doxygen doucmentation of the header
or duplicated (in part) there?

Thanks for your advice, It would be really great if we could finally add the FMA tool to GROMACS.
As Bert may have told you, we are confident that it will make a highly useful addition to the toolbox
of the molecular simulation community that deserves much wider adoption.

Best,
Thomas,

________________________________________
From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se <gromacs.org_gmx-developers-bounces at maillist.sys.kth.se> on behalf of Erik Lindahl <erik.lindahl at gmail.com>
Sent: Wednesday, February 27, 2019 3:22:37 PM
To: gmx-developers at gromacs.org
Subject: Re: [gmx-developers] How to deal with Clang-tidy/Clang static analyzer warnings in existing headers?

Hi,

At least for the patches you link to, I cannot see any file using the SIMD module?
When it comes to warnings, please link to specific warnings/output rather than describing them :-)


I think there are at least two questions/issues to be aware of:

1) When it comes to using the SIMD module, one has to be aware it's very much a compromise. It leads to code that can be VERY difficult to adjust/maintain/understand, occasionally double implementations (with/without SIMD), and at least the problem that anybody editing it in the future will have to be aware of SIMD. This is a price that might be worth it for certain core parts of the code where we collectively spend millions of core-hours on large computers, but IMHO it should not be used for analysis tools, even it it might create a nice speedup.   When it comes to SIMD code an unions, I suspect you are hitting warnings related to conversions between floating-point and bitwise representations. This requires a round-trip to memory in C++, which is not strictly guaranteed by the standard (IEEE754 is not a requirement of C++). This is another reason for not touching the SIMD parts of the code unless you are working with very deep plumbing :-)


2) When it comes to errors in old-vs.-new parts of the code it's fairly easy: Since the master branch passes verification, any errors in old code will have been in parts that were not used prior to the new change. That is not ideal, but it might happen. Effectively, we treat that the same way as new code (since it is newly *used* code) - it has to be fixed before the change can go in.


To get the change you link to in shape to be commited, here are my general comments (just based in 5 minutes looking at it):

1) Forget SIMD. Clean up the patch and get it working 100% first. If the most common complaint among GROMACS users a year from now is that functional mode analysis needs to be faster, then we'll consider acceleration :-)

2) Document the classes carefully. How should the class be *used* (not merely that it implements a method), and break them apart into small manageable classes. Aim at max ~5 data members per class, not 25-50. Make sure you specify what the allowed input/parameters are, and what will happen if somebody provides illegal values.

3) Same thing for member classes - aim for functions that are 5-10 lines, and in worst case 25-50. No function in a new piece of code should be 200+ lines-of-code, because it becomes impossible to understand and test everything that function does. Second, make sure the code is documented inside those routines too - because the person doing code review needs to understand the code, not merely pray the compiler found the errors and hope the first user knew what (s)he was doing.

4) Use C++ iteration constructs to avoid lots of loop variables, and avoid things like manual vector resizes unless absolute necessary. It is often better to let something go out of scope and create a new vector, unless it has unacceptable performance implications. Readability & maintainability trumps performance.

5) Several routines in the changes appear to work with raw pointers, which is a fairly big no-no for new code since we know we're not smart enough to avoid causing memory errors. Stick to STL datatypes & iterators.

6) For documentation, you will have to explain the method/algorithm in the code rather than referring people to a long (?) scientific paper.



Cheers,

Erik





Cheers,

Erik


On Wed, Feb 27, 2019 at 2:46 PM Ullmann, Thomas <thomas.ullmann at mpibpc.mpg.de<mailto:thomas.ullmann at mpibpc.mpg.de>> wrote:
Hi,

we are getting warnings in Jenkins builds with the Clang static analysis tools
for two patches that use the SIMD module. One category of warnings concerns
the member vector of the Simd classes not being initialized in the default
constructor of these classes. This time the warnings are not false positives,
but I guess there is a reason for not initializing these members(?).
The other concerns the way data in a union is accessed in scalar.h which is
apparently not in line with cppcoreguidelines-pro-type-union-access.

How should one deal with such warnings that pertain to preexisting code?

The patches can be found here:
https://gerrit.gromacs.org/#/c/4015/
https://gerrit.gromacs.org/#/c/3277/

Best,
Thomas.
--
Gromacs Developers mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or send a mail to gmx-developers-request at gromacs.org<mailto:gmx-developers-request at gromacs.org>.


--
Erik Lindahl <erik.lindahl at dbb.su.se<mailto:erik.lindahl at dbb.su.se>>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden


More information about the gromacs.org_gmx-developers mailing list