[gmx-developers] What is the best choice to profile
sjyzhxw at gmail.com
Fri Aug 15 04:53:37 CEST 2014
Thank you for your attention.
On 8/12/2014 7:49 AM, Szilárd Páll wrote:
> First of all, the -DCMAKE_BUILD_TYPE=Profile approach is not useful at
> all with mdrun as it simply adds -pg to the compiler flags to trigger
> instrumentation for gprof. This has huge overhead as each and every
> function gets instrumented, and it often (always?) prevents inlining,
> which can interfere with SIMD intrinsics too. Many tracing tools will
> attempt to do the same either using a frontend cross-compiler like
> vtcc or through the "-finstrument-functions" gcc facility (which is
> equivalent to -pg except that it only instruments with callbacks
> without linking against a library that implements tracing).
> Your best bet is to use selective instrumentation or sampling (e.g.
> with VTune, HPCToolkit, or others). The way you do selective
> instrumentation will depend on the tool, but you basically want to
> minimize the instrumentation overhead by limiting the number of
> instrumented functions. You will typically be able to provide file or
> function inclusion/exclusion lists or alternatively you can always use
> manual instrumentation - but for that you'd need quite in-depth
> knowledge of the code. If I remember correctly Vampir can do some
> runtime filtering of trace data, but I guess this is more to save
> space rather than lower overhead.
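
As one illustration of file or function exclusion lists, GCC's -finstrument-functions accepts exclude lists on the command line. This is only a sketch, assuming GCC is the underlying compiler and that the tracing tool supplies the __cyg_profile_func_enter/exit callbacks. The file and function name fragments below are hypothetical placeholders, not a recommendation for GROMACS.

```shell
#!/bin/sh
# Hypothetical substrings of source paths and function names to leave un-instrumented.
EXCLUDE_FILES="nbnxn_kernels,simd"
EXCLUDE_FUNCS="inner_loop,calc_"

# Both exclude options are GCC flags; GCC matches the listed names as substrings.
INSTR_FLAGS="-finstrument-functions"
INSTR_FLAGS="$INSTR_FLAGS -finstrument-functions-exclude-file-list=$EXCLUDE_FILES"
INSTR_FLAGS="$INSTR_FLAGS -finstrument-functions-exclude-function-list=$EXCLUDE_FUNCS"

# Print the flags; they could be handed to CMake via -DCMAKE_C_FLAGS="$INSTR_FLAGS".
echo "$INSTR_FLAGS"
```

Other tools have their own filter mechanisms, so the same idea would need to be expressed in whatever form the chosen tool expects.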
> Several tools (AFAIK Extrae, TAU, Vampirtrace) support tracing MPI,
> pthreads, OpenMP, CUDA API etc. calls which should be quite enough to
> get started, but traces can be a bit hard to interpret without stack
> info. Such traces are made much more readable when the tools support
> stack unwinding (e.g. Extrae does). Although AFAIK unwinding
> isn't always bulletproof, it's quite handy to get a useful trace by
> running vanilla un-instrumented code.
> Coming back to your original issue regarding the vtcc-compiled mdrun
> segfaulting, my guess is that it is probably the result of incorrect
> code generation by the vtcc cross-compiler. I know that Vampirtrace
> has been used to collect GROMACS traces, but the work was done by a
> Vampir expert and I'm not sure if there is any developer who knows how
> it was done. Hence, you may want to try seeking help on the Vampir
> mailing list or forums.
> On Fri, Aug 8, 2014 at 4:50 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
>> Hi Mark,
>> Thanks for your attention. I have some questions for you.
>> 1. Profilers that just need to compile with standard profiling flags are
>> fine - just configure with -DCMAKE_BUILD_TYPE=Profile and go for it.
>> It seems that this is convenient. What profilers can be used in this way?
>> What tools do you guys use to profile?
>> 2. If you need to pass things to the wrapper compiler, make yourself a
>> script to wrap the wrapper compiler and give that to CMake.
>> I think it's the case when I compile GMX with Vampir Trace. We need to use
>> the command vtcc -vt:cc mpicc sourcefile.c -o a.out to instrument MPI code.
>> Should "-vt:cc mpicc" be passed to vtcc, which is set to CMAKE_C_COMPILER,
>> as argument? I don't know how to "make yourself a script to wrap the wrapper
>> compiler and give that to CMake". How to give a script to CMake exactly? I
>> have not used CMake before, please give me some instructions.
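
A minimal sketch of the "script to wrap the wrapper compiler" idea, assuming vtcc accepts "-vt:cc mpicc" exactly as in the command quoted above. The script name vtcc-mpi.sh and the REAL_WRAPPER override are hypothetical, introduced only for illustration. The override allows a dry run with echo in place of vtcc.

```shell
#!/bin/sh
# Write a small script that always adds "-vt:cc mpicc" before the arguments
# that the build system passes to the compiler.
cat > vtcc-mpi.sh <<'EOF'
#!/bin/sh
# REAL_WRAPPER is a hypothetical override; it defaults to vtcc.
exec "${REAL_WRAPPER:-vtcc}" -vt:cc mpicc "$@"
EOF
chmod +x vtcc-mpi.sh

# Dry run: substitute echo for vtcc to see the command line that would be used.
REAL_WRAPPER=echo ./vtcc-mpi.sh -c foo.c -o foo.o
```

The script can then be given to CMake by its full path, for example -DCMAKE_C_COMPILER=$PWD/vtcc-mpi.sh in place of vtcc in the configure line. An analogous script around vtcxx would presumably be needed for the C++ compiler. Whether GROMACS then builds and runs correctly is not something this sketch can guarantee.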
>> When I use Vampir Trace, some weird errors occur:
>> 1. In some source files, some multiple-line comments are processed by the
>> compiler. If I change them to one-line comments, it works fine. I have no
>> idea why that happens.
>> 2. If I use the following configuration:
>> cmake .. -DCMAKE_INSTALL_PREFIX=$MYPRG/gmx
>> -DCUDA_CUDA_LIBRARY=/usr/lib64/libcuda.so -DCMAKE_C_COMPILER=vtcc
>> -DCMAKE_CXX_COMPILER=vtcxx -DCUDA_NVCC_EXECUTABLE=vtnvcc -DGMX_MPI=ON
>> The compilation can be done. And when I run grompp to generate tpr, it's OK
>> and Vampir Trace will generate trace files. When I run mdrun without
>> parameters, it also generates trace files. However, when I actually start
>> mdrun with a tpr file, it will give a segmentation fault.
>> On 8/7/2014 10:56 PM, Mark Abraham wrote:
>> First, what are you hoping to learn? Many questions have a known answer, or
>> are known not to have a clear answer ;-)
>> It's a compound problem. Profilers that just need to compile with standard
>> profiling flags are fine - just configure with -DCMAKE_BUILD_TYPE=Profile
>> and go for it. Those that need to influence the compilation or linker line
>> are more problematic. Just passing the (full path to) the wrapper compiler
>> should work fine. If you need to pass things to the wrapper compiler, make
>> yourself a script to wrap the wrapper compiler and give that to CMake.
>> Whether the things the tool's wrapper compiler does clash with things
>> GROMACS is doing varies a lot, and you will need to get involved in the
>> details to see what the origin of any problem is. That's not anybody's
>> fault, per se, but the writer of a wrapper compiler typically hopes the end
>> user is not managing details themselves, but to get high performance you
>> often have to manage details, and some of those details show up on the
>> compiler command lines generated by the GROMACS build system. And naturally
>> you'll be interested in MPI+OpenMP+CUDA, each of which compounds the problem
>> with further wrapper compilers or command-line stuff.
>> Once it's working, you have the problem of whether you can get useful data.
>> Instrumenting every function call, or compromising function inlining, is
>> guaranteed to be useless because that overhead kills things. The main
>> interesting case to profile is where the MD step iteration time is a few
>> milliseconds, and you can't introduce thousands of increments of tens of
>> nanoseconds and get sensible profiles. So either you have to restrict the
>> instrumentation to high-level functions (which is painful; the output at the
>> end of the GROMACS log file is a coarse version of this averaged over many
>> steps and execution contexts), or use a sampling-based approach.
>> Then you need to start collecting data after the run-time performance tuning
>> that mdrun does has already stabilized - at least a few hundred MD steps.
>> Longer if the MD load is imbalanced, which is also the main interesting case
>> to consider for code modifications.
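
For the coarse timings printed at the end of the GROMACS log file, mdrun's -resetstep option resets the cycle counters after a given number of steps, which is one way to keep the start-up tuning phase out of the numbers. This is a sketch with a hypothetical input file name and hypothetical step counts. External profilers need their own mechanism for delayed collection.

```shell
#!/bin/sh
TPR=topol.tpr      # hypothetical input file
NSTEPS=10000       # hypothetical total number of MD steps
RESET=2000         # hypothetical: reset counters well after tuning has settled

# Assemble the command line; it is printed rather than executed so it can be checked first.
MDRUN_CMD="gmx mdrun -s $TPR -nsteps $NSTEPS -resetstep $RESET"
echo "$MDRUN_CMD"
```

With an MPI build the binary name and launcher will differ, so the printed line is a starting point rather than a command to copy verbatim.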
>> On Wed, Aug 6, 2014 at 10:58 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
>>> what kind of tools work best with gromacs 5.0?
>>> Gromacs Developers mailing list
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or
>>> send a mail to gmx-developers-request at gromacs.org.