[gmx-developers] What is the best choice to profile

Szilárd Páll pall.szilard at gmail.com
Tue Aug 12 01:49:55 CEST 2014


Hi,

First of all, the -DCMAKE_BUILD_TYPE=Profile approach is not useful at
all with mdrun, as it simply adds -pg to the compiler flags to trigger
instrumentation for gprof. This has huge overhead, as each and every
function gets instrumented, and it often (always?) prevents inlining,
which can interfere with SIMD intrinsics too. Many tracing tools will
attempt to do the same, either using a frontend cross-compiler like
vtcc or through the "-finstrument-functions" gcc facility (which is
equivalent to -pg except that it only instruments with callbacks
without linking against a library that implements tracing).

Your best bet is to use selective instrumentation or sampling (e.g.
with VTune, HPCToolkit, or others). The way you do selective
instrumentation will depend on the tool, but you basically want to
minimize the instrumentation overhead by limiting the number of
instrumented functions. You will typically be able to provide file or
function inclusion/exclusion lists, or alternatively you can always use
manual instrumentation, but for that you'd need quite in-depth
knowledge of the code. If I remember correctly, Vampir can do some
runtime filtering of trace data, but I guess this is more to save
space than to lower overhead.
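
For a plain GCC build, one way to apply such an exclusion list without a
dedicated tool is GCC's -finstrument-functions-exclude-file-list, which
skips instrumentation in any file whose path contains one of the listed
substrings (there is a matching -finstrument-functions-exclude-function-list
for function names). This is only a sketch: the helper name instr_flags and
the path fragments used below are hypothetical, so check which files
actually hold your hot kernels.

```shell
# Sketch (POSIX sh): build compiler flags for GCC function-level
# instrumentation that skip every source file whose path contains one of
# the given substrings. instr_flags is a hypothetical helper name, not
# part of GROMACS or GCC.
instr_flags() {
    # Join all arguments with commas, the separator GCC expects in the list.
    list=$(IFS=,; printf '%s' "$*")
    printf '%s\n' "-finstrument-functions -finstrument-functions-exclude-file-list=$list"
}
```

For example, cmake .. -DCMAKE_C_FLAGS="$(instr_flags simd nbnxn)"
-DCMAKE_CXX_FLAGS="$(instr_flags simd nbnxn)", with "simd" and "nbnxn" as
hypothetical path fragments. The binary still has to be linked against
something that provides the __cyg_profile_func_enter/__cyg_profile_func_exit
callbacks (normally the tracing tool's library), and every function left
instrumented still adds overhead, so the list has to cover the hot inner
loops to be useful.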

Several tools (AFAIK Extrae, TAU, VampirTrace) support tracing MPI,
pthreads, OpenMP, CUDA API etc. calls, which should be quite enough to
get started, but traces can be a bit hard to interpret without stack
info. Such traces are made much more readable when the tools support
stack unwinding (e.g. Extrae does). Although AFAIK unwinding
isn't always bulletproof, it's quite handy to get a useful trace by
running vanilla, un-instrumented code.
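
As an illustration of tracing vanilla code, Extrae can intercept MPI calls
at run time by preloading its tracing library, so the binary does not need
to be rebuilt. The sketch below assumes an Extrae installation under
$EXTRAE_HOME that provides lib/libmpitrace.so and an XML configuration file
(assumed here to be extrae.xml). The library name and variables should be
checked against the Extrae documentation for your version, and with_extrae
is a hypothetical helper name.

```shell
# Sketch (POSIX sh): run an un-instrumented command with Extrae's MPI
# tracing library preloaded via LD_PRELOAD. Assumed, verify for your
# install: EXTRAE_HOME is the install prefix, lib/libmpitrace.so is the
# MPI tracing library, and EXTRAE_CONFIG_FILE names the XML config that
# controls what gets recorded.
with_extrae() {
    # Fail early if the install prefix is unknown.
    : "${EXTRAE_HOME:?EXTRAE_HOME must point at the Extrae installation}"
    # Set both variables for the launched command only; any existing
    # LD_PRELOAD entries are kept after the tracing library.
    EXTRAE_CONFIG_FILE="${EXTRAE_CONFIG_FILE:-extrae.xml}" \
    LD_PRELOAD="$EXTRAE_HOME/lib/libmpitrace.so${LD_PRELOAD:+:$LD_PRELOAD}" \
        "$@"
}
```

With MPI the preload has to reach every rank, so a common pattern is a
small per-rank launcher script that calls with_extrae "$@" and is itself
started by mpirun, e.g. mpirun -np 4 ./trace.sh gmx_mpi mdrun -s topol.tpr
(trace.sh being a hypothetical script name). MPI+OpenMP runs probably need
a different Extrae library.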


Coming back to your original issue regarding the vtcc-compiled mdrun
segfaulting, my guess is that it is probably the result of incorrect
code generation by the vtcc cross-compiler. I know that VampirTrace
has been used to collect GROMACS traces, but the work was done by a
Vampir expert and I'm not sure whether any developer knows how
it was done. Hence, you may want to try seeking help on the Vampir
mailing list or forums.
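
On the question of how to hand vtcc -vt:cc mpicc to CMake: one way to
follow the advice to wrap the wrapper compiler is to generate a tiny
script that forwards all arguments to vtcc with the extra option, and
point CMAKE_C_COMPILER at that script. This is a sketch assuming vtcc
accepts -vt:cc <compiler> as in the command quoted below; make_vt_wrapper
and the script path are hypothetical. It will not fix a segfault caused by
vtcc's code generation, but it keeps the compiler command line under your
control.

```shell
# Sketch (POSIX sh): generate a wrapper script so CMake sees a single
# compiler executable that runs "vtcc -vt:cc <underlying compiler>" with
# CMake's arguments. make_vt_wrapper is a hypothetical helper; the -vt:cc
# usage is taken from the vtcc command quoted in this thread.
make_vt_wrapper() {
    wrapper=$1      # path of the script to create, e.g. $HOME/bin/vtcc-mpicc
    underlying=$2   # compiler that vtcc should drive, e.g. mpicc
    # $underlying is substituted now; \$@ stays literal so the generated
    # script forwards its own arguments unchanged.
    cat > "$wrapper" <<EOF
#!/bin/sh
exec vtcc -vt:cc $underlying "\$@"
EOF
    chmod +x "$wrapper"
}
```

For example, make_vt_wrapper "$HOME/bin/vtcc-mpicc" mpicc, then
cmake .. -DCMAKE_C_COMPILER="$HOME/bin/vtcc-mpicc" -DGMX_MPI=ON ...
A C++ wrapper around vtcxx could be made the same way, assuming it takes
an analogous option; check the VampirTrace documentation.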

Cheers,
--
Szilárd


On Fri, Aug 8, 2014 at 4:50 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
> Hi Mark,
>
> Thanks for your attention. I have some questions for you.
>
>
> 1. Profilers that just need to compile with standard profiling flags are
> fine - just configure with -DCMAKE_BUILD_TYPE=Profile and go for it.
> It seems that this is convenient. What profilers can be used in this way?
> What tools do you guys use to profile?
>
> 2. If you need to pass things to the wrapper compiler, make yourself a
> script to wrap the wrapper compiler and give that to CMake.
> I think it's the case when I compile GMX with Vampir Trace. We need to use
> the command vtcc -vt:cc mpicc sourcefile.c -o a.out to instrument MPI code.
> Should "-vt:cc mpicc" be passed as an argument to vtcc, which is set as
> CMAKE_C_COMPILER? I don't know how to "make yourself a script to wrap the
> wrapper compiler and give that to CMake". How exactly do I give a script to
> CMake? I have not used CMake before, so please give me some instructions.
>
> When I use Vampir Trace, some weird errors occur:
>
> 1. In some source files, some multiple-line comments are not handled
> correctly by the compiler. If I change them to one-line comments, it works
> fine. I have no idea why that happens.
> 2. If I use the following configuration:
> cmake .. -DCMAKE_INSTALL_PREFIX=$MYPRG/gmx \
>     -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
>     -DCUDA_CUDA_LIBRARY=/usr/lib64/libcuda.so \
>     -DCMAKE_C_COMPILER=vtcc -DCMAKE_CXX_COMPILER=vtcxx \
>     -DCUDA_NVCC_EXECUTABLE=vtnvcc -DGMX_MPI=ON -DGMX_BUILD_OWN_FFTW=ON
> The compilation succeeds. When I run grompp to generate a tpr file, it's OK
> and Vampir Trace generates trace files. When I run mdrun without
> parameters, it also generates trace files. However, when I actually start
> mdrun with a tpr file, it gives a segmentation fault.
>
>
>
> On 8/7/2014 10:56 PM, Mark Abraham wrote:
>
> Hi,
>
> First, what are you hoping to learn? Many questions have a known answer, or
> are known not to have a clear answer ;-)
>
> It's a compound problem. Profilers that just need to compile with standard
> profiling flags are fine - just configure with -DCMAKE_BUILD_TYPE=Profile
> and go for it. Those that need to influence the compilation or linker line
> are more problematic. Just passing (the full path to) the wrapper compiler
> should work fine. If you need to pass things to the wrapper compiler, make
> yourself a script to wrap the wrapper compiler and give that to CMake.
> Whether the things the tool's wrapper compiler does clash with things
> GROMACS is doing varies a lot, and you will need to get involved in the
> details to find the origin of any problems. That's not anybody's
> fault, per se, but the writer of a wrapper compiler typically hopes the end
> user is not managing details themselves, but to get high performance you
> often have to manage details, and some of those details show up on the
> compiler command lines generated by the GROMACS build system. And naturally
> you'll be interested in MPI+OpenMP+CUDA, each of which compounds the problem
> with further wrapper compilers or command-line stuff.
>
> Once it's working, you have the problem of whether you can get useful data.
> Instrumenting every function call, or compromising function inlining is
> guaranteed to be useless because that overhead kills things. The main
> interesting case to profile is where the MD step iteration time is a few
> milliseconds, and you can't introduce thousands of increments of tens of
> nanoseconds and get sensible profiles. So either you have to restrict the
> instrumentation to high-level functions (which is painful; the output at the
> end of the GROMACS log file is a coarse version of this averaged over many
> steps and execution contexts), or use a sampling-based approach.
>
> Then you need to start collecting data after the run-time performance tuning
> that mdrun does has already stabilized - at least a few hundred MD steps.
> Longer if the MD load is imbalanced, which is also the main interesting case
> to consider for code modifications.
>
> Mark
>
>
> On Wed, Aug 6, 2014 at 10:58 AM, Theodore Si <sjyzhxw at gmail.com> wrote:
>>
>> What kinds of tools work best with GROMACS 5.0?
>>
>>
>> --
>> Gromacs Developers mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>> posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or
>> send a mail to gmx-developers-request at gromacs.org.
>