Hi everyone, I am new to Gromacs and I want to run it on a multi-core CPU
machine that we use for my research. Because this machine supports only MPI
(no OpenMP, no SIMD), I profiled the MPI-only build of Gromacs-5.0.4 and
found that the hotspot was nbnxn_kernel_ref() in
src/gromacs/mdlib/nbnxn_kernel_ref.c, which accounted for 80% of the total
run time. Naturally, I want to accelerate nbnxn_kernel_ref() by
parallelizing it across multiple threads. After some analysis, I found that
the structure of nbnxn_kernel_ref() is as follows:
for (nb = 0; nb < nnbl; nb++)
      for (n = 0; n < nbl->nci; n++)  // defined in
So here is my question. When I compile with OpenMP=OFF, the value of nnbl
is 1 at runtime. Can I parallelize the inner loop by simply splitting its
nbl->nci iterations evenly across the cores?
Alternatively, I found that when I compile with OpenMP=ON the value of nnbl
is not 1, so I could parallelize the outer loop across multiple threads.
But my machine does not support OpenMP. Is there any way to modify the code
so that nnbl is greater than 1 even when compiling with OpenMP=OFF?
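
To make the first option concrete, here is a minimal sketch of what I mean
by splitting the inner loop evenly. The helper split_range() and the
parameters nworkers and t are hypothetical names of my own, not part of
Gromacs; only nci comes from the loop above.

```c
#include <assert.h>

/* Hypothetical helper (not part of Gromacs): compute the half-open range
 * [*start, *end) of inner-loop indices n that worker t (0-based) out of
 * nworkers would process when nci iterations are split as evenly as
 * possible. */
static void split_range(int nci, int nworkers, int t, int *start, int *end)
{
    int base, rem;

    assert(nworkers > 0 && t >= 0 && t < nworkers);

    base = nci / nworkers;  /* iterations every worker gets */
    rem  = nci % nworkers;  /* the first rem workers get one extra */

    *start = t * base + (t < rem ? t : rem);
    *end   = *start + base + (t < rem ? 1 : 0);
}
```

Each worker would then run the inner loop for n = start .. end-1 only. As
far as I understand, different n entries can update forces on the same
atoms, so I assume each worker would also need its own force output buffer
that is summed afterwards; I have left that part out of the sketch.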
