[gmx-users] Difficulties with MPI in gromacs 4.6.3

Thu Sep 19 08:40:43 CEST 2013

Thanks for the response. On further investigation, the problem only seems
to occur in jobs running via MPI on our GPU-enabled nodes, even if the
simulation in question doesn't use GPUs. Re-compiling gromacs 4.6.3 without
CUDA support eliminates the memory-hogging behavior. However, I'd like to
actually use our fancy new hardware. Does that help narrow down the issue
at all?

Each node contains two 6-core processors and three Fermi M2090 cards, of
which two share the same PCI root complex. We're using CUDA version 5.5.
The thread-MPI version of gromacs 4.6.3 can use all three GPUs with no
obvious problems, and can run non-GPU simulations on GPU-containing nodes
just fine. My problem seems restricted to the use of 'real' MPI in the
presence of GPUs, even when GPUs are not used in the simulation.

On Tue, Sep 17, 2013 at 2:26 AM, Mark Abraham <mark.j.abraham at gmail.com> wrote:

>
> Hmm. That warning is a known issue in some cases:
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork but should
> not be an issue for the above mdrun command, since it should call none
> of popen/fork/system. You might like to try some of the diagnostics on
> that page.
>

Those diagnostics tell me that I do have fork support.

I wasn't clear in my previous message. In case it changes anything: I get the
fork warning from 'mdrun_mpi -s topol.tpr' even without running it through
orterun, but only with the CUDA-enabled version. Under these conditions,
however, memory usage is normal. I should also mention that
'orterun -np 1 mdrun_mpi ...' works fine.

Without CUDA, I don't get the fork warning at all.

> I can think of no reason for or past experience of this behaviour. Is
> it possible for you to run mdrun_mpi in a debugger and get a call
> stack trace to help us diagnose?
>

Do you have any suggestions on the best way to do this? I do very little
parallel programming of my own, and the only approach I know of is to attach
gdb to one of the spawned processes via its PID (e.g., as suggested here:
http://www.open-mpi.org/faq/?category=debugging#serial-debuggers). But in
this case the processes are immediately terminated by the queue system. There
is no MPI_ABORT, so the mpi_abort_print_stack MCA option to orterun doesn't do
the trick either.
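One variant of the technique on that Open MPI FAQ page is to make each rank pause at startup, before memory usage grows, so there is time to attach gdb before the queue kills the job. A minimal sketch, assuming you can temporarily patch and rebuild mdrun with debug symbols; the helper name hold_for_debugger() and the HOLD_FOR_DEBUGGER environment variable are hypothetical, not GROMACS or Open MPI options.

```c
#define _POSIX_C_SOURCE 200112L /* for gethostname() and setenv()/unsetenv() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical debugging aid (not part of GROMACS), following the
 * "serial debuggers" idea from the Open MPI FAQ.
 * If HOLD_FOR_DEBUGGER is set, print the PID and host, then sleep in a
 * loop until a debugger changes `attached`, e.g.:
 *   gdb -p <PID>
 *   (gdb) up               <- repeat until in hold_for_debugger()
 *   (gdb) set var attached = 1
 *   (gdb) continue
 * Returns 1 if it waited, 0 if the variable was not set. */
static int hold_for_debugger(void)
{
    volatile int attached = 0; /* volatile so the loop re-reads it */
    char hostname[256];

    if (getenv("HOLD_FOR_DEBUGGER") == NULL)
        return 0; /* normal runs skip the wait entirely */

    if (gethostname(hostname, sizeof hostname) != 0)
        hostname[0] = '\0';
    hostname[sizeof hostname - 1] = '\0'; /* termination not guaranteed on truncation */

    printf("PID %ld on %s waiting for debugger\n", (long)getpid(), hostname);
    fflush(stdout); /* make sure the PID reaches the job output now */

    while (attached == 0)
        sleep(5);
    return 1;
}
```

The idea would be to call hold_for_debugger() right after MPI_Init() in mdrun, launch with something like 'orterun -x HOLD_FOR_DEBUGGER=1 ...' (orterun's -x exports the variable to all ranks), attach to a printed PID, let it continue, then interrupt it in gdb once memory starts to climb and run 'bt' to get the call stack.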

> OK, thanks, good diagnosis. Some low-level stuff did get refactored
> after 4.6.1. I don't think that will be the issue here, but you could
> see if it produces the same symptoms / magically works.
>

4.6.1 behaves the same way.

In retrospect, I spoke too soon: under the impression that MPI was the
issue, I tried to simplify matters by compiling 4.5.5 without GPU support,
which might explain why it works. I haven't recompiled it with GPU support
yet, since it seems to be a bit more finicky than 4.6.

Thanks,

-Kate