[gmx-users] Difficulties with MPI in gromacs 4.6.3
Mark Abraham
mark.j.abraham at gmail.com
Tue Sep 17 08:26:52 CEST 2013
On Tue, Sep 17, 2013 at 2:04 AM, Kate Stafford <kastafford at gmail.com> wrote:
> Hi all,
>
> I'm trying to install and test gromacs 4.6.3 on our new cluster, and am
> having difficulty with MPI. Gromacs has been compiled against openMPI
> 1.6.5. The symptom is, running a very simple MPI process for any of the
> DHFR test systems:
>
> orterun -np 2 mdrun_mpi -s topol.tpr
>
> produces this openMPI warning:
>
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
> Local host: hb0c1n1.hpc (PID 58374)
> MPI_COMM_WORLD rank: 1
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
Hmm. That warning is a known issue in some cases
(http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork), but it
should not arise for the above mdrun command, which should call none
of popen/fork/system. You might like to try some of the diagnostics
on that page.
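For reference, the MCA parameter the warning mentions can simply be
passed on the orterun command line, though that only silences the
message and won't change the memory behaviour. A minimal sketch,
reusing your command above:

# suppress only the fork() warning; does not address the memory use
orterun --mca mpi_warn_on_fork 0 -np 2 mdrun_mpi -s topol.tpr

You can also inspect the current value of that parameter (and any
other fork-related settings your Open MPI build reports) with:

ompi_info --param mpi all | grep -i fork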
> ...which is immediately followed by program termination by the cluster
> queue due to exceeding the allotted memory for the job. This behavior
> persists no matter how much memory I use, up to 16GB per thread, which is
> surely excessive for any of the DHFR benchmarks. Turning the warning off,
> of course, simply suppresses the output, but doesn't affect the memory
> usage.
I can't think of a reason for this behaviour, and have no past
experience of it. Is it possible for you to run mdrun_mpi in a
debugger and get a call stack trace to help us diagnose it?
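One way to do that (a sketch, assuming gdb is available and you have
a working X display for the xterm windows; adjust to taste) is to
start each rank under the debugger:

orterun -np 2 xterm -e gdb --args mdrun_mpi -s topol.tpr
# in each gdb window: "run", wait for the failure, then "bt" for a backtrace

Alternatively, attach gdb to one running rank by PID (the warning
above prints it) and take the backtrace there.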
> The openMPI install works fine with other MPI-enabled programs, including
> gromacs 4.5.5, so the problem is specific to 4.6.3. The thread-MPI version
> of 4.6.3 is also fine.
OK, thanks, good diagnosis. Some low-level code did get refactored
after 4.6.1. I don't think that will be the issue here, but you could
try 4.6.1 and see whether it shows the same symptoms or magically
works.
> The 4.6.3 MPI executable was compiled with:
>
> cmake .. -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/nfs/apps/cuda/5.5.22
> -DGMX_MPI=ON -DBUILD_SHARED_LIBS=OFF -DGMX_PREFER_STATIC_LIBS=ON
>
> But the presence of the GPU or static libs related flags seems not to
> affect the behavior. The gcc version (4.4 or 4.8) doesn't matter either.
>
> Any insight as to what I'm doing wrong here?
So far I'd say the problem is not of your making :-(
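If you do end up in the debugger, a build with debug symbols will
make the stack trace much more useful. A minimal sketch is just your
cmake line above plus the standard CMake build-type flag:

cmake .. -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/nfs/apps/cuda/5.5.22 \
    -DGMX_MPI=ON -DBUILD_SHARED_LIBS=OFF -DGMX_PREFER_STATIC_LIBS=ON \
    -DCMAKE_BUILD_TYPE=Debug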
Mark