[gmx-users] Hardware-specific crash with 4.5.1
Roland Schulz
roland at utk.edu
Tue Sep 28 07:57:10 CEST 2010
On Mon, Sep 27, 2010 at 9:58 PM, Mark Abraham <mark.abraham at anu.edu.au> wrote:
>
>
> ----- Original Message -----
> From: "Justin A. Lemkul" <jalemkul at vt.edu>
> Date: Tuesday, September 28, 2010 11:39
> Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>
> >
> >
> > Mark Abraham wrote:
> > >
> > >
> > >----- Original Message -----
> > >From: "Justin A. Lemkul" <jalemkul at vt.edu>
> > >Date: Tuesday, September 28, 2010 11:11
> > >Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
> > >To: Gromacs Users' List <gmx-users at gromacs.org>
> > >
> > > >
> > > >
> > > > Roland Schulz wrote:
> > > > >Justin,
> > > > >
> > > > > >I think the interaction kernel is not OK on your PowerPC
> > > > > >machine. I assume that from: 1) The force seems to be zero
> > > > > >(minimization output). 2) When you use the all-to-all kernel
> > > > > >which is not available for the powerpc kernel, it automatically
> > > > > >falls back to the C kernel and then it works.
> > > > >
> > > >
> > > > Sounds about right.
> > > >
> > > > > >What is the kernel you are using? It should say in the log
> > > > > >file. Look for: "Configuring single precision IBM Power6-specific
> > > > > >Fortran kernels" or "Testing Altivec/VMX support"
> > > > >
> > > >
> > > > I'm not finding either in the config.log - weird?
> > >
> > > >You were meant to look in the mdrun.log for runtime
> > > >confirmation of what kernels GROMACS has decided to use.
> > >
> >
> > That seems entirely obvious, now that you mention it :)
> > Conveniently, I find the following in the md.log file from the
> > (failing) autoconf-assembled mdrun:
> >
> > Configuring nonbonded kernels...
> > Configuring standard C nonbonded kernels...
> > Testing Altivec/VMX support... present.
> > Configuring PPC/Altivec nonbonded kernels...
> >
> > The (non)MPI CMake build shows the following:
> >
> > Configuring nonbonded kernels...
> > Configuring standard C nonbonded kernels...
> >
> > So it seems clear to me that autoconf built faulty nonbonded
> > kernels, and CMake didn't.
>
> OK, so assuming that PPC/Altivec kernels are supposed to be good for Mac
> (as they were in 4.0.x, I believe):
>
> 1) CMake doesn't detect that it should be using those kernels, and so
> appears to work, but does an inefficient run. autoconf detects that it
> should use those kernels, but the mdrun fails for reasons that are not yet
> clear.
>
>
> > > > >You can also look in the config.h whether GMX_POWER6
> > > > and/or GMX_PPC_ALTIVEC is set. I suggest you try to compile with
> > > > one/both of them deactivated and see whether that solves it.
> > > > This will make it slower too. Thus if this is indeed the
> > > > problem, you will probably want to figure out why the fastest
> > > > kernel doesn't work correctly to get good performance.
> > > > >
> > > >
> > > > > It looks like GMX_PPC_ALTIVEC is set. I suppose I could
> > > > > re-compile with this turned off.
> > >
> > >This is supposed to be fine for Mac, as I understand.
> > >
> > > > > Here's what's even weirder. The problematic version was
> > > > > compiled using the standard autoconf procedure. If I use a
> > > > > CMake-compiled version, the energy minimization runs fine,
> > > > > giving the same results (energy and force) as the two systems I
> > > > > know are good. So I guess there's something wrong with the
> > > > > way autoconf installed Gromacs. Perhaps this isn't of
> > > > > concern since Gromacs will require CMake in subsequent releases,
> > > > > but I figure I should at least report it in case it affects
> > > > > anyone else.
> > > >
> > > > If I may tack one more question on here, I'm wondering why my
> > > > CMake installation doesn't actually appear to be using
> > > > MPI. I get the right result, but the problem is, I get a
> > > > .log, .edr, and .trr for every processor that's being used, as
> > > > if each processor is being given its own job and not
> > > > distributing the work. Here's how I compiled my MPI mdrun,
> > > > version 4.5.1:
> > >
> > > >At the start and end of the .log files you should get
> > > >indicators about how many MPI processes were actually being used.
> > >
> >
> > That explains it (sort of). It looks like mdrun thinks
> > it's only being run over 1 node, just several times over, and a
> > bunch of junk that isn't getting written properly:
> >
> > Log file opened on Mon Sep 27 21:36:00 2010
> > Host: n235 pid: 6857 nodeid: 0 nnodes: 1
> > The Gromacs distribution was built @TMP_TIME@ by
> > jalemkul at sysx2.arc-int.vt.edu [CMAKE] (@TMP_MACHINE@)
> >
> > Frustrating.
>
> You can set the GMX_NOOPTIMIZEDKERNELS environment variable with your
> autoconf build to see whether the MPI issue is CMake-dependent. Normally,
> I'd say your supercomputer MPI environment isn't being invoked correctly,
> but presumably you already know how to do that right...
>
>
> > > > > cmake ../gromacs-4.5.1 \
> > > > >   -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a \
> > > > >   -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ \
> > > > >   -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx \
> > > > >   -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi -DGMX_THREADS=OFF \
> > > > >   -DBUILD_SHARED_LIBS=OFF -DGMX_X11=OFF -DGMX_MPI=ON \
> > > > >   -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx \
> > > > >   -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include
> > > > >
> > > > $ make mdrun
> > > >
> > > > $ make install-mdrun
> > > >
> > > > Is there anything obviously wrong with those commands? Is
> > > > there any way I should know (before actually using mdrun)
> > > > whether or not I've done things right?
> > >
> > > >I think there ought to be, but IMO not enough preparation and
> > > >testing has gone into the CMake switch for it to be usable.
> > >
> >
> > I agree. After hours of hacking CMake to try to make it
> > work (and thinking I had gotten it squared away), the MPI
> > doesn't seem to function. The "old" way of doing things
> > worked flawlessly, except that somewhere between 4.0.7 and
> > 4.5.1, the nonbonded kernels that used to work on our
> > architecture somehow got hosed. So now I'm in limbo.
>
Well, that's why Autoconf is still supported for 4.5: to allow a smooth
transition. It is difficult to get enough feedback on CMake from the
development version alone. Only now, with CMake support in the released
version, is there enough feedback to make the uncommon cases work. But you
can still use autoconf if CMake doesn't work. So 4.5 should work for
everyone, and we get the feedback required to use CMake for 5.0.
> Sounds like Bugzilla time.
>
I agree. You should file 3 bugs:
1) CMake + MPI -> MPI doesn't work
2) CMake + Altivec -> Altivec isn't detected
3) Altivec produces 0 forces
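To back up reports 1) to 3), the relevant lines can be pulled out of the md.log files quoted earlier in this thread. What follows is only a minimal sketch: the function names kernel_lines and mpi_nnodes are made up for illustration, md.log is an assumed file name, and the patterns assume the log wording shown in the quoted excerpts ("Configuring ... kernels", "Testing Altivec/VMX support", "nnodes: N").

```shell
#!/bin/sh
# Hypothetical helpers; the log wording they match is copied from the
# md.log excerpts quoted in this thread, not from a format specification.

# Print the nonbonded kernel setup lines that mdrun wrote to a log file.
kernel_lines() {
    grep -E 'Configuring .*kernels|Testing Altivec/VMX support' "$1"
}

# Print the process count from the log header line ("... nnodes: N").
mpi_nnodes() {
    sed -n 's/.*nnodes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Usage (md.log is an assumed file name):
#   kernel_lines md.log
#   mpi_nnodes md.log
```

On the failing autoconf build, kernel_lines would be expected to show the PPC/Altivec line, and on an MPI run that is really parallel, mpi_nnodes should print more than 1.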
As far as I know the POWER6 kernel should work too for your hardware and
should be a little bit faster than the standard C kernel. You might want to
try that until the Altivec kernel is fixed.
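Before rebuilding, it may help to confirm which accelerated kernels configure switched on, and to rerun the same input with them disabled via the GMX_NOOPTIMIZEDKERNELS variable Mark mentioned. This is a sketch under assumptions: kernel_macros and without_optimized_kernels are hypothetical helper names, the config.h location depends on your build directory, and setting the variable to 1 assumes mdrun only checks that it is set.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GROMACS.

# Show whether GMX_POWER6 and/or GMX_PPC_ALTIVEC are defined in a config.h.
kernel_macros() {
    grep -E '^[[:space:]]*#[[:space:]]*define[[:space:]]+(GMX_POWER6|GMX_PPC_ALTIVEC)([[:space:]]|$)' "$1"
}

# Run a command with GMX_NOOPTIMIZEDKERNELS set for that command only.
# Assumption: any value works; 1 is used here.
without_optimized_kernels() {
    GMX_NOOPTIMIZEDKERNELS=1 "$@"
}

# Usage (the paths, binary name and file prefix are placeholders):
#   kernel_macros src/config.h
#   without_optimized_kernels mdrun -deffnm em
```

If the rerun gives sensible forces while the normal run does not, that points at the accelerated kernel rather than at the input.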
Roland
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309