[gmx-users] Hardware-specific crash with 4.5.1
Justin A. Lemkul
jalemkul at vt.edu
Tue Sep 28 17:25:44 CEST 2010
Roland Schulz wrote:
>
>
> On Mon, Sep 27, 2010 at 9:58 PM, Mark Abraham <mark.abraham at anu.edu.au> wrote:
>
>
>
> ----- Original Message -----
> From: "Justin A. Lemkul" <jalemkul at vt.edu>
> Date: Tuesday, September 28, 2010 11:39
> Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
>
> >
> >
> > Mark Abraham wrote:
> > >
> > >
> > >----- Original Message -----
> > >From: "Justin A. Lemkul" <jalemkul at vt.edu>
> > >Date: Tuesday, September 28, 2010 11:11
> > >Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
> > >To: Gromacs Users' List <gmx-users at gromacs.org>
> > >
> > > >
> > > >
> > > > Roland Schulz wrote:
> > > > >Justin,
> > > > >
> > > > >I think the interaction kernel is not OK on your PowerPC
> > > > machine. I assume that from: 1) The force seems to be zero
> > > > (minimization output). 2) When you use the all-to-all kernel,
> > > > which is not available as a PowerPC kernel, it automatically
> > > > falls back to the C kernel, and then it works.
> > > > >
> > > >
> > > > Sounds about right.
> > > >
> > > > >What is the kernel you are using? It should say in the log
> > > > file. Look for: "Configuring single precision IBM Power6-
> > > > specific Fortran kernels" or "Testing Altivec/VMX support"
> > > > >
> > > >
> > > > I'm not finding either in the config.log - weird?
> > >
> > >You were meant to look in the mdrun.log for runtime
> > confirmation of what kernels GROMACS has decided to use.
> > >
> >
> > That seems entirely obvious, now that you mention it :)
> > Conveniently, I find the following in the md.log file from the
> > (failing) autoconf-assembled mdrun:
> >
> > Configuring nonbonded kernels...
> > Configuring standard C nonbonded kernels...
> > Testing Altivec/VMX support... present.
> > Configuring PPC/Altivec nonbonded kernels...
> >
> > The (non)MPI CMake build shows the following:
> >
> > Configuring nonbonded kernels...
> > Configuring standard C nonbonded kernels...
> >
> > So it seems clear to me that autoconf built faulty nonbonded
> > kernels, and CMake didn't.
>
> OK, so assuming that PPC/Altivec kernels are supposed to be good for
> Mac (as they were in 4.0.x, I believe):
>
> 1) CMake doesn't detect that it should be using those kernels, and
> so appears to work, but does an inefficient run. autoconf detects
> that it should use those kernels, but the mdrun fails for reasons
> that are not yet clear.
>
>
> > > > >You can also look in the config.h whether GMX_POWER6
> > > > and/or GMX_PPC_ALTIVEC is set. I suggest you try to compile with
> > > > one/both of them deactivated and see whether that solves it.
> > > > This will make it slower too. Thus if this is indeed the
> > > > problem, you will probably want to figure out why the fastest
> > > > kernel doesn't work correctly to get good performance.
> > > > >
> > > >
> > > > It looks like GMX_PPC_ALTIVEC is set. I suppose I
> > > > could recompile with this turned off.
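[Editor's note: a minimal sketch of that check. The config.h contents below are a mock for illustration; in a real build you would grep the config.h that autoconf or CMake generated in your build directory.]

```shell
# Mock config.h reproducing the situation described above:
# GMX_PPC_ALTIVEC defined, GMX_POWER6 not.
cat > config.h <<'EOF'
#define GMX_PPC_ALTIVEC
/* #undef GMX_POWER6 */
EOF

# Show which kernel macros the build enabled.
grep -E 'GMX_(POWER6|PPC_ALTIVEC)' config.h
```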
> > >
> > >This is supposed to be fine for Mac, as I understand.
> > >
> > > > Here's what's even weirder. The problematic version was
> > > > compiled using the standard autoconf procedure. If I
> > use a
> > > > CMake-compiled version, the energy minimization runs fine,
> > > > giving the same results (energy and force) as the two
> > systems I
> > > > know are good. So I guess there's something wrong with the
> > > > way autoconf installed Gromacs. Perhaps this isn't of
> > > > concern since Gromacs will require CMake in subsequent releases,
> > > > but I figure I should at least report it in case it affects
> > > > anyone else.
> > > >
> > > > If I may tack one more question on here, I'm wondering why my
> > > > CMake installation doesn't actually appear to be using
> > > > MPI. I get the right result, but the problem is, I get a
> > > > .log, .edr, and .trr for every processor that's being used, as
> > > > if each processor is being given its own job and not
> > > > distributing the work. Here's how I compiled my MPI mdrun,
> > > > version 4.5.1:
> > >
> > >At the start and end of the .log files you should get
> > indicators about how many MPI processes were actually being used.
> > >
> >
> > That explains it (sort of). It looks like mdrun thinks
> > it's only being run on 1 node, just several times over, and the
> > log header contains placeholders that weren't substituted properly:
> >
> > Log file opened on Mon Sep 27 21:36:00 2010
> > Host: n235 pid: 6857 nodeid: 0 nnodes: 1
> > The Gromacs distribution was built @TMP_TIME@ by
> > jalemkul at sysx2.arc-int.vt.edu [CMAKE] (@TMP_MACHINE@)
> >
> > Frustrating.
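[Editor's note: a quick way to spot this symptom, sketched with a mock log reproducing the header quoted above. A correctly launched MPI run produces one log whose header reports the full rank count; N identical logs each saying "nnodes: 1" means each process ran as an independent single-rank job.]

```shell
# Mock md.log header, as quoted in the thread.
cat > md.log <<'EOF'
Log file opened on Mon Sep 27 21:36:00 2010
Host: n235 pid: 6857 nodeid: 0 nnodes: 1
EOF

# Check how many MPI processes this log claims were used.
grep 'nnodes' md.log
```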
>
> You can set the GMX_NOOPTIMIZEDKERNELS environment variable with
> your autoconf build to see whether the MPI issue is CMake-dependent.
> Normally, I'd say your supercomputer MPI environment isn't being
> invoked correctly, but presumably you already know how to do that
> right...
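[Editor's note: a sketch of the fallback test suggested above. GMX_NOOPTIMIZEDKERNELS is the environment variable named in the thread; the mdrun invocation and the input name em.tpr are illustrative, not from the original.]

```shell
# Force the plain C nonbonded kernels in the autoconf build, to test
# whether the failure follows the Altivec kernels.
export GMX_NOOPTIMIZEDKERNELS=1
echo "GMX_NOOPTIMIZEDKERNELS=$GMX_NOOPTIMIZEDKERNELS"

# Then rerun and check that md.log no longer mentions
# "Configuring PPC/Altivec nonbonded kernels", e.g.:
# mdrun -s em.tpr -deffnm em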
>
>
> > > > $ cmake ../gromacs-4.5.1 \
> > > >     -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a \
> > > >     -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ \
> > > >     -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx \
> > > >     -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi -DGMX_THREADS=OFF \
> > > >     -DBUILD_SHARED_LIBS=OFF -DGMX_X11=OFF -DGMX_MPI=ON \
> > > >     -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx \
> > > >     -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include
> > > >
> > > > $ make mdrun
> > > >
> > > > $ make install-mdrun
> > > >
> > > > Is there anything obviously wrong with those commands? Is
> > > > there any way I should know (before actually using mdrun)
> > > > whether or not I've done things right?
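[Editor's note: one pre-flight check is the nnodes header discussed in this thread. The sketch below is illustrative: the binary name follows from -DGMX_BINARY_SUFFIX above, em.tpr is a hypothetical input, and the grep stands in for inspecting the real log by hand.]

```shell
# Smoke test: run a short job on 2 ranks and confirm ONE log reports
# "nnodes: 2", rather than two logs each reporting "nnodes: 1".
run_check() {
    # mpirun -np 2 mdrun_4.5_cmake_mpi -s em.tpr -deffnm em
    # grep 'nnodes' em.log
    echo "nnodes: 2"   # stand-in for the grep on a healthy MPI run
}
run_check
```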
> > >
> > >I think there ought to be, but IMO not enough preparation and
> > testing has gone into the CMake switch for it to be usable.
> > >
> >
> > I agree. After hours of hacking CMake to try to make it
> > work (and thinking I had gotten it squared away), the MPI
> > doesn't seem to function. The "old" way of doing things
> > worked flawlessly, except that somewhere between 4.0.7 and
> > 4.5.1, the nonbonded kernels that used to work on our
> > architecture somehow got hosed. So now I'm in limbo.
>
> Well, that's why autoconf is still supported for 4.5 - to allow a smooth
> transition. It is difficult to get enough feedback on CMake from just
> the development version. Only now, with CMake support in a released
> version, is there enough feedback to make the uncommon cases work. But
> you can still use autoconf if CMake doesn't work. Thus it should work
> for everyone, and we get the feedback required to use CMake for 5.0.
>
>
> Sounds like Bugzilla time.
>
>
> I agree you should file 3 bugs.
> 1) CMake + MPI -> MPI doesn't work,
> 2) CMake + Altivec -> Altivec isn't detected
> 3) Altivec produces 0 forces
>
Bugs 572-574 have been filed, for anyone that wants to follow them.
-Justin
> As far as I know the POWER6 kernel should work too for your hardware and
> should be a little bit faster than the standard C kernel. You might want
> to try that until the Altivec kernel is fixed.
>
> Roland
>
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov <http://cmb.ornl.gov>
> 865-241-1537, ORNL PO BOX 2008 MS6309
>
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================