[gmx-users] Hardware-specific crash with 4.5.1

Mark Abraham mark.abraham at anu.edu.au
Tue Sep 28 03:58:57 CEST 2010



----- Original Message -----
From: "Justin A. Lemkul" <jalemkul at vt.edu>
Date: Tuesday, September 28, 2010 11:39
Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
To: Discussion list for GROMACS users <gmx-users at gromacs.org>

> 
> 
> Mark Abraham wrote:
> >
> >
> >----- Original Message -----
> >From: "Justin A. Lemkul" <jalemkul at vt.edu>
> >Date: Tuesday, September 28, 2010 11:11
> >Subject: Re: [gmx-users] Hardware-specific crash with 4.5.1
> >To: Gromacs Users' List <gmx-users at gromacs.org>
> >
> > >
> > >
> > > Roland Schulz wrote:
> > > >Justin,
> > > >
> > > >I think the interaction kernel is not OK on your PowerPC
> > > >machine. I assume that from: 1) the force seems to be zero
> > > >(minimization output); 2) when you use the all-to-all kernel,
> > > >which is not available in PowerPC-optimized form, it automatically
> > > >falls back to the C kernel and then it works.
> > > >
> > >
> > > Sounds about right.
> > >
> > > >What is the kernel you are using? It should say in the log
> > > >file. Look for: "Configuring single precision IBM
> > > >Power6-specific Fortran kernels" or "Testing Altivec/VMX support"
> > > >
> > >
> > > I'm not finding either in the config.log - weird?
> >
> >You were meant to look in the mdrun.log for runtime
> >confirmation of what kernels GROMACS has decided to use.
> > 
> 
> That seems entirely obvious, now that you mention it :)  
> Conveniently, I find the following in the md.log file from the 
> (failing) autoconf-assembled mdrun:
> 
> Configuring nonbonded kernels...
> Configuring standard C nonbonded kernels...
> Testing Altivec/VMX support... present.
> Configuring PPC/Altivec nonbonded kernels...
> 
> The (non)MPI CMake build shows the following:
> 
> Configuring nonbonded kernels...
> Configuring standard C nonbonded kernels...
> 
> So it seems clear to me that autoconf built faulty nonbonded 
> kernels, and CMake didn't.

OK, so assuming that PPC/Altivec kernels are supposed to be good for Mac (as they were in 4.0.x, I believe):

1) CMake doesn't detect that it should be using those kernels, so it appears to work but runs inefficiently. 2) autoconf detects that it should use those kernels, but mdrun then fails for reasons that are not yet clear.
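
A quick way to confirm which flavour any particular binary ended up with is to grep the run log and the generated config.h (the file names here are just the defaults; adjust to wherever your build put them):

  grep -A4 "Configuring nonbonded kernels" md.log
  grep -E "GMX_PPC_ALTIVEC|GMX_POWER6" config.h

The first should show the "Testing Altivec/VMX support" / "Configuring PPC/Altivec nonbonded kernels" lines you quoted; the second shows what the build itself decided to enable.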
 
> > > >You can also look in config.h whether GMX_POWER6
> > > >and/or GMX_PPC_ALTIVEC is set. I suggest you try to compile with
> > > >one/both of them deactivated and see whether that solves it.
> > > >This will make it slower, though. Thus if this is indeed the
> > > >problem, you will probably want to figure out why the fastest
> > > >kernel doesn't work correctly, to get good performance.
> > > >
> > >
> > > It looks like GMX_PPC_ALTIVEC is set.  I suppose I could
> > > re-compile with this turned off.
> >
> >This is supposed to be fine for Mac, as I understand.
> >
> > > Here's what's even weirder.  The problematic version was
> > > compiled using the standard autoconf procedure.  If I use a
> > > CMake-compiled version, the energy minimization runs fine,
> > > giving the same results (energy and force) as the two systems I
> > > know are good.  So I guess there's something wrong with the
> > > way autoconf installed Gromacs.  Perhaps this isn't of
> > > concern since Gromacs will require CMake in subsequent releases,
> > > but I figure I should at least report it in case it affects
> > > anyone else.
> > >
> > > If I may tack one more question on here, I'm wondering why my
> > > CMake installation  doesn't actually appear to be using
> > > MPI.  I get the right result, but the problem is, I get a
> > > .log, .edr, and .trr for every processor that's being used, as
> > > if each processor is being given its own job and not
> > > distributing the work. Here's how I compiled my MPI mdrun,
> > > version 4.5.1:
> >
> >At the start and end of the .log files you should get
> >indicators about how many MPI processes were actually being used.
> > 
> 
> That explains it (sort of).  It looks like mdrun thinks
> it's only being run over 1 node, just several times over, and
> there's a bunch of junk that isn't getting written properly:
> 
> Log file opened on Mon Sep 27 21:36:00 2010
> Host: n235  pid: 6857  nodeid: 0  nnodes:  1
> The Gromacs distribution was built @TMP_TIME@ by
> jalemkul at sysx2.arc-int.vt.edu [CMAKE] (@TMP_MACHINE@)
> 
> Frustrating.

You can set the GMX_NOOPTIMIZEDKERNELS environment variable with your autoconf build, so that it falls back to the plain C kernels, and run that in parallel to see whether the MPI issue is CMake-specific. Normally I'd say your supercomputer's MPI environment isn't being invoked correctly, but presumably you already know how to do that right...
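
Roughly (the binary name, launcher, and run name below are placeholders for whatever you normally use):

  export GMX_NOOPTIMIZEDKERNELS=1
  mpirun -np 4 mdrun_autoconf_mpi -deffnm em

With the optimized kernels disabled, mdrun should fall back to the plain C kernels, the combination that has given you correct forces, so any remaining failure would point at the MPI side rather than at the kernels.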

> > > cmake ../gromacs-4.5.1 \
> > >   -DFFTW3F_LIBRARIES=/home/rdiv1001/fftw-3.0.1-osx/lib/libfftw3f.a \
> > >   -DFFTW3F_INCLUDE_DIR=/home/rdiv1001/fftw-3.0.1-osx/include/ \
> > >   -DCMAKE_INSTALL_PREFIX=/home/rdiv1001/gromacs-4.5_cmake-osx \
> > >   -DGMX_BINARY_SUFFIX=_4.5_cmake_mpi -DGMX_THREADS=OFF \
> > >   -DBUILD_SHARED_LIBS=OFF -DGMX_X11=OFF -DGMX_MPI=ON \
> > >   -DMPI_COMPILER=/home/rdiv1001/compilers/openmpi-1.2.3-osx/bin/mpicxx \
> > >   -DMPI_INCLUDE_PATH=/home/rdiv1001/compilers/openmpi-1.2.3-osx/include
> >
> > > $ make mdrun
> > >
> > > $ make install-mdrun
> > >
> > > Is there anything obviously wrong with those commands?  Is
> > > there any way I should know (before actually using mdrun)
> > > whether or not I've done things right?
> >
> >I think there ought to be, but IMO not enough preparation and
> >testing has gone into the CMake switch for it to be usable.
> >
> 
> I agree.  After hours of hacking CMake to try to make it 
> work (and thinking I had gotten it squared away), the MPI 
> doesn't seem to function.  The "old" way of doing things 
> worked flawlessly, except that somewhere between 4.0.7 and 
> 4.5.1, the nonbonded kernels that used to work on our 
> architecture somehow got hosed. So now I'm in limbo.

Sounds like Bugzilla time.
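
In the meantime, a couple of quick sanity checks on the CMake/MPI side (paths and file names assume your build directory and your usual log naming; adjust as needed):

  grep -E "GMX_MPI|MPI_COMPILER" CMakeCache.txt
  head -n 5 md.log | grep nnodes

The first tells you what the CMake cache actually recorded for MPI; the second is the same "nnodes" line you quoted, which should report the full process count if mdrun really was launched under the MPI it was built against.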

Mark
