[gmx-users] PME problem on BG/P cluster
Mark Abraham
mark.abraham at anu.edu.au
Fri Jun 4 19:31:33 CEST 2010
----- Original Message -----
From: LuLanyuan <lulanyuan at msn.com>
Date: Saturday, June 5, 2010 2:01
Subject: [gmx-users] PME problem on BG/P cluster
To: gmx-users at gromacs.org
> Hello,
> I ran into a weird problem running Gromacs 4.0.7 on a BG/P machine ("Intrepid" at Argonne National Laboratory).
> The simulated system is a box of SPC water with 648,000 atoms, and all MD simulations were performed on 256 CPU cores with MPI. The build environment was Linux with the IBM compilers and libraries.
> I first compiled the code with the flags suggested on the Wiki, such as:
> ./configure --prefix=$PREFIX \
> --host=ppc \
> --build=ppc64 \
> --disable-software-sqrt \
> --enable-ppc-sqrt=1 \
> --enable-ppc-altivec \
The wiki doesn't suggest this, and it's for another architecture entirely. I don't know if it's causing a problem, but you should see what kernels the .log file reports it is trying to use.
> --enable-bluegene \
> --disable-fortran \
> --enable-mpi \
> --with-fft=fftpack \
This is a waste of a good BlueGene :-) Use FFTW, which has been optimized for BlueGene. [Edit: ah I see why you tried this]
> --without-x \
> CC="mpixlc_r" \
> CFLAGS="-O3 -qarch=450d -qtune=450" \
> MPICC="mpixlc_r"
> CXX="mpixlcxx_r"
> CXXFLAGS="-O3 -qarch=450 -qtune=450"
> F77="mpixlf77_r"
> FFLAGS="-O3 -qarch=450 -qtune=450"
> LIBS="-lmass"
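Taking the two inline comments above together, the adjusted flag list might look like the sketch below. It only prints the command so the flags can be reviewed before use. The flags are the ones from the quoted command, minus --enable-ppc-altivec and with fftpack swapped for FFTW 3. It assumes an FFTW 3 build for the compute nodes is already installed where configure can find it, which this thread does not cover. The compiler variables (CC, CFLAGS and so on) are left as in the quoted command.

```shell
# Sketch only: flags from the quoted command, without --enable-ppc-altivec,
# and with --with-fft=fftw3 instead of fftpack (assumes FFTW 3 is installed).
CONFIGURE_FLAGS="--host=ppc --build=ppc64 \
--disable-software-sqrt --enable-ppc-sqrt=1 \
--enable-bluegene --disable-fortran --enable-mpi \
--with-fft=fftw3 --without-x"

# Print the command for review instead of running it.
echo "./configure --prefix=\$PREFIX $CONFIGURE_FLAGS"
```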
>
> Here I used fftpack to ensure that the problem is not due to the fftw lib. I got the water system running well with cut-off electrostatics. However, the system always crashed after a few (~100) steps if I used PME. The same system with the same PME options runs fine on the other, non-BlueGene clusters I tested.
> The error message I got was something like
> t = 0.100 ps: Water molecule starting at atom 403468 can not be settled.
> Check for bad contacts and/or reduce the timestep.
> Wrote pdb files with previous and current coordinates.
>
> and
>
> 2 particles communicated to PME node 63 are more than a cell length out of the domain decomposition cell of their charge group
>
> According to the .log file, the kinetic energy keeps increasing and eventually becomes "nan". So the system is exploding.
>
> I found that if I turned off the BlueGene optimizations during configure, the water system ran without problems. For example, I used
> --enable-software-sqrt \
> --disable-ppc-sqrt \
> --disable-bluegene \
> and everything else was the same.
> I suspect there was an issue regarding the blue gene specific code and PME.
> Could anyone give any comments?
First, try without --enable-ppc-altivec in case that's confusing things. Then try setting the environment variable NOASSEMBLYLOOPS to 1 before running mdrun to see whether the issue really is specific to the BlueGene kernels. (Consult the mpirun documentation about how to set environment variables suitably.) You might also try compiling with lower optimization levels, to see if it's a compiler/optimization issue. Depending on what you find above, there are other things to try.
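As a rough sketch of the environment-variable step, assuming a POSIX shell: the snippet below sets NOASSEMBLYLOOPS and checks that a child process started from the same shell can see it. On BlueGene the job launcher may not forward the login shell's environment to the compute nodes, so the variable may also need to be passed through the launcher's own mechanism. That part is not shown here because it depends on the site's mpirun.

```shell
# Sketch only: set the variable in the current shell.
export NOASSEMBLYLOOPS=1

# Confirm that a child process inherits it; prints "unset" if it does not.
seen=$(sh -c 'printf "%s" "${NOASSEMBLYLOOPS:-unset}"')
echo "NOASSEMBLYLOOPS as seen by a child process: $seen"
```

If the check prints 1 locally but the crash behaviour does not change, it is worth confirming that the variable really reached mdrun on the compute nodes before drawing conclusions.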
Mark