[gmx-users] PME problem on BG/P cluster

LuLanyuan lulanyuan at msn.com
Fri Jun 4 17:49:24 CEST 2010


Hello,
I ran into a strange problem running Gromacs 4.0.7 on a BG/P machine ("Intrepid" at Argonne National Laboratory).
The simulated system is a box of SPC water with 648,000 atoms, and all MD simulations were run on 256 CPU cores with MPI. The build environment was Linux with the IBM compilers and libraries.
I first compiled the code with the flags suggested on the Wiki:
./configure --prefix=$PREFIX \
                   --host=ppc \
                   --build=ppc64 \
                   --disable-software-sqrt \
                   --enable-ppc-sqrt=1 \
                   --enable-ppc-altivec \
                   --enable-bluegene \
                   --disable-fortran \
                   --enable-mpi \
                   --with-fft=fftpack \
                   --without-x \
                   CC="mpixlc_r" \
                   CFLAGS="-O3 -qarch=450d -qtune=450" \
                   MPICC="mpixlc_r" \
                   CXX="mpixlcxx_r" \
                   CXXFLAGS="-O3 -qarch=450 -qtune=450" \
                   F77="mpixlf77_r" \
                   FFLAGS="-O3 -qarch=450 -qtune=450" \
                   LIBS="-lmass"

Here I used fftpack to rule out the FFTW library as the cause. The water system runs well with cut-off electrostatics. However, it always crashes after a few (~100) steps when I use PME. The same system with the same PME options runs fine on the other, non-Blue Gene clusters I tested.
The error messages I got were along these lines:
t = 0.100 ps: Water molecule starting at atom 403468 can not be settled.
Check for bad contacts and/or reduce the timestep.
Wrote pdb files with previous and current coordinates.

and

2 particles communicated to PME node 63 are more than a cell length out of the domain decomposition cell of their charge group

From the .log file, the kinetic energy keeps increasing and eventually becomes "nan", so the system is exploding.
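
For anyone who wants to pin down where the blow-up starts, here is a minimal sketch. It assumes the kinetic energy has first been extracted to an .xvg file with g_energy (time in the first column, value in the second, header lines starting with "#" or "@"). The function name first_blowup, the growth_factor threshold, and the sample values are all made up for illustration; they are not taken from my run.

```python
import math

def first_blowup(xvg_text, growth_factor=10.0):
    """Return the time (ps) of the first non-finite or runaway value, else None.

    growth_factor is an arbitrary illustrative threshold: a value counts as
    runaway if its magnitude exceeds growth_factor times the first data value.
    """
    baseline = None
    for line in xvg_text.splitlines():
        line = line.strip()
        # Skip blank lines and xvg header/comment lines.
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        time_ps, value = float(fields[0]), float(fields[1])
        if not math.isfinite(value):
            return time_ps
        if baseline is None:
            baseline = value
        elif abs(value) > growth_factor * abs(baseline):
            return time_ps
    return None

# Hypothetical data, NOT real output from the run described here.
sample = """# illustrative kinetic energy series
@ title "Kinetic En."
0.000 1.61e+06
0.050 1.62e+06
0.100 9.80e+07
0.102 nan
"""

print(first_blowup(sample))
```

On the made-up sample this flags the jump at 0.100 ps before the value actually turns to "nan", which is roughly the point to compare against the SETTLE warning.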

I found that if I turned off the Blue Gene optimizations during configure, the water system ran without problems. For example, I used
                   --enable-software-sqrt \
                   --disable-ppc-sqrt \
                   --disable-bluegene \
and everything else was the same.
I suspect there is an issue in the Blue Gene-specific code that affects PME.
Could anyone comment on this?

Thanks a lot.
Lanyuan Lu
 		 	   		  