[gmx-users] PME problem on BG/P cluster

Mark Abraham mark.abraham at anu.edu.au
Tue Jun 8 23:48:26 CEST 2010


----- Original Message -----
From: LuLanyuan <lulanyuan at msn.com>
Date: Wednesday, June 9, 2010 2:10
Subject: RE: [gmx-users] PME problem on BG/P cluster
To: gmx-users at gromacs.org



>  Dear Mark,
> When I set NOASSEMBLYLOOPS to 1, the simulation finished without any problem, so I guess the issue is related to the assembly loops for BG.

OK, that's useful data. I understand that these loops are believed to work on BG/P, and knowing the code, I can think of no reason why the problem should be BG/P-specific. Please open a report at http://bugzilla.gromacs.org, CC my email address, and attach your .tpr, and I'll try it on my BG/L.

I actually have well-tested, ready-to-release updates to these kernels that perform up to about 10% better with PME on BG/L. Depending on what we find above, and if you're interested, I could let you try them on BG/P, since I don't have the ability to test on BG/P myself.

Mark

> From: mark.abraham at anu.edu.au
> To: gmx-users at gromacs.org
> Date: Sat, 5 Jun 2010 03:31:33 +1000
> Subject: Re: [gmx-users] PME problem on BG/P cluster
> 
> 
> 
> ----- Original Message -----
> From: LuLanyuan <lulanyuan at msn.com>
> Date: Saturday, June 5, 2010 2:01
> Subject: [gmx-users] PME problem on BG/P cluster
> To: gmx-users at gromacs.org
> 
> >  Hello,
> > I ran into a weird problem running Gromacs 4.0.7 on a BG/P machine ("Intrepid" at Argonne National Lab).
> > The simulated system is a box of SPC water with 648,000 atoms, and all MD simulations were performed on 256 CPU cores with MPI. The compilation environment was Linux with the IBM compilers and libraries.
> > I first compiled the code with the flags suggested on the wiki, such as:
> > ./configure --prefix=$PREFIX \
> >             --host=ppc \
> >             --build=ppc64 \
> >             --disable-software-sqrt \
> >             --enable-ppc-sqrt=1 \
> >             --enable-ppc-altivec \
> 
> 
> The wiki doesn't suggest this, and it's for another architecture entirely. I don't know if it's causing a problem, but you should see what kernels the .log file reports it is trying to use.
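> 
> For example, a quick way to see which nonbonded kernels were selected is something like this (a minimal sketch; the exact log wording varies between versions, so a case-insensitive search is safest):
> 
>     # list the nonbonded kernel messages in the run's log file
>     grep -i kernel md.log
> 
> where md.log stands for whatever log file name your mdrun -g or -deffnm options produced.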
> 
> 
> >             --enable-bluegene \
> >             --disable-fortran \
> >             --enable-mpi \
> >             --with-fft=fftpack \
> 
> 
> This is a waste of a good BlueGene :-) Use FFTW, which has been optimized for BlueGene; a sketch of an FFTW-based configure line follows the flags below. [Edit: ah I see why you tried this]
> 
> 
> >             --without-x \
> >             CC="mpixlc_r" \
> >             CFLAGS="-O3 -qarch=450d -qtune=450" \
> >             MPICC="mpixlc_r" \
> >             CXX="mpixlcxx_r" \
> >             CXXFLAGS="-O3 -qarch=450 -qtune=450" \
> >             F77="mpixlf77_r" \
> >             FFLAGS="-O3 -qarch=450 -qtune=450" \
> >             LIBS="-lmass"
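> 
> As a concrete follow-up to the FFTW suggestion above, here is a minimal sketch of how that configure line might look with FFTW 3.x instead of fftpack, and without the ppc-altivec flag. The $FFTW_DIR path is a placeholder for wherever FFTW is installed on your system, and a single-precision GROMACS build needs the single-precision FFTW library (libfftw3f):
> 
>     ./configure --prefix=$PREFIX \
>                 --host=ppc \
>                 --build=ppc64 \
>                 --disable-software-sqrt \
>                 --enable-ppc-sqrt=1 \
>                 --enable-bluegene \
>                 --disable-fortran \
>                 --enable-mpi \
>                 --with-fft=fftw3 \
>                 --without-x \
>                 CC="mpixlc_r" \
>                 CFLAGS="-O3 -qarch=450d -qtune=450" \
>                 CPPFLAGS="-I$FFTW_DIR/include" \
>                 LDFLAGS="-L$FFTW_DIR/lib" \
>                 MPICC="mpixlc_r" \
>                 CXX="mpixlcxx_r" \
>                 CXXFLAGS="-O3 -qarch=450 -qtune=450" \
>                 LIBS="-lmass"
> 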
> > 
> > Here I used fftpack to make sure the problem is not due to the fftw lib. The water system ran well with Cut-off for electrostatics. However, the system always crashed after a few (~100) steps when I used PME. The same system with the same PME options runs fine on the other, non-BlueGene clusters I tested.
> > The error message I got was something like
> > t = 0.100 ps: Water molecule starting at atom 403468 can not be settled.
> > Check for bad contacts and/or reduce the timestep.
> > Wrote pdb files with previous and current coordinates.
> > 
> > and
> > 
> > 2 particles communicated to PME node 63 are more than a cell length out of the domain decomposition cell of their charge group
> > 
> > From the .log file, the kinetic energy is increasing and eventually becomes "nan", so the system is exploding.
> > 
> > I found that if I turned off the BlueGene optimizations during configure, the water system ran without problems. For example, I used
> >                    --enable-software-sqrt \
> >                     --disable-ppc-sqrt \
> >                     --disable-bluegene \
> > and everything else was the same. 
> > I suspect there is an issue with the BlueGene-specific code and PME.
> > Could anyone give any comments?
> 
> 
> First, try without --enable-ppc-altivec in case that's confusing things. Then try setting the environment variable NOASSEMBLYLOOPS to 1 before running mdrun, to see whether the issue really is specific to the BlueGene kernels. (Consult the mpirun documentation for how to set environment variables suitably.) You might also try compiling with lower optimization levels, to see if it's a compiler/optimization issue. Depending on what you find above, there are other things to try.
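> 
> For the NOASSEMBLYLOOPS test, a minimal sketch (the launcher syntax is site-specific, so check your local documentation; the qsub options are only an example of how Intrepid's Cobalt scheduler is typically driven, and mdrun_mpi and spc_water are placeholder names):
> 
>     # on a cluster whose MPI launcher inherits the shell environment:
>     export NOASSEMBLYLOOPS=1
>     mpirun -np 256 mdrun_mpi -deffnm spc_water
> 
>     # on BG/P the variable usually has to be exported to the compute nodes
>     # explicitly, e.g. via Cobalt's qsub (64 nodes x 4 ranks/node in vn mode
>     # gives 256 MPI ranks):
>     qsub -n 64 --mode vn -t 60 --env NOASSEMBLYLOOPS=1 mdrun_mpi -deffnm spc_water
> 
> If that makes the crash go away, it points at the BlueGene assembly kernels; if not, rebuilding with CFLAGS="-O2 ..." is the quickest way to test the compiler/optimization theory.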
> 
> 
> 
> Mark
> 






