[gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??

Berk Hess gmx3 at hotmail.com
Tue Sep 30 10:10:15 CEST 2008


Hi,

In 4.0 rc1 there is a bug in PME.
Erik mailed that this has been fixed in 4.0 rc2, but actually this is not the case.
The fix was in the head branch of CVS, but not in the release tree.
I have committed the fix now.

Could you check whether the crash is due to this?
After line 1040 of src/mdlib/pme.c, p0++; has to be added, so that the block reads:
            if ((kx>0) || (ky>0)) {
                kzstart = 0;
            } else {
                kzstart = 1;
                p0++;    /* <-- the added line */
            }
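
For context, here is a minimal sketch (illustrative names only, not the
actual pme.c internals) of the loop structure the fix sits in: p0 walks
the grid elements of one (kx,ky) line, so when kz = 0 is skipped for the
k = (0,0,0) line, the pointer has to be advanced past that element as well.

/* Minimal sketch, not the real pme.c code: the function name, arguments
 * and grid type are illustrative.  It shows why kzstart and p0 must
 * move together. */
#include <complex.h>

static void solve_line(double complex *line, int nkz, int kx, int ky)
{
    double complex *p0 = line;  /* current grid element of this line */
    int             kzstart, kz;

    if ((kx > 0) || (ky > 0)) {
        kzstart = 0;
    } else {
        kzstart = 1;
        p0++;                   /* skip the k = (0,0,0) element too */
    }

    for (kz = kzstart; kz < nkz; kz++, p0++) {
        *p0 *= 0.5;             /* placeholder for the real update
                                   applied to *p0 */
    }
}

Without the p0++, every kz in that line reads the grid element one
position too early, which would at least corrupt the reciprocal-space
sums and could plausibly cause the kind of crash reported here.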

Berk.


> Subject: RE: [gmx-users] Possible bug in parallelization, PME or load-balancing on Gromacs 4.0_rc1 ??
> From: st01397 at student.uib.no
> To: gmx-users at gromacs.org
> Date: Mon, 29 Sep 2008 19:50:57 +0200
> 
> The only error message I can find is the rather cryptic:
> 
> NOTE: Turning on dynamic load balancing
> 
> _pmii_daemon(SIGCHLD): PE 4 exit signal Killed
> [NID 1412]Apid 159787: initiated application termination
> 
> There are no errors apart from that.
> This may not be very helpful, but I googled this particular error and
> came up with another massively parallel, domain-decomposing code,
> Gadget2, and this link:
> 
> http://www.mpa-garching.mpg.de/gadget/gadget-list/0213.html
> 
> Furthermore, I can now report that this error is endemic to all my
> simulations using harmonic position restraints in GROMACS 4.0_beta1
> and 4.0_rc1.
> (I have yet to check whether it remains an issue without restraints,
> but I strongly suspect it does: I saw similar unexplained crashes in
> earlier simulations with a gmx version downloaded from CVS on 20/08/08,
> when the DD scheme had only just been implemented.)
> 
> I thus have some reason to think it has to do with the new
> domain-decomposition implementation.
> 
> About core dumps: I will talk to our HPC staff and hope to get back to
> you with something more substantial.
> I could recompile gmx for the TotalView debugger and give you some
> debugging information from that. Would this be helpful?
> 
> Would it be helpful to give you diagnostics from running mdrun
> verbosely or with the -debug flag?
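> 
> For example (assuming the -debug flag of mdrun 4.0 takes an integer
> debug level and writes per-rank debug files, as I believe it does),
> something like
> 
> aprun -n 40 parmdrun -s topol.tpr -maxh 5 -npme 20 -debug 1
> 
> would be one way to capture that output.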
> 
> If you think it beneficial I can also provide the config.log.
> 
> My config script is really quite minimal:
> ------------------------
> #!/bin/bash
> export LDFLAGS="-lsci"
> export CFLAGS="-march=barcelona -O3"
> 
> ./configure --prefix=$HOME/gmx_latest_290908 --disable-fortran \
>     --enable-mpi --without-x --without-xml --with-external-lapack \
>     --with-external-blas --program-prefix=par CC=cc MPICC=cc
> ---------------------------------
> 
> I am using fftw-3.1.1, the gcc-4.2.0 quadcore-edition compiler,
> Cray's optimized XT LibSci 10.3.0 BLAS/LAPACK routines,
> and Cray's optimized MPI library (based on MPICH2, I believe).
> 
> I will get back to you with more soon.
> 
> Regards and thanks
> Bjørn
> 
> > 
> > 
> > Can you produce core dump files?
> > 
> > Berk
> > 
> 
> > > PBS .o: 
> > > Application 159316 exit codes: 137
> > > Application 159316 exit signals: Killed
> > > Application 159316 resources: utime 0, stime 0
> > > --------------------------------------------------
> > > Begin PBS Epilogue hexagon.bccs.uib.no
> > > Date: Mon Sep 29 12:32:54 CEST 2008
> > > Job ID: 65643.nid00003
> > > Username: bjornss
> > > Group: bjornss
> > > Job Name: pmf_hydanneal_heatup_400K
> > > Session: 10156
> > > Limits: walltime=05:00:00
> > > Resources:
> > > cput=00:00:00,mem=4940kb,vmem=22144kb,walltime=00:20:31
> > > Queue: batch
> > > Account: fysisk
> > > Base login-node: login5
> > > End PBS Epilogue Mon Sep 29 12:32:54 CEST 2008
> > > 
> > > PBS .err:
> > > _pmii_daemon(SIGCHLD): PE 0 exit signal Killed
> > > [NID 702]Apid 159316: initiated application termination.
> > > 
> > > As proper electrostatics is crucial to my modeling, I am using PME,
> > > which accounts for a large part of my calculation cost: 35-50%.
> > > In the most extreme case, I use the following startup script:
> > > 
> > > run.pbs:
> > > 
> > > #!/bin/bash
> > > #PBS -A fysisk
> > > #PBS -N pmf_hydanneal_heatup_400K
> > > #PBS -o pmf_hydanneal.o
> > > #PBS -e pmf.hydanneal.err
> > > #PBS -l walltime=5:00:00,mppwidth=40,mppnppn=4
> > > 
> > > cd /work/bjornss/pmf/structII/hydrate_annealing/heatup_400K
> > > source $HOME/gmx_latest_290908/bin/GMXRC
> > > 
> > > aprun -n 40 parmdrun -s topol.tpr -maxh 5 -npme 20
> > > exit $?
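> > > 
> > > (For reference, -npme 20 dedicates 20 of the 40 ranks, i.e.
> > > 20/40 = 50%, to PME, matching the upper end of the 35-50% PME
> > > cost quoted above.)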
> > > 
> > > 
> > > Now, apart from a significant reduction in the system dipole moment,
> > > there are no large changes in the system, nor significant
> > > translations of the molecules in the box.
> > > 
> > > I enclose the md.log and my parameter file. The run topology
> > > (topol.tpr) can be found at:
> > > 
> > > http://drop.io/mdanneal
> > > 
> > > if anyone wants to try to replicate the crash on their local
> > > cluster, they are welcome to.
> > > If the error persists after such trials, I am willing to file a
> > > bug on Bugzilla.
> > > 
> > > 
> > > If more information is needed, I will try to provide it upon request.
> > > 
> > > 
> > > Regards and thanks for bothering
> > > 
> > > -- 
> > > ---------------------
> > > Bjørn Steen Saethre 
> > > PhD-student
> > > Theoretical and Energy Physics Unit
> > > Institute of Physics and Technology
> > > Allegt, 41
> > > N-5020 Bergen
> > > Norway
> > > 
> > > Tel(office) +47 55582869 
> > > 
> > > 
> > 
> > 
> 
