[gmx-users] Hard lock up

David spoel at xray.bmc.uu.se
Fri Oct 8 21:56:23 CEST 2004


On Fri, 2004-10-08 at 21:37, Bill (William) Triest wrote:
> On Fri, 2004-10-08 at 15:17, David wrote:
> > On Fri, 2004-10-08 at 21:06, Bill (William) Triest wrote:
> > > I'm an undergrad student worker, and running gromacs under Linux is
> > > locking up one of our systems.  Version 3.1 used to run fine, until 3.2
> > > was installed.  3.2 started locking up the system (and I mean LOCKING it
> > > up: you can't ssh into the box, you can't ctrl-c to kill it, etc.).  They
> > > tried reverting to 3.1, but it's still causing problems.  It only happens
> > > on large jobs, but we have a nearly identical box (running 3.1) that can
> > > run the jobs fine.  Since the lockups only happen while running gromacs,
> > > and the machine does see some other loads (VMware and custom-written
> > > software), I think it's related to gromacs.  The box is currently running
> > > Red Hat 9, and is an SMP machine (and yes, I did try the MPI version,
> > > and I did ensure that the installed version of LAM was the same major
> > > version).  I tried googling for the problem, so I'm just hoping for
> > > pointers as to where to start RTFMing.
> > Does this happen to be an Athlon box?
> > In that case you may want to upgrade to 3.2.1 in which a workaround for
> > a bug in the Athlon was introduced. On the other hand, the bug was in
> > 3.1 also.
> 
> Yes, it's an Athlon box, but I double-checked and we are attempting to run
> 3.2.1 (sorry about not listing the .1, I wasn't aware of it at the
> time).  The program runs fine on a single-CPU Athlon box w/ only a gig of
> memory, but it crashes on a dual-processor Athlon MP box w/ 2 gigs of
> memory.
How about BIOS settings? Maybe you need a BIOS upgrade? Or the MP
settings in your BIOS? Is your user running single or dual processor
jobs?
> 
> > 
> > Otherwise gromacs stresses the CPU really hard. Could it be heating
> > problems? Do you have temperature sensors on the chips? Could be a
> > broken fan or a rotten memory chip too...
> 
> We did have a bad memory module last spring (it was still under warranty
> and got replaced), and the first thing I did when I heard the box had started
> hard-locking up again was run memtest86 on it.  As for overheating,
> it runs fine EXCEPT when gromacs runs.  Since they run custom-written
> apps that take over a week to run and that I know stress the CPU, I'm
> guessing it's not that.  (Though I'm going to double-check, just in case.)
There is a CPU stress test program on our website somewhere (can't find
it now). It runs quite a few degrees warmer than the AMD testing
program.


-- 
David.
________________________________________________________________________
David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596,  	75124 Uppsala, Sweden
phone:	46 18 471 4205		fax: 46 18 511 755
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://xray.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



