[gmx-users] Hard lock up

Bill (William) Triest wtriest at chemistry.ohio-state.edu
Fri Oct 8 21:37:41 CEST 2004


On Fri, 2004-10-08 at 15:17, David wrote:
> On Fri, 2004-10-08 at 21:06, Bill (William) Triest wrote:
> > I'm an undergrad student worker, and running gromacs under linux is
> > locking up one of our systems.  Version 3.1 used to run fine, until 3.2
> > was installed.  3.2 started locking up the system (and I mean LOCKING it
> > up: you can't ssh into the box, you can't ctrl-c to kill it, etc.).  They
> > tried reverting to 3.1, but it's still causing problems.  It only happens
> > on large jobs, but we have a nearly identical box (running 3.1) that can
> > run the jobs fine.  Since the lockups only happen while running gromacs,
> > and the machine does see some other loads (vmware and custom written
> > software), I think it's related to gromacs.  The box is currently running
> > Red Hat 9, and is an SMP machine (and yes, I did try the MPI version,
> > and I did ensure that the installed version of LAM was the same major
> > version).  I tried googling for the problem, so I'm just hoping for
> > pointers as to where to start RTFMing.
> Does this happen to be an Athlon box?
> In that case you may want to upgrade to 3.2.1 in which a workaround for
> a bug in the Athlon was introduced. On the other hand, the bug was in
> 3.1 also.

Yes, it's an Athlon box, but I double-checked and we are attempting to run
3.2.1 (sorry about not listing the .1; I wasn't aware of it at the
time).  The program runs fine on a single-CPU Athlon box w/ only a gig of
memory, but it crashes on a dual-processor Athlon MP box w/ 2 gigs of
memory.

> 
> Otherwise gromacs stresses the CPU really hard. Could it be heating
> problems? Do you have temperature sensors on the chips? Could be a
> broken fan or a rotten memory chip too...

We did have a bad memory module last spring (it was still under warranty
and got replaced), and the first thing I did when I heard the box had started
hard-locking up again was run memtest86 on it.  As for overheating,
it runs fine EXCEPT when gromacs runs.  Since they run custom-written
apps that take over a week to run and that I know stress the CPU, I'm
guessing it's not that.  (Though I'm going to double-check, just in case.)

> 
> If you are "sure" the hardware is not to blame, give us some more info
> on the kind of jobs that crash the machine.

You know, I have no idea.  I know where he's running gromacs from, and I
can see the log files, but that's about it.  I'm an undergrad CS major
who works for computer support.  If you can tell me what I should look
for, I will happily get you the info.

Thanks for the quick reply, and sorry, because I feel like I'm not being
helpful here.

Thanks,
William Triest
Student Worker - Linux