[gmx-developers] gromacs 3.3

Campbell Millar c.millar at elec.gla.ac.uk
Mon Oct 17 16:17:30 CEST 2005


Hey Rainer,

Have you checked dmesg?

We have a problem here with the 2.6 kernel on Opteron. It is due to a 
bug in the 2.6 branch: processes running in both physical and virtual 
memory get corrupted at seemingly random intervals if the resident 
size is larger than the available physical memory. The job spins and 
produces no output but still appears to be running, and occasionally 
the kernel falls over and you have to physically restart the machine. 
Not quite a zombie, but close.

If it's the same problem that we are having (not with GROMACS, I should 
add :), but with other large simulation codes), then you will probably 
see something like:

Sep 25 04:15:01 quad01.beowulf.cluster kernel: invalid operand: 0000 [1] SMP

in the dmesg output.
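
A minimal sketch of that check in Python, assuming the dmesg output has 
already been captured as text. The function name find_invalid_operand 
is made up for illustration, and matching on the substring "invalid 
operand" is an assumption based on the single log line quoted above; 
other kernels may word the message differently.

```python
def find_invalid_operand(log_text):
    """Return every log line that mentions an 'invalid operand' kernel oops.

    Assumption: the oops is reported with the literal text 'invalid operand',
    as in the one example line quoted in this message.
    """
    return [line for line in log_text.splitlines()
            if "invalid operand" in line]


if __name__ == "__main__":
    # Example input: the quoted log line plus an unrelated line.
    sample = (
        "Sep 25 04:15:01 quad01.beowulf.cluster kernel: "
        "invalid operand: 0000 [1] SMP\n"
        "Sep 25 04:15:02 quad01.beowulf.cluster kernel: eth0: link up\n"
    )
    for hit in find_invalid_operand(sample):
        print(hit)
```

On a live machine the text could come from 
subprocess.run(["dmesg"], capture_output=True, text=True).stdout 
instead of a sample string.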

As far as I am aware there isn't a proper fix for the problem yet, and 
there may not be one until the 2.8 kernel is released. One thing that 
helped to stabilise things a little was to set kernel.shmmax and 
kernel.shmall (in /etc/sysctl.conf) equal to the amount of physical 
memory available to each processor. I guess that would be halved in 
the case of dual core. It's not pretty but it helps to stop things 
falling over.
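
As a rough sketch of the arithmetic behind that workaround, the Python 
below splits the physical memory evenly between processors and derives 
the two values. Everything named here (shm_settings, sysctl_lines, the 
8 GiB / 2 processor example) is hypothetical. One assumption to note: 
kernel.shmmax is given in bytes while kernel.shmall is counted in 
pages, so the sketch divides by the page size for shmall; the 4096-byte 
default is also an assumption and should be checked on the real 
machine.

```python
def shm_settings(physical_mem_bytes, n_processors, page_size=4096):
    """Compute shmmax/shmall from the memory available to each processor.

    Assumptions: memory is shared evenly between processors, shmmax is in
    bytes, shmall is in pages, and page_size defaults to 4096 bytes.
    """
    per_processor = physical_mem_bytes // n_processors
    return {
        "kernel.shmmax": per_processor,               # bytes
        "kernel.shmall": per_processor // page_size,  # pages
    }


def sysctl_lines(settings):
    """Format the settings as lines suitable for /etc/sysctl.conf."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical machine: 8 GiB of RAM shared by 2 processors.
    print(sysctl_lines(shm_settings(8 * 1024**3, 2)), end="")
```

For a dual-core setup, n_processors would be the number of cores, which 
halves the per-processor value as guessed above.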

Here's hoping it's a different problem, 'cause it's driving me nuts here 
:).

Cheers,

Campbell

On 10 Oct 2005, at 12:50, Rainer Böckmann wrote:

> Dear All:
>
> we faced a problem on dual dual-core opterons (2GHz) using gromacs 
> 3.3_rc2/3:
>
> While the Lys/PME benchmark (few minutes runtime) gives very nice 
> results (6.83 ns/day), runs appear to be quite unstable. E.g. a 
> membrane simulation (PME) on the four cores will stop after one to ten 
> hours, without any error message.  To be more precise, the jobs appear 
> to be still running but do not produce any output after a while (cpu 
> temperature <=65 degrees, memory exchanged).
>
> Details:
> suse 9.3, kernel 2.6.11
> gromacs 3.3_rc2 (gromacs 3.3_rc3)
> fftw 2.1.5 (fftw 3.0.1)
> lam 7.1.1
>
> Is there any solution for this problem?
>
> Thanks!
> Rainer
>
> -- 
> __________________________________________________________
> Dr. Rainer Böckmann
> Theoretical & Computational Membrane Biology
> Center for Bioinformatics Saar
> Universität des Saarlandes
> Gebäude 17.1, EG
> D-66041 Saarbrücken, Germany
> Phone: ++49 +681 302-64169  FAX: ++49 +681 302-64180
> E-Mail: rainer at bioinformatik.uni-saarland.de
> http://www.bioinf.uni-sb.de/RB/
> ___________________________________________________________
>
> _______________________________________________
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.
>
__________________________
Dr Campbell Millar,
University of Glasgow,
Device Modelling Group,
Oakfield Avenue,
Glasgow G12 8LT
tel: +44 141 330 4792  fax: +44 141 330 4907

"Diplomacy is the art of saying 'nice doggy', till you can find a rock"
Larry Niven.



