[gmx-developers] gromacs 3.3
Rainer Böckmann
rainer at bioinformatik.uni-saarland.de
Mon Oct 17 20:00:54 CEST 2005
Hi Campbell,
Thanks for your suggestion! Our Opteron problem was due to the original BIOS
setup: replacing "ultra" with "raid" did the job.
Cheers,
rainer
Quoting Campbell Millar <c.millar at elec.gla.ac.uk>:
> Hey Rainer,
>
> Have you checked dmesg ?
>
> We have a problem here with the 2.6 kernel on opteron. It is due to a
> bug in the 2.6 branch which means that processes running in both
> physical and virtual memory get corrupted at seemingly random
> intervals, if the resident size is larger than the available physical
> memory. The job spins and produces no output but seems to be running,
> and occasionally the kernel falls over and you have to physically
> restart the machine. Not quite a zombie, but close.
>
> If it's the same problem that we are having (not with GROMACS, I
> should add :), but with other large simulation codes), then you will
> probably see something like:
>
> Sep 25 04:15:01 quad01.beowulf.cluster kernel: invalid operand: 0000 [1] SMP
>
> From dmesg.
>
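A minimal sketch of how such a check could be scripted, assuming the kernel log is available as a list of text lines. The function name `find_invalid_operand` is hypothetical, and the pattern is modelled only on the single log line quoted above, so other kernels may word the message differently:

```python
import re

# Pattern modelled on the one log line quoted in this thread (an assumption):
# "invalid operand: 0000 [1] SMP". It matches with or without a syslog prefix.
INVALID_OPERAND = re.compile(r"invalid operand: \d{4} \[\d+\]")


def find_invalid_operand(lines):
    """Return every log line that contains an 'invalid operand' report."""
    return [line.rstrip("\n") for line in lines if INVALID_OPERAND.search(line)]
```

The lines could come from the output of `dmesg` or from a syslog file such as /var/log/messages, split into one string per line.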
> As far as I am aware there isn't a proper fix for the problem yet, and
> there may not be one until the 2.8 kernel is released. One thing that
> helped to stabilise things a little was to set kernel.shmmax and
> kernel.shmall (in /etc/sysctl.conf) equal to the amount of physical
> memory available to each processor. I guess that would be halved in
> the case of dual core. It's not pretty, but it helps to stop things
> falling over.
>
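A rough sketch of the arithmetic behind that workaround, under stated assumptions: the helper `shm_sysctl_lines` is hypothetical, the 4096-byte page size is a default that should be checked on the actual machine, and physical memory is assumed to be split evenly between processors. On Linux, kernel.shmmax is given in bytes while kernel.shmall is counted in pages, so the sketch converts the second value rather than writing the same number twice:

```python
def shm_sysctl_lines(total_ram_bytes, n_processors, page_size=4096):
    """Build /etc/sysctl.conf lines limiting shared memory to one processor's share.

    Assumes memory is divided evenly between processors (an assumption,
    not something the kernel guarantees).
    """
    per_processor = total_ram_bytes // n_processors
    return [
        # shmmax: largest single shared memory segment, in bytes
        f"kernel.shmmax = {per_processor}",
        # shmall: total shared memory, in pages
        f"kernel.shmall = {per_processor // page_size}",
    ]
```

For a hypothetical machine with 8 GiB of RAM and two processors, `shm_sysctl_lines(8 * 1024**3, 2)` gives `kernel.shmmax = 4294967296` and `kernel.shmall = 1048576`. After editing /etc/sysctl.conf, the settings can be loaded with `sysctl -p`.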
> Here's hoping it's a different problem, 'cause it's driving me nuts here :).
>
> Cheers,
>
> Campbell
>
> On 10 Oct 2005, at 12:50, Rainer Böckmann wrote:
>
>> Dear All:
>>
>> we faced a problem on dual dual-core Opterons (2 GHz) using GROMACS
>> 3.3_rc2/3:
>>
>> While the Lys/PME benchmark (few minutes runtime) gives very nice
>> results (6.83 ns/day), runs appear to be quite unstable. E.g. a
>> membrane simulation (PME) on the four cores will stop after one to
>> ten hours, without any error message. To be more precise, the jobs
>> appear to be still running but do not produce any output after a
>> while (cpu temperature <=65 degrees, memory exchanged).
>>
>> Details:
>> suse 9.3, kernel 2.6.11
>> gromacs 3.3_rc2 (gromacs 3.3_rc3)
>> fftw 2.1.5 (fftw 3.0.1)
>> lam 7.1.1
>>
>> Is there any solution for this problem?
>>
>> Thanks!
>> Rainer
>>
>> --
>>
> __________________________
> Dr Campbell Millar,
> University of Glasgow,
> Device Modelling Group,
> Oakfield Avenue,
> Glasgow G12 8LT
> tel: +44 141 330 4792 fax: +44 141 330 4907
>
> "Diplomacy is the art of saying 'nice doggy', till you can find a rock"
> Larry Niven.
__________________________________________________________
Dr. Rainer Böckmann
Theoretical & Computational Membrane Biology
Center for Bioinformatics Saar
Universität des Saarlandes
Gebäude 17.1, EG
D-66041 Saarbrücken, Germany
Phone: ++49 +681 302-64169 FAX: ++49 +681 302-64180
E-Mail: rainer at bioinformatik.uni-saarland.de
http://www.bioinf.uni-sb.de/RB/
___________________________________________________________