[gmx-developers] gromacs 3.3

Rainer Böckmann rainer at bioinformatik.uni-saarland.de
Mon Oct 17 20:00:54 CEST 2005


Hi Campbell,

Thanks for your suggestion! Our Opteron problem was due to the original BIOS
setup: replacing "ultra" with "raid" did the job...

Cheers,
rainer

Quoting Campbell Millar <c.millar at elec.gla.ac.uk>:

> Hey Rainer,
>
> Have you checked dmesg ?
>
> We have a problem here with the 2.6 kernel on opteron. It is due to a 
> bug in the 2.6 branch which means that processes running in both 
> physical and virtual memory get corrupted at seemingly random 
> intervals, if the resident size is larger than the available physical 
> memory. The job spins and produces no output, though it still seems 
> to be running, and occasionally the kernel falls over and you have to 
> physically restart the machine. Not quite a zombie but close.
>
> If it's the same problem that we are having (not with GROMACS, I 
> should add :), but with other large simulation codes), then you will 
> probably see something like:
>
> Sep 25 04:15:01 quad01.beowulf.cluster kernel: invalid operand: 0000 [1] SMP
>
> From dmesg.
>
> As far as I am aware there isn't a proper fix for the problem yet, and 
> there may not be one until the 2.8 kernel is released. One thing that 
> helped to stabilise things a little was to set kernel.shmmax and 
> kernel.shmall (in /etc/sysctl.conf) equal to the amount of physical 
> memory available to each processor. I guess that would be halved in 
> the case of dual core. It's not pretty but it helps to stop things 
> falling over.
>
> Here's hoping it's a different problem, 'cause it's driving me nuts here :).
>
> Cheers,
>
> Campbell
>
> On 10 Oct 2005, at 12:50, Rainer Böckmann wrote:
>
>> Dear All:
>>
>> we faced a problem on dual dual-core opterons (2GHz) using gromacs 
>> 3.3_rc2/3:
>>
>> While the Lys/PME benchmark (few minutes runtime) gives very nice 
>> results (6.83 ns/day), runs appear to be quite unstable. E.g. a 
>> membrane simulation (PME) on the four cores will stop after one to 
>> ten hours, without any error message.  To be more precise, the jobs 
>> appear to be still running but do not produce any output after a 
>> while (cpu temperature <=65 degrees, memory exchanged).
>>
>> Details:
>> suse 9.3, kernel 2.6.11
>> gromacs 3.3_rc2 (gromacs 3.3_rc3)
>> fftw 2.1.5 (fftw 3.0.1)
>> lam 7.1.1
>>
>> Is there any solution for this problem?
>>
>> Thanks!
>> Rainer
>>
>> -- 
>>
> __________________________
> Dr Campbell Millar,
> University of Glasgow,
> Device Modelling Group,
> Oakfield Avenue,
> Glasgow G12 8LT
> tel: +44 141 330 4792  fax: +44 141 330 4907
>
> "Diplomacy is the art of saying 'nice doggy', till you can find a rock"
> Larry Niven.
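
The sysctl workaround described in the quoted message comes down to a
little arithmetic, sketched below in Python. This is only an illustration
under stated assumptions: the function name shm_sysctl_lines, the 4096-byte
page size, and the example node (8 GiB of RAM shared by two processors) are
all hypothetical and do not come from the thread. On Linux the key is
spelled kernel.shmmax (two m's) and is counted in bytes, while
kernel.shmall is counted in pages, so the sketch converts accordingly.

```python
def shm_sysctl_lines(total_ram_bytes, n_processors, page_size=4096):
    """Build /etc/sysctl.conf lines that cap shared memory at the
    physical memory available to each processor.

    Assumption: memory is split evenly between processors.
    kernel.shmmax is expressed in bytes, kernel.shmall in pages.
    """
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    per_processor = total_ram_bytes // n_processors
    return [
        f"kernel.shmmax = {per_processor}",
        f"kernel.shmall = {per_processor // page_size}",
    ]


if __name__ == "__main__":
    # Hypothetical node: 8 GiB of RAM, two processors.
    for line in shm_sysctl_lines(8 * 1024**3, 2):
        print(line)
```

For the hypothetical node this prints "kernel.shmmax = 4294967296" and
"kernel.shmall = 1048576". Following the remark about dual core, one would
pass the number of cores instead of the number of sockets to halve the
values again.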

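The dmesg check suggested in the quoted message can also be scripted. The
following is a minimal Python sketch, not a tested diagnostic: the function
name find_oops_lines and the regular expression are illustrative
assumptions derived from the single log line quoted above, so other kernel
oops formats would need a different pattern.

```python
import re

# Pattern modelled on the one line quoted in this thread:
#   "... kernel: invalid operand: 0000 [1] SMP"
# Assumption: a four-digit hex error code followed by an oops counter.
OOPS_PATTERN = re.compile(r"invalid operand: [0-9a-fA-F]{4} \[\d+\]")


def find_oops_lines(log_text):
    """Return every line of log_text that looks like the quoted oops."""
    return [line for line in log_text.splitlines()
            if OOPS_PATTERN.search(line)]


if __name__ == "__main__":
    sample = (
        "Sep 25 04:15:01 quad01.beowulf.cluster kernel: "
        "invalid operand: 0000 [1] SMP\n"
    )
    for hit in find_oops_lines(sample):
        print(hit)
```

To check a real machine, the text passed to find_oops_lines could be the
captured output of dmesg or the contents of the system log.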


__________________________________________________________
Dr. Rainer Böckmann
Theoretical & Computational Membrane Biology
Center for Bioinformatics Saar
Universität des Saarlandes
Gebäude 17.1, EG
D-66041 Saarbrücken, Germany
Phone: ++49 +681 302-64169  FAX: ++49 +681 302-64180
E-Mail: rainer at bioinformatik.uni-saarland.de
http://www.bioinf.uni-sb.de/RB/
___________________________________________________________





