[gmx-users] Gromacs job locks up computer (reproducibly)
Marc Baaden
baaden at smplinux.de
Wed Sep 3 12:11:01 CEST 2003
Hi,
thanks for all your comments and suggestions. Sorry also for being
so slow, but as I'll explain below, it seems to take about 2 weeks
of runtime to reproduce the problem.
What happened? I have a long run that crashed twice at exactly the
same point, locking up the box completely and leaving no trace in the
logs. Unfortunately this happens only after about 2 weeks of runtime.
So I thought, let's take the latest saved coords in trr and start
from there, so I should be able to reproduce it more rapidly.
After ca. 7 days of calculation, that attempt finished successfully
(i.e. without locking up the computer). That means that for now I'll
have to run the full 2-week job if I want to test it.
So I will now rerun the long 2-week job at least once more, both on
this computer and on a slightly newer AMD box, to see whether I can
reproduce it. You'll have to hang on for another 2-3 weeks
for the results :))
Just to summarize:
- the problem is *not* related to file size (e.g. the 2GB limit); all
files are significantly smaller than 1GB
- it happened with the standard gromacs source (3.1.4)
- I could try Erik's fix for using 3Dnow instead of SSE
- it could be related to the presence of an NVidia AGP card as has
been reported. Might check that.
- I'd need to check that I do not run out of RAM (1GB !?) or swap
(but how?)
- I'd need to find a file that crashes sooner than after 2 weeks :))
- the simulation crashed exactly at step 1006400 (which is not a
power of 2)
- for now I have only one such system and, given the 2-week duration,
have done little testing, so I do not know whether
* it depends on the output (nst*out) settings?
* it is dependent on the system I simulate?
* it is dependent on memory (type, size, usage), machine load, gromacs
version, compiler options?
- I will check if it happens on another dual athlon
- I could try to swap the CPUs, but that needs a lot of changes, so I
probably won't
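
On the open question in the list above of how to check for running out
of RAM or swap: one possible approach is to log free memory and free
swap at regular intervals during the run, syncing each entry to disk so
that the last lines survive a hard lockup. The following is only a
minimal sketch, assuming a Linux /proc/meminfo that contains lines of
the form "MemFree: 12345 kB". The function names, the log file name and
the 60-second interval are hypothetical and not part of Gromacs.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value in kB} for lines like 'MemFree:  12345 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip header or summary lines that do not carry a single kB value.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            values[name.strip()] = int(parts[0])
    return values


def log_memory(logfile="meminfo.log", interval=60, path="/proc/meminfo"):
    """Append a timestamped MemFree/SwapFree line every `interval` seconds.

    Hypothetical helper: runs until interrupted, so start it in a
    separate process alongside the simulation.
    """
    while True:
        with open(path) as f:
            info = parse_meminfo(f.read())
        with open(logfile, "a") as out:
            out.write("%s MemFree=%s kB SwapFree=%s kB\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                info.get("MemFree"),
                info.get("SwapFree"),
            ))
            # Force the entry to disk so it is not lost if the box locks up.
            out.flush()
            os.fsync(out.fileno())
        time.sleep(interval)
```

If the last entries before a lockup show SwapFree close to zero, memory
exhaustion becomes a plausible cause; if both values stay comfortable
up to the end, it can probably be ruled out.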
Here is /proc/cpuinfo (as Erik requested):
:541; cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) MP 2600+
stepping : 1
cpu MHz : 2133.462
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 4259.84
processor : 1
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) Processor
stepping : 1
cpu MHz : 2133.462
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 4259.84
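
Since the 3DNow-versus-SSE question depends on what each CPU reports,
the flags lines above could also be checked programmatically, for
example when comparing this box with the other dual Athlon. This is
just a sketch under the assumption that the text has the "flags : ..."
layout shown above; the function name cpu_flags is hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return a list with one set of flags per 'flags' line in the text."""
    result = []
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            result.append(set(value.split()))
    return result


def supports(cpuinfo_text, flag):
    """True only if every processor listed in the text reports `flag`."""
    per_cpu = cpu_flags(cpuinfo_text)
    return bool(per_cpu) and all(flag in flags for flags in per_cpu)
```

For the output above, supports(text, "sse") and supports(text, "3dnow")
would both be true, with text holding the contents of /proc/cpuinfo.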
regards,
Marc
--
Dr. Marc Baaden - Institut de Biologie Physico-Chimique, Paris
mailto:baaden at smplinux.de - http://www.marc-baaden.de
FAX: +49 697912 39550 - Tel: +33 15841 5176 ou +33 609 843217