[gmx-users] Gromacs job locks up computer (reproducibly)

Marc Baaden baaden at smplinux.de
Wed Sep 3 12:11:01 CEST 2003


Hi,

thanks for all your comments and suggestions. Sorry for being
so slow, but as I explain below, it seems I need to run for
~2 weeks to reproduce the problem.

What happened? I have a long run that crashed twice at exactly the
same point, locking up the box completely and leaving no trace in the
logs. Unfortunately this happens after about 2 weeks of runtime.
So I thought I would take the latest coordinates saved in the trr file
and restart from there, to be able to reproduce it more quickly.
After ca. 7 days of calculation, that attempt finished successfully
(i.e. without locking up the computer), meaning that for now I'll have
to run the full 2-week job if I want to test it.

So I will now try the long 2-week run at least once more, on this
computer and on a slightly newer AMD box, to see whether I can
reproduce it. So you'll have to hang on for another 2-3 weeks
for the results :))

Just to summarize:
- the problem is *not* related to file size (e.g. 2GB), all files are
  significantly smaller than 1GB
- it happened with the standard gromacs source (3.1.4)
- I could try Erik's fix for using 3DNow instead of SSE
- it could be related to the presence of an NVidia AGP card, as has
  been reported. I might check that.
- I'd need to check that I do not run out of RAM (1GB!?) or swap
  (but how?)
- I'd need to find an input that crashes sooner than after 2 weeks :))
- the simulation crashed exactly at step 1006400 (which is not a special
  power of 2)
- for now I only have one such system and, given the 2-week duration,
  little testing, so I do not know whether
    * it depends on the output (nst*out) settings?
    * it depends on the system I simulate?
    * it depends on memory (type, size, usage), machine load, gromacs
      version, compiler options?
- I will check whether it happens on another dual Athlon
- I could try to swap the CPUs, but that needs a lot of changes, so I
  probably won't
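
On the RAM/swap question in the list above ("but how?"), one option would
be a small logger left running next to the job. The following is only a
minimal sketch, assuming a Linux /proc/meminfo that exposes MemFree and
SwapFree lines in kB; the function names, the log file name and the
60-second interval are made up for illustration and are not part of gromacs.

```python
import os
import time


def parse_meminfo(text):
    """Return the numeric 'Key: value' fields of /proc/meminfo as a dict.

    Lines that do not look like 'Key:  <number> ...' (for example a
    column-header line) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def log_memory(logfile="memwatch.log", interval=60, samples=None):
    """Append free RAM and swap to `logfile` every `interval` seconds.

    Hypothetical helper: runs until killed when `samples` is None.
    """
    n = 0
    with open(logfile, "a") as out:
        while samples is None or n < samples:
            with open("/proc/meminfo") as f:
                info = parse_meminfo(f.read())
            out.write("%d MemFree=%s kB SwapFree=%s kB\n"
                      % (time.time(), info.get("MemFree"), info.get("SwapFree")))
            # Push every sample to disk immediately, so the last values
            # are more likely to survive a hard lock-up of the machine.
            out.flush()
            os.fsync(out.fileno())
            n += 1
            if samples is None or n < samples:
                time.sleep(interval)
```

Calling log_memory() in a background process for the duration of the run
would leave a record of free memory and swap up to the last sample before
a lock-up. The explicit flush and fsync are there because the crash leaves
no trace in the normal logs, so buffered output would probably be lost.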

Here is /proc/cpuinfo (as Erik requested):
:541; cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) MP 2600+
stepping        : 1
cpu MHz         : 2133.462
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4259.84

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 2133.462
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4259.84
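
Since the choice between SSE and 3DNow came up, the flags lines above can
also be checked programmatically on other boxes. This is a small sketch,
assuming the same /proc/cpuinfo layout as shown here ("processor" and
"flags" lines); cpu_flags is a hypothetical helper name.

```python
def cpu_flags(cpuinfo_text):
    """Map each processor number to the set of flags listed for it."""
    flags = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "processor":
            # Start a new entry; its flags line follows further down.
            current = int(value)
            flags[current] = set()
        elif key == "flags" and current is not None:
            flags[current] = set(value.split())
    return flags
```

Passing the contents of /proc/cpuinfo to cpu_flags and testing for "sse"
and "3dnow" in each set shows which instruction sets every CPU reports.
For the output above, both processors list sse as well as 3dnow.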

regards,
Marc

-- 
 Dr. Marc Baaden  - Institut de Biologie Physico-Chimique, Paris
 mailto:baaden at smplinux.de      -      http://www.marc-baaden.de
 FAX: +49 697912 39550  -  Tel: +33 15841 5176 ou +33 609 843217




