[gmx-users] Athlon cluster experience

Erik Lindahl lindahl at stanford.edu
Wed Feb 19 07:32:03 CET 2003


On Tuesday, Feb 18, 2003, at 22:18 US/Pacific, Lynne E. Bilston wrote:

> Justin,
>
> My dual Athlon cluster is about 10 months old (dual 1800MP processors 
> on a tyan MB). lm_sensors gives a temperature of about 49-55 degrees C 
> when running both processors on a job. Idling is about 42-45C. Yours 
> do seem a bit hot by those standards.
>
> I did initially have some problems with jobs quitting due to 
> overheating. It turned out our AC system was being switched off at 
> night. How warm is the room your cluster is in?
>
> Let me know if you want more info on my lm_sensors setup or output.
>
> -Lynne
>
Hi,

A couple of months ago I created a small CPU burn-in (i.e. heater :-) 
program - it should be available on the contributions page at 
www.gromacs.org.

Just for fun, I actually started writing a really tight assembly loop 
with SSE instructions, but when I installed lm_sensors according to 
Lynne's instructions I was surprised to find that the first version 
ran colder than a normal Gromacs simulation (although it was hotter 
than any other burn-in program on the net).

I'm pretty sure this is because the Gromacs innerloops use both the SSE 
and integer parts of the CPU (and the cache & memory), so I simply 
wrote a new version with a very small program that calls one of the 
Gromacs innerloops, tweaking the neighborlists to make it as hot as 
possible.

It probably runs 1-2 degrees hotter than normal Gromacs, but the main 
difference is that the results are compared with a "vanilla" C loop, 
and if there are any random changes during the run I print an error 
message.
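
To illustrate the idea of checking an optimized loop against a plain 
one, here is a minimal sketch in C. It is not the source of the 
contributed burn-in program and it does not call a real Gromacs 
innerloop: the names kernel_unrolled, kernel_plain, burn_in and N_PAIRS 
are made up for this example, and the 1/r^2 sum is only a stand-in 
workload. The two loops add the terms in a different order, so the 
comparison uses a small relative tolerance rather than exact equality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define N_PAIRS 4096   /* hypothetical number of "interactions" per pass */

/* Absolute difference without pulling in the math library. */
double abs_diff(double a, double b)
{
    return (a > b) ? (a - b) : (b - a);
}

/* Stand-in for an optimized innerloop: 4-way unrolled sum of 1/r^2. */
double kernel_unrolled(const double *r2, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;

    assert(n % 4 == 0);   /* the unrolling assumes a multiple of four */
    for (i = 0; i < n; i += 4) {
        s0 += 1.0 / r2[i];
        s1 += 1.0 / r2[i + 1];
        s2 += 1.0 / r2[i + 2];
        s3 += 1.0 / r2[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

/* "Vanilla" reference loop computing the same sum one term at a time. */
double kernel_plain(const double *r2, size_t n)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        s += 1.0 / r2[i];
    }
    return s;
}

/* Run both loops repeatedly on fixed input; return the number of passes
 * in which either result drifts away from the initial reference value. */
int burn_in(long iterations)
{
    static double r2[N_PAIRS];
    double ref, tol;
    int errors = 0;
    long it;
    size_t i;

    for (i = 0; i < N_PAIRS; i++) {
        r2[i] = 1.0 + 0.001 * (double)i;   /* fixed, strictly positive input */
    }
    ref = kernel_plain(r2, N_PAIRS);
    tol = 1e-10 * ref;   /* allows for the different summation order */

    for (it = 0; it < iterations; it++) {
        double fast = kernel_unrolled(r2, N_PAIRS);
        double slow = kernel_plain(r2, N_PAIRS);

        if (abs_diff(fast, ref) > tol || abs_diff(slow, ref) > tol) {
            errors++;
            fprintf(stderr, "pass %ld: fast=%.17g slow=%.17g ref=%.17g\n",
                    it, fast, slow, ref);
        }
    }
    return errors;
}
```

On healthy hardware burn_in() should return 0 no matter how long it 
runs, since the input never changes. A nonzero count would point at the 
machine (heat, memory, CPU) rather than at the code.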

I didn't find any errors when I ran this for a week on a dozen of our 
nodes, but I've heard rumors that some versions of Athlon MP have 
problems with SMP synchronization. I have NO idea whether this is true, 
but it might be worth testing:

1. The burn-in program.
2. Consistency of SMP vs. non-SMP runs.
3. Different versions of LAM, checking whether any problems have 
actually been reported...

Cheers,

Erik






