[gmx-users] Athlon cluster experience
Erik Lindahl
lindahl at stanford.edu
Wed Feb 19 07:32:03 CET 2003
On Tuesday, Feb 18, 2003, at 22:18 US/Pacific, Lynne E. Bilston wrote:
> Justin,
>
> My dual Athlon cluster is about 10 months old (dual 1800MP processors
> on a tyan MB). lm_sensors gives a temperature of about 49-55 degrees C
> when running both processors on a job. Idling is about 42-45C. Yours
> do seem a bit hot by those standards.
>
> I did initially have some problems with jobs quitting due to
> overheating. It turned out our AC system was being switched off at
> night. How warm is the room your cluster is in?
>
> Let me know if you want more info on my lm_sensors setup or output.
>
> -Lynne
>
Hi,
A couple of months ago I created a small CPU burn-in (i.e. heater :-)
program - it should be available on the contributions page at
www.gromacs.org.
Just for fun, I actually started writing a really tight assembly loop
with SSE instructions, but when I installed LM-sensors according to
Lynne's instructions I was surprised to find that the first version
ran colder than a normal Gromacs simulation (although it was hotter
than any other burn-in program on the net.)
I'm pretty sure this is because the Gromacs innerloops use both the SSE
and integer parts of the CPU (and the cache & memory), so I simply
wrote a new version with a very small program that calls one of the
Gromacs innerloops, tweaking the neighborlists to make it as hot as
possible.
It probably runs 1-2 degrees hotter than normal Gromacs, but the main
difference is that the results are compared with a "vanilla" C loop,
and if there are any random changes during the run I print an error
message.
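To illustrate the compare-against-a-reference idea, here is a minimal sketch in Python. It is not the actual burn-in program: the real one calls a Gromacs innerloop and compares against a C loop, while everything below (make_pairs, reference_energy, tuned_energy, burn_in, the 1/r pair sum and the 1e-9 tolerance) is a hypothetical stand-in chosen only to show the checking logic.

```python
import math
import random


def make_pairs(n_atoms, n_pairs, seed=0):
    """Random coordinates plus a random pair list (stand-in for a neighborlist)."""
    rng = random.Random(seed)
    coords = [(rng.uniform(0, 3), rng.uniform(0, 3), rng.uniform(0, 3))
              for _ in range(n_atoms)]
    pairs = []
    while len(pairs) < n_pairs:
        i, j = rng.randrange(n_atoms), rng.randrange(n_atoms)
        if i != j:
            pairs.append((i, j))
    return coords, pairs


def reference_energy(coords, pairs):
    """The 'vanilla' loop: sum of 1/r over all pairs."""
    total = 0.0
    for i, j in pairs:
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        dz = coords[i][2] - coords[j][2]
        total += 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz)
    return total


def tuned_energy(coords, pairs):
    """Stand-in for the optimized kernel: same sum via a different code path."""
    total = 0.0
    for i, j in pairs:
        xi, yi, zi = coords[i]
        xj, yj, zj = coords[j]
        r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
        total += r2 ** -0.5
    return total


def burn_in(kernel, expected, coords, pairs, iterations=100, rel_tol=1e-9):
    """Run the kernel repeatedly; return the iterations whose result drifted."""
    failures = []
    for it in range(iterations):
        if not math.isclose(kernel(coords, pairs), expected, rel_tol=rel_tol):
            failures.append(it)
    return failures


if __name__ == "__main__":
    coords, pairs = make_pairs(200, 5000)
    expected = reference_energy(coords, pairs)
    bad = burn_in(tuned_energy, expected, coords, pairs)
    if bad:
        print("ERROR: results changed at iterations", bad)
    else:
        print("no errors detected")
```

The input never changes between iterations, so any iteration that disagrees with the reference points at the hardware (or the kernel) rather than at the data.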
I didn't find any errors when I ran this for a week on a dozen of our
nodes, but I've heard rumors that some versions of Athlon MP have
problems with SMP synchronization. I have NO idea whether this is true,
but it might be worth testing:
1. The burn-in program.
2. Consistency of SMP vs. non-SMP runs.
3. Different versions of LAM, and check if there really are any
reported problems...
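For item 2, one simple way to compare an SMP run with a non-SMP run is to line up the per-step energies and look for the first step where they disagree. The sketch below assumes you have already extracted the energies into two lists; the function name first_divergence, the 1e-5 tolerance and the numbers in the usage example are all made up for illustration. Since parallel runs sum forces in a different order, small rounding differences grow over time, so only the early steps of a run can be expected to agree closely.

```python
import math


def first_divergence(energies_a, energies_b, rel_tol=1e-5):
    """Return the first step index where the two series disagree, else None."""
    for step, (ea, eb) in enumerate(zip(energies_a, energies_b)):
        if not math.isclose(ea, eb, rel_tol=rel_tol):
            return step
    return None


if __name__ == "__main__":
    # Hypothetical energies, not taken from any real run.
    single_cpu = [-100.0, -101.0, -102.0]
    smp = [-100.0, -101.0, -105.0]
    print("first differing step:", first_divergence(single_cpu, smp))
```

A divergence in the very first steps would be suspicious; one that only shows up after many steps is more likely ordinary rounding drift.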
Cheers,
Erik