[gmx-users] Hyperthreading throughput increase

Erik Lindahl lindahl at sbc.su.se
Mon Jun 5 09:47:06 CEST 2006

Hi Matt,

On Jun 2, 2006, at 8:20 PM, mernst at tricity.wsu.edu wrote:

> From my browsing of list archives I can only recall seeing advice that
> hyperthreading cannot offer more Gromacs performance. For all I know this
> remains true if you're trying to use MPI to accelerate single calculations
> on hyperthreaded processors. However, I have discovered that it may be
> possible to increase throughput by running two independent jobs on a recent
> hyperthreaded processor, and I don't recall seeing this mentioned on the
> list before.
> My typical job involves ~790 DNA atoms, a few Na+ ions, and 13000-14000
> water molecules. I use Gromacs with the Amber force field ported by the
> Pande group. My simulation machines are 3.2 GHz Pentium machines with 2 MB
> of cache (Pentium 640, I think) and 1 GB RAM.
> Typical performance for one of these systems running on an unloaded machine
> is 64.3 hours/ns, 1.4 GFLOP/s. I accidentally started some pairs of
> simulations on some of these machines this week and discovered that the
> performance of each job was *not* cut in half. With two systems running
> simultaneously, each shows performance of about 98.8 hours/ns, 908 MFLOP/s.
> Running two of these jobs on each machine thus appears to increase
> throughput by about 30%.
> If, like me, you run many independent calculations, throughput is more
> important than turnaround, and you have hyperthreaded machines but have not
> previously tried to take advantage of them, it may be worth testing. I
> suppose this issue may have been covered on the mailing list before, but all
> I ever remember seeing was advice that hyperthreaded processors won't help
> performance, or even advice to disable hyperthreading in the BIOS. A brief
> web search indicates that some Folding@home participants have found
> comparable throughput advantages from running two client instances on
> hyperthreaded processors.
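
For reference, the roughly 30% figure quoted above can be checked from the
reported timings. This is only a minimal sketch of the arithmetic; the
function name and its parameters are made up for illustration and are not
part of Gromacs.

```python
def throughput_gain(hours_per_ns_single, hours_per_ns_each, n_jobs=2):
    """Relative throughput gain from running n_jobs concurrently.

    Throughput is simulated ns per wall-clock hour, summed over all jobs.
    The helper name and signature are hypothetical.
    """
    single = 1.0 / hours_per_ns_single      # ns/hour with one job on the machine
    combined = n_jobs / hours_per_ns_each   # ns/hour with n_jobs running together
    return combined / single - 1.0


# Timings reported in the quoted message: 64.3 hours/ns alone,
# 98.8 hours/ns for each of two simultaneous jobs.
gain_time = throughput_gain(64.3, 98.8)

# The same comparison from the reported flop rates:
# 1.4 GFLOP/s alone versus 2 x 908 MFLOP/s together.
gain_flops = (2 * 908.0) / 1400.0 - 1.0

print(f"gain from hours/ns: {gain_time:.1%}")
print(f"gain from flop rate: {gain_flops:.1%}")
```

Both ways of counting land close to 30%, so the two sets of numbers in the
report agree with each other.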

That's certainly good news. I remember trying this when  
hyperthreading first appeared, but the early implementations didn't  
make any throughput difference whatsoever.

However, it might still lead to significant problems with dual-CPU
systems, where each CPU has hyperthreading enabled. In _theory_ the
Linux scheduler should be able to tell logical from physical CPUs,
but the last time I tried it (which, again, was over a year ago) it
led to severe load balancing problems.
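
One way to see which logical CPUs share a physical core is to read the
topology files that present-day Linux kernels expose under sysfs. The sketch
below assumes such a kernel (the files may be absent on older ones, including
those current when this thread was written), and the helper names are made up
for illustration. It only reports the grouping; it does not change how jobs
are scheduled.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,2' or '0-1' into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def sibling_groups(lists):
    """Collapse per-CPU sibling lists into one group per physical core."""
    return sorted({parse_cpu_list(text) for text in lists})


def read_sibling_lists(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Read the sibling list of every CPU that exposes one (assumes Linux sysfs)."""
    texts = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            texts.append(fh.read())
    return texts


if __name__ == "__main__":
    # Each printed tuple is one physical core and the logical CPUs on it.
    # Nothing is printed if the sysfs files are not available.
    for group in sibling_groups(read_sibling_lists()):
        print(group)
```

On a dual-CPU machine with hyperthreading enabled this should print two-entry
groups, which makes it easier to check that two independent jobs are not both
ending up on the same physical core.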


