[gmx-users] Performance in ia64 and x86_64

Mark Abraham Mark.Abraham at anu.edu.au
Fri Feb 11 17:30:40 CET 2011


On 12/02/2011 12:44 AM, Ignacio Fernández Galván wrote:
> ---- Original Message ----
> From: Mark Abraham<Mark.Abraham at anu.edu.au>
>
>> We can't say with the information given. For best performance, the number of
>> threads cannot exceed the number of physical cores available to one processor.
>> To go higher, you need to compile and use GROMACS with MPI, not threading. If
>> the IA64 is "dual core" then you are not measuring anything useful. You also
>> need to be sure you're measuring for a decent length of time - a few minutes
>> at least.
>
> It seems the x86_64 processor has 4 cores and supports 8 threads
> (<http://ark.intel.com/Product.aspx?id=37104>), so the machine probably has two
> physical processors.

That's delivered via hyper-threading (see bottom of that page). GROMACS 
is unlikely to get any significant value out of that, because the 
number-crunching loops of the code dominate the execution time, and by 
design those loops incur few cache misses, branch mispredictions, etc., 
so the second hardware thread has little dead time to use. HT is good 
for desktop workstations, whose mixed workloads stall often in exactly 
those ways. Secondarily, HT probably doubles the pressure on the cache, 
since two threads share one core's cache (but MD is normally fairly 
cache-friendly).
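If it helps to check what a machine really has, here is a minimal sketch 
that separates physical cores from hyper-threaded logical CPUs. It 
assumes an x86 Linux box whose /proc/cpuinfo reports "physical id" and 
"core id" fields; the function name count_cpu_topology is made up for 
this example and is not part of GROMACS.

```python
def count_cpu_topology(cpuinfo_text):
    """Return (sockets, physical_cores, logical_cpus).

    Expects text in the layout of /proc/cpuinfo on x86 Linux: one
    blank-line-separated block per logical CPU, "key : value" lines.
    """
    sockets = set()
    cores = set()
    logical = 0
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a per-CPU block
        logical += 1
        # Assumption: both fields are present; some kernels and
        # architectures omit them, in which case only the logical
        # count is meaningful.
        if "physical id" in fields and "core id" in fields:
            sockets.add(fields["physical id"])
            cores.add((fields["physical id"], fields["core id"]))
    return len(sockets), len(cores), logical
```

On Linux you would call it as 
count_cpu_topology(open("/proc/cpuinfo").read()). If the logical count 
is twice the physical core count, hyper-threading is enabled.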

> I thought MPI was only needed if there was network
> communication involved, as in a cluster, but not in SMP, which is what both
> machines are (single memory, single OS). I guess I was wrong. I'll try compiling
> with MPI.

As I understand it, useful threading depends on how many cores 
are on the same piece of processor silicon, not on whether the memory is 
shared between processors. The Xeon E5540 has 4 physical cores per 
processor, so that's as far as GROMACS will usefully thread. The good 
news is that if you really have shared memory, MPI-GROMACS runs almost 
indistinguishably fast from a threaded version on theoretically 
equivalent hardware. I have a vaguely similar machine, but with dual 
quad-core Xeon X5570 processors. MPI and threading perform 
indistinguishably out to 8 processes, then threading stops.
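As a rough sketch of that rule of thumb, the helper below picks between 
a threaded and an MPI launch line. The helper name mdrun_command and the 
max_useful_threads parameter are hypothetical; "mdrun_mpi" is only the 
conventional name for an MPI-enabled build, and "mpirun" depends on your 
MPI installation, so treat both as assumptions. Where the useful thread 
limit actually falls on a given machine is something to measure, not 
something this code knows.

```python
def mdrun_command(n_procs, max_useful_threads, deffnm="topol"):
    """Build an argv list for a threaded or an MPI mdrun launch.

    n_procs            -- number of threads or MPI processes wanted
    max_useful_threads -- hypothetical limit you measured for threading
    deffnm             -- default file name stem passed to mdrun
    """
    if n_procs <= max_useful_threads:
        # GROMACS 4.5 built with threading: mdrun -nt N
        return ["mdrun", "-nt", str(n_procs), "-deffnm", deffnm]
    # Assumed names: an MPI launcher called mpirun and an MPI-enabled
    # binary installed as mdrun_mpi; adjust to your installation.
    return ["mpirun", "-np", str(n_procs), "mdrun_mpi", "-deffnm", deffnm]
```

For example, " ".join(mdrun_command(8, 4)) gives an mpirun line for 8 
processes, while mdrun_command(4, 4) stays with the threaded binary.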

Mark


