[gmx-users] Performance in ia64 and x86_64
Mark.Abraham at anu.edu.au
Fri Feb 11 17:30:40 CET 2011
On 12/02/2011 12:44 AM, Ignacio Fernández Galván wrote:
> ---- Original Message ----
> From: Mark Abraham<Mark.Abraham at anu.edu.au>
>> We can't say with the information given. For best performance, the number of
>> threads cannot exceed the number of physical cores available to one processor.
>> To go higher, you need to compile and use GROMACS with MPI, not threading. If
>> the IA64 is "dual core" then you are not measuring anything useful. You also
>> need to be sure you're measuring for a decent length of time - a few minutes at
> It seems the x86_64 processor has 4 cores and 8 threads support
> (<http://ark.intel.com/Product.aspx?id=37104>), so the machine has probably two
> physical processors.
That's delivered via hyper-threading (see bottom of that page), so each
processor has 4 physical cores that present as 8 logical ones. GROMACS
is unlikely to get any significant value out of that, because the
number-crunching loops of the code dominate the execution time, and by
design GROMACS incurs few cache misses, branch mispredictions, etc. in
those loops, so the second thread doesn't have much dead time to use.
HT is good for desktop workstations, whose workloads can be expected to
stall like that often. Secondarily, running two threads per core
probably doubles the pressure on the cache (but MD is normally fairly
cache-friendly).
> I thought MPI was only needed if there was network
> communication involved, as in a cluster, but not in SMP, which is what both
> machines are (single memory, single OS), I guess I was wrong. I'll try compiling
> with MPI.
As I understand it, useful threading has to do with how many cores are
on the same piece of processor silicon, not whether the memory is
shared between processors. Xeon E5540 has 4 physical cores per
processor, so that's as far as GROMACS will usefully thread. The good
news is that if you really have shared memory, MPI-GROMACS should be
almost indistinguishable in speed from a threaded version running on
the theoretically equivalent hardware. I have a vaguely similar
machine, but with dual quad-core Xeon X5570 processors. MPI and
threading perform indistinguishably out to 8 processes, which is as far
as threading goes.
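
To make the core counting concrete, here is a rough sketch of the
arithmetic for your box. It assumes two sockets (your guess, not
something I have confirmed), and the Machine class and suggested_counts
function are names made up for illustration, not part of GROMACS. It
only encodes the rule of thumb above: thread within one processor's
physical cores, and use MPI to cover all physical cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical description of a shared-memory node (illustration only)."""
    sockets: int            # physical processor packages (assumed, not measured)
    cores_per_socket: int   # physical cores on each package
    threads_per_core: int   # 2 with hyper-threading enabled, otherwise 1

    @property
    def logical_cpus(self) -> int:
        # What the operating system reports as "CPUs".
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def physical_cores(self) -> int:
        # What actually does the number crunching.
        return self.sockets * self.cores_per_socket


def suggested_counts(m: Machine) -> dict:
    """Apply the rule of thumb from this thread; this is not a GROMACS API."""
    return {
        # Threads should not exceed the physical cores of one processor.
        "max_useful_threads": m.cores_per_socket,
        # With MPI, one process per physical core across all processors.
        "mpi_processes": m.physical_cores,
    }


# Xeon E5540: 4 cores and 8 hardware threads per processor; 2 sockets assumed.
e5540_box = Machine(sockets=2, cores_per_socket=4, threads_per_core=2)
print(e5540_box.logical_cpus)
print(suggested_counts(e5540_box))
```

The point is only that the logical CPU count overstates what GROMACS
can use; hyper-threaded logical CPUs are left out of both suggestions.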