[gmx-users] Scaling problems in 8-cores nodes with GROMACS 4.0x
lindahl at cbr.su.se
Fri Sep 4 08:59:58 CEST 2009
On Sep 3, 2009, at 4:52 AM, Daniel Adriano Silva M wrote:
> Dear Gromacs users, (all related to GROMACS ver 4.0.x)
> I am facing a very strange problem on a recently acquired supermicro 8
> XEON-cores nodes (2.5GHz quad-core/node, 4G/RAM with the four memory
> channels activated, XEON E5420, 20Gbs Infiniband Infinihost III Lx
> DDR): I had been testing these nodes with one of our most familiar
> protein models (49887 atoms: 2873 for protein and the rest for water
> in a dodecahedron cell), which I know scales almost linearly up to
> 32 cores in a quad-core/node Opteron 2.4 GHz cluster.
Without going deeper into the rest of the discussion, note that the
E5420 isn't a real quad-core, but a multi-chip module with two
dual-core dies connected by Intel's old/slow front side bus.
In particular, this means all communication and memory operations have
to share the narrow bus. Since PME involves more memory IO (charge
spreading/interpolation) I'm not entirely surprised if the relative
PME scaling doesn't match the direct space scaling. I don't think I've
*ever* seen perfect scaling on these chips.
The point of separate PME nodes is mainly to improve the high end
scaling, since it reduces the number of MPI calls significantly.
However, for the same reason it can obviously lead to load imbalance
issues with fewer processors. You can always turn it off manually -
the 12-cpu limit is very much heuristic.
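
As a rough illustration of turning separate PME nodes off by hand, here
is a small Python sketch that assembles an mdrun command line with an
explicit -npme value. The -npme and -deffnm options are mdrun options in
4.0.x; the launcher name "mpirun", the binary name "mdrun_mpi" and the
run name "topol" are assumptions that will differ between installations,
and the helper itself is hypothetical.

```python
import shlex


def mdrun_command(n_ranks, npme=0, deffnm="topol",
                  mpirun="mpirun", mdrun="mdrun_mpi"):
    """Return the argument list for a parallel mdrun launch.

    npme=0 requests no separate PME nodes; npme=-1 leaves the choice
    to mdrun's own heuristic. The launcher and binary names are
    assumptions, not something fixed by GROMACS.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    if npme < -1 or npme >= n_ranks:
        raise ValueError("npme must be -1, or between 0 and n_ranks - 1")
    return [mpirun, "-np", str(n_ranks),
            mdrun, "-npme", str(npme), "-deffnm", deffnm]


if __name__ == "__main__":
    # Show the command instead of running it, so it can be inspected first.
    argv = mdrun_command(8, npme=0)
    print(" ".join(shlex.quote(a) for a in argv))
```

Comparing the timings of a short run with npme=0 against one with
npme=-1 on the same number of cores is probably the quickest way to see
whether the heuristic is hurting on a given machine.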
Finally, it will be virtually impossible to load balance effectively
over e.g. 11 CPUs in your cluster. Remember, there are at least three
different latency levels (cores on the same chip, cores on different
chips in the same node, cores on different nodes), and all processes
running on a node share the IB host adapter. Stick to multiples of 8
and try to have even sizes both for your direct space decomposition
and for the reciprocal space grid.
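
The advice above can be sketched as a quick sanity check before
submitting a job. This is only a sketch under stated assumptions: "even"
is taken to mean that the core count is a whole number of 8-core nodes
and that the PME grid size along the decomposed dimension divides evenly
by the number of processes doing PME. The function names and the
pme_grid_x parameter are hypothetical, and nothing here queries GROMACS
itself.

```python
def dd_grids(n_pp):
    """List all (nx, ny, nz) with nx*ny*nz == n_pp, most cube-like first."""
    grids = []
    for nx in range(1, n_pp + 1):
        if n_pp % nx:
            continue
        rest = n_pp // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            grids.append((nx, ny, rest // ny))
    # Smallest spread between largest and smallest dimension comes first.
    return sorted(grids, key=lambda g: (max(g) - min(g), g))


def check_layout(n_ranks, cores_per_node=8, npme=0, pme_grid_x=None):
    """Return a list of human-readable problems with a proposed layout."""
    problems = []
    if n_ranks % cores_per_node:
        problems.append(
            f"{n_ranks} is not a multiple of {cores_per_node} cores per node")
    # With no separate PME nodes, every process takes part in PME.
    n_pme = npme if npme > 0 else n_ranks
    if pme_grid_x is not None and pme_grid_x % n_pme:
        problems.append(
            f"PME grid size {pme_grid_x} does not divide by {n_pme} PME ranks")
    return problems


if __name__ == "__main__":
    for n in (8, 11, 16):
        print(n, check_layout(n, pme_grid_x=64), dd_grids(n)[:3])
```

An empty list from check_layout only means the counts divide evenly; it
says nothing about whether the resulting run is actually well balanced,
which still has to be read from the mdrun log.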