[gmx-users] GROMACS not scaling well with Core4 Quad technology CPUs
lindahl at cbr.su.se
Mon May 28 08:57:22 CEST 2007
On May 28, 2007, at 1:59 AM, Trevor Marshall wrote:
> I also have older systems which use Opteron 165 CPUs. I have run
> tests of the AMD Opteron 165 CPUs (2.18GHz) against the Intel Core2
> Duos (3GHz). Twelve concurrent AutoDock jobs on each machine show
> the Core2 duos outperforming the Opterons by a factor of two.
Yes, but are those AutoDock jobs MPI-parallel or just multiple
independent scalar jobs not communicating between the cores?
Gromacs also provides beautiful performance (close to 100% scaling)
if you run e.g. 8 independent jobs on a dual quad-core box.
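To illustrate what "independent jobs" means here, the following is a minimal Python sketch that starts several processes at once and waits for all of them, with no communication between them. The helper name `run_independent` and the placeholder commands are hypothetical. In practice each command would be a separate serial simulation run in its own directory.

```python
import subprocess
import sys


def run_independent(commands):
    """Start every command at once, then wait for all of them.

    The processes never talk to each other, so the only shared
    resources are memory bandwidth, cache and disk.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder jobs (assumption): eight trivial Python processes
    # standing in for eight separate serial runs.
    jobs = [[sys.executable, "-c", "sum(range(10**6))"] for _ in range(8)]
    print(run_independent(jobs))
```

Because nothing is exchanged between the processes, this kind of test says little about how a single MPI-parallel run will scale.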
> The data I posted showed inconsistencies which have nothing to do
> with memory bandwidth, and I was rather hoping for an analysis
> based upon the manner in which GROMACS mdrun distributes its
> computing tasks.
Gromacs isn't doing the distribution. That's entirely up to the MPI
library and the OS.
> I don't believe my data shows memory bandwidth-limiting effects.
> For example, three 'local' CPUs on the quad core are faster
> (6.65Gflops) than one of the Quads (5.02 Gflops) and two from the
> cluster. How does that support the memory bandwidth hypothesis?
As far as I understand, you're using gigabit Ethernet. Even with GAMMA,
that is going to have much higher latency and lower bandwidth than
the shared-memory communication on a quad-core machine.
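A rough way to see the effect is the usual "latency plus size over bandwidth" estimate for the cost of one message. The Python sketch below uses assumed order-of-magnitude figures for both interconnects. They are illustrative only, not measurements from this thread or from GAMMA.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: fixed latency plus transfer time."""
    return latency_s + nbytes / bandwidth_bytes_per_s


# Assumed, illustrative figures (not measured on the systems discussed).
GIGABIT = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}
SHARED_MEM = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 2e9}


def slowdown(nbytes):
    """How many times slower the network is than shared memory."""
    return message_time(nbytes, **GIGABIT) / message_time(nbytes, **SHARED_MEM)


if __name__ == "__main__":
    for nbytes in (1_000, 100_000, 10_000_000):
        print(f"{nbytes:>10} bytes: network ~{slowdown(nbytes):.0f}x slower")
```

Under these assumptions, small messages are dominated by latency and large ones by bandwidth, so the network path loses in both regimes.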
> I figured that it might be possible that the GAMMA MP software is
> causing overhead, but when I examined the distribution of tasks by
> GROMACS (in the log I provided) it would seem that the tasks which
> mdrun distributed to GAMMA actually were distributed well, but that
> the manner in which CPU0 hogged most of the mdrun calculations
> might be a bottleneck. It was insight into GROMACS' mdrun
> distribution methodology which I was seeking. Is there any
> quantitative data available for me to review?
If you're interested in comparing the scaling performance of
quad-core machines with other hardware, I would start with the
benchmarks on the www site.
If it's about getting the highest possible performance, you could
either play with the "-load" option to grompp, or check out the CVS
development tree, which has full domain decomposition and dynamic
load balancing implemented (warning, there could still be bugs).