[gmx-users] Scaling/performance on Gromacs 4
manuvajpai at gmail.com
Wed Jun 6 13:19:08 CEST 2012
Apologies for reviving such an old thread. To clarify, Interlagos
and Bulldozer both have a modular architecture, as mentioned earlier. Each
Bulldozer module has 2 integer cores and one floating point unit shared
between the two cores. So, although the OS reports 64 cores (counting integer
cores), the number of floating point units is still 32.
Moreover, each FP unit can process two threads when possible, but
since Gromacs is so compute intensive I am guessing it is saturated by just
one. Hence you are not observing a scale-up by moving from 32 to 64 cores.
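
The counting above can be written out as a short Python sketch. The socket and core counts come from the original post (4 x Opteron 6274 with 16 cores each); the function and constant names are only illustrative.

```python
# Figures from the original post: 4 sockets, 16 integer cores per socket.
SOCKETS = 4
INT_CORES_PER_SOCKET = 16
INT_CORES_PER_MODULE = 2  # each Bulldozer module pairs two integer cores
FPUS_PER_MODULE = 1       # the two integer cores share one floating point unit


def count_units(sockets, cores_per_socket):
    """Return (integer cores, modules, shared FPUs) for the whole machine."""
    int_cores = sockets * cores_per_socket
    modules = int_cores // INT_CORES_PER_MODULE
    fpus = modules * FPUS_PER_MODULE
    return int_cores, modules, fpus


int_cores, modules, fpus = count_units(SOCKETS, INT_CORES_PER_SOCKET)
print(f"integer cores: {int_cores}, modules: {modules}, shared FPUs: {fpus}")
```

If the floating point unit really is the bottleneck, as I am guessing, the 32 shared FPUs rather than the 64 reported cores set the useful process count.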
On Fri, Mar 16, 2012 at 4:24 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
> Hi Sara,
> The bad performance you are seeing is most probably caused by the
> combination of the new AMD "Interlagos" CPUs, compiler, and operating
> system, and very likely the old Gromacs version as well.
> In practice these new CPUs don't perform as well as expected, but that
> is partly due to compilers and operating systems not having full
> support for the new architecture. However, based on the quite
> extensive benchmarking I've done, the performance with such a large system should
> be considerably better than what your numbers show.
> This is what you should try:
> - compile Gromacs with gcc 4.6 using the "-march=bdver1" optimization flag;
> - run Linux kernel 3.0 or, preferably, newer;
> - if you're not required to use 4.0.x, use 4.5.
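
The kernel requirement in the list above can be checked with a few lines of Python. This is a minimal sketch that assumes a Linux host whose release string starts with "major.minor"; `kernel_at_least` is a hypothetical helper name, not part of Gromacs.

```python
import platform
import re


def kernel_at_least(release, minimum=(3, 0)):
    """Return True if a release string like '3.2.0-23-generic' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    release = platform.release()  # kernel release string reported by the OS
    print(release, "OK" if kernel_at_least(release) else "older than 3.0")
```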
> Note that you have to be careful when drawing conclusions from
> benchmarking on a small number of cores with large systems; you will get
> artifacts from caching effects.
> And now a bit of fairly technical explanation; for more details, ask Google.
> The machine you are using has AMD Interlagos CPUs based on the
> Bulldozer micro-architecture. This is a new architecture, a departure
> from previous AMD processors and in fact quite different from most
> current CPUs. "Bulldozer cores" are not the traditional physical
> cores. In fact the hardware unit is the "module" which consists of two
> "half cores" (at least when it comes to floating point units) and
> enables a special type of multithreading called "clustered
> multithreading". This is slightly similar to the Intel cores with
> On Mon, Feb 20, 2012 at 5:12 PM, Sara Campos <srrcampos at gmail.com> wrote:
> > Dear GROMACS users
> > My group has had access to a quad processor, 64 core machine (4 x Opteron
> > 6274 @ 2.2 GHz with 16 cores each)
> > and I made some performance tests, using the following specifications:
> > System size: 299787 atoms
> > Number of MD steps: 1500
> > Electrostatics treatment: PME
> > Gromacs version: 4.0.4
> > MPI: LAM
> > Command run: mpirun -ssi rpi tcp C mdrun_mpi ...
> > #CPUs   Time (s)   Steps/s
> >    64    195.000      7.69
> >    32    192.000      7.81
> >    16    275.000      5.45
> >     8    381.000      3.94
> >     4    751.000      2.00
> >     2   1001.000      1.50
> >     1   2352.000      0.64
> > The scaling is not good. But the weirdest part is that 64 processors perform
> > the same as 32. I have seen the plots from Dr. Hess in the GROMACS 4 paper,
> > and I do not understand why this is happening. Can anyone help?
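
To put numbers on the scaling in the table above, speedup and parallel efficiency relative to the single-process run can be computed directly from the reported wall times. This is only a sketch; the timings are copied from the table, and the names `TIMINGS` and `scaling` are illustrative.

```python
# (number of processes, wall time in seconds) copied from the benchmark table.
TIMINGS = [
    (1, 2352.0), (2, 1001.0), (4, 751.0), (8, 381.0),
    (16, 275.0), (32, 192.0), (64, 195.0),
]


def scaling(timings):
    """Return {n: (speedup, efficiency)} relative to the 1-process run."""
    base_time = dict(timings)[1]
    result = {}
    for n, seconds in timings:
        speedup = base_time / seconds
        result[n] = (speedup, speedup / n)
    return result


table = scaling(TIMINGS)
for n in sorted(table):
    speedup, efficiency = table[n]
    print(f"{n:3d} procs  speedup {speedup:6.2f}  efficiency {efficiency:5.2f}")
```

The speedup at 64 processes comes out slightly below the one at 32, which is the plateau described in the question. The 2-process row shows an efficiency above 1, which may be the kind of caching artifact mentioned earlier in the thread.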
> > Thanks in advance,
> > Sara
> > --
> > gmx-users mailing list gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > Can't post? Read http://www.gromacs.org/Support/Mailing_Lists