[gmx-developers] gromacs @ AMD Interlagos

Tue May 1 23:08:58 CEST 2012

Running on an Opteron 6274 (Interlagos) and a 6176 (Magny-Cours),
using the *whole* processor (16 and 12 threads respectively) with
thread-MPI:
- group scheme kernels: almost no difference
- verlet scheme kernels: the Interlagos is 18% faster with PME (only
~3% with reaction-field).
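
The argument quoted below (two cores sharing one FPU/SSE unit, so extra
threads help only the non-FP parts) can be put into a toy model. This is
a rough sketch under stated assumptions, not a measurement: the function
name, the split into "FP work" and "other work", and the work units are
all hypothetical, and threads are assumed to be placed one per FPU first.

```python
def modelled_time(fp_work, other_work, threads, total_cores=48, cores_per_fpu=2):
    """Toy runtime estimate for a node where cores share FPU/SSE units.

    Assumptions (hypothetical, for illustration only):
    - FP-heavy work scales with the number of FPU/SSE units in use,
      not with the number of threads;
    - all other work (e.g. neighbor searching) scales with threads;
    - threads are spread so that each FPU gets one thread before any
      FPU gets a second one.
    """
    if threads < 1 or threads > total_cores:
        raise ValueError("threads must be between 1 and total_cores")
    fpus = total_cores // cores_per_fpu
    active_fpus = min(threads, fpus)  # FP throughput is capped by FPU count
    return fp_work / active_fpus + other_work / threads


# Arbitrary work units, chosen only to make the arithmetic easy to follow.
half = modelled_time(fp_work=24.0, other_work=48.0, threads=24)
full = modelled_time(fp_work=24.0, other_work=48.0, threads=48)
print(half, full)
```

In this model the FP term is identical for 24 and 48 threads and only the
non-FP term shrinks, so using all 48 cores is never slower than using 24.
A large slowdown with 48 threads would therefore point to something the
model does not capture.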

--
Szilárd


On Tue, May 1, 2012 at 9:59 AM, David van der Spoel
<spoel at xray.bmc.uu.se> wrote:
> On 2012-05-01 10:44, Carsten Kutzner wrote:
>>
>> On May 1, 2012, at 10:26 AM, David van der Spoel wrote:
>>
>>> On 2012-05-01 09:19, David van der Spoel wrote:
>>>>
>>>> Hi,
>>>>
>>>> we have done some very preliminary benchmarks on AMD Interlagos nodes, 4
>>>> CPUs with 12 cores each. After hearing a report that in fact on these
>>>> machines 2 cores share an FPU and an SSE unit, a colleague tested a
>>>> simple protein in water:
>>>>
>>>> ... after a short 1 ns test run, the 24 (2x12) core solution
>>>> outperformed the 48 core one (16.3 ns/day vs. 1.4 ns/day). Restraint on
>>>> protein heavy atoms, NVT, waters mobile, ca. 38000 atoms.
>>>>
>>>> Can anyone confirm this result? It seems excessive to me.
>>>>
>>> What this would imply, if confirmed, is that we would need to allocate
>>> only half the number of threads on this architecture, compared to the number
>>> of cores.
>>
>> Hi David,
>>
>> we get a performance on our Interlagos nodes that is comparable to
>> the Magny-Cours. I would guess there is something wrong in
>> your case. Is the kernel recent enough?
>> For an 80,000 atom test system with PME, we get about 15 ns/day both
>> on 48 Magny-Cours cores and on 48 Interlagos cores.
>> I have not tried with 24 Interlagos cores yet, but maybe it's
>> faster :)
>>
> Would be great if you could try it.
>
> If it is the case that two cores share one FPU/SSE unit, then we have two
> threads competing for the FPU/SSE unit and that would definitely slow things
> down. On the other hand, having double the number of threads should give some
> speedup in other parts such as neighbor searching, such that the final
> result with 24 or 48 threads should be comparable. It seems that this is not
> the best architecture for FP-intensive codes if all this holds.
>
>
> --
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205.
> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se