[gmx-users] Hyper-threading Gromacs 5.0.1

Szilárd Páll pall.szilard at gmail.com
Wed Sep 17 10:44:24 CEST 2014


On Thu, Sep 11, 2014 at 2:40 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> Hi,
>
> Hyper-threading is generally not useful with applications that are compute-
> or network-bound, as GROMACS is. You should expect to see maximum
> performance when using one real thread per x86 core (and you should find
> out how many cores really exist, and not infer it from something else). You
> should start with an MPI rank per core (thus one thread per rank), and
> consider reducing the number of ranks by having more OpenMP threads per
> rank - but this is generally only useful for non-GPU runs when running on a
> lot of Intel x86 hardware.
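
(Concretely, on e.g. a 24-core node the two launch styles described above
would look roughly like this with the built-in thread-MPI; the rank/thread
counts and topol.tpr are only illustrative:

  # start with one rank per core, one thread per rank
  gmx mdrun -ntmpi 24 -ntomp 1 -s topol.tpr

  # then try fewer ranks with more OpenMP threads per rank
  gmx mdrun -ntmpi 4 -ntomp 6 -s topol.tpr

With a real MPI build the ranks are started by mpirun -np instead, and only
-ntomp is passed to mdrun.)
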
>
> On Thu, Sep 11, 2014 at 3:52 AM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
>
>> Is it a good idea to use 48 OpenMP threads under 1 MPI rank on 24 Xeon
>> processors?
>>
>
> No.
>
>
>> The mailing list says such a practice gives about an 8-20% performance increase
>>
>
> If it did, that might have been in the context of managing the work done on
> the CPU while using a GPU, which is not what you are doing. But without a
> link, the reference is useless...

I guess it's time to provide a reference, as this has come up a few
times before.

System: rnase dodecahedron box, 16.7k atoms (same setup as linked
here: http://www.gromacs.org/GPU_acceleration)
Settings: rcut=1.0 nm, PME (ewald_rtol=1e-5, fourier_spacing=0.125)
Legend: {Nranks}x{Ncores}: perf / perf_w-HT, where the former is the
performance with -ntomp == Ncores and the latter with -ntomp ==
2*Ncores to make use of HT.
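
For concreteness, each {Nranks}x{Ncores} data point corresponds to launch
commands roughly like the following (shown for the 2x3 case on the 6-core
machine; topol.tpr is a placeholder):

  gmx mdrun -ntmpi 2 -ntomp 3 -s topol.tpr    # no HT: -ntomp == Ncores
  gmx mdrun -ntmpi 2 -ntomp 6 -s topol.tpr    # HT:    -ntomp == 2*Ncores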

Hardware: Core i7 3930K, HT on.
1x6: 42.1 / 46.1
2x3: 36.5 / 39.8
3x2: 37.5 / 40.1
6x1: 36.6 / 38.7
12x1: 39
=> with HT 6-10% faster.

Hardware: Core i7 4770K, HT on.
1x4: 36.2 / 40.2
2x2: 32.4 / 36.1
4x1: 32.5 / 35.2
=> with HT 8-11% faster

Note that things change at higher parallelization, where HT becomes
increasingly less useful, but on a single node it very often (if not in
most cases) improves performance.

Cheers,
--
Szilárd

>> Should I try g_tune_pme when I searched for "imbalance" in the log file and
>> found nothing (24 OpenMP threads under 1 MPI rank on 24 Xeon processors)? Or
>> is that done automatically?
>>
>
> You're not using more than one rank, so there's not really any load
> imbalance to tune - it's just bad.
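
(For completeness: with an actual multi-rank run, tune_pme would be invoked
roughly like this; the rank count and file name are placeholders, and
tune_pme also needs to be told how to launch mdrun, e.g. via the
MPIRUN/MDRUN environment variables:

  gmx tune_pme -np 24 -s topol.tpr

It then benchmarks different PP/PME rank splits and grid settings and
reports the fastest combination.)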
>
>> Does GROMACS support double-precision calculation on the GPU if the hardware
>> supports that?
>>
>
> No.
>
>
>> The optimize_fft option is also obsolete.
>>
>
> Yes, it got removed before 5.0, but there were a few things left in the
> docs which I have now removed. Thanks.
>
> Mark
>
>
>> Thanks again.