[gmx-users] Hyper-threading Gromacs 5.0.1
Szilárd Páll
pall.szilard at gmail.com
Thu Sep 11 14:45:51 CEST 2014
That many threads will most likely not be very efficient. If you are
running on a single node, it could be the case that 1 rank with 24
OpenMP threads will still be the fastest configuration, but 48 will
be too much.
Depending on how imbalanced your system is, using domain decomposition
(DD) can still be faster, so I suggest that you also try a few
configurations, e.g. 12 ranks with 2/4 threads, 8 ranks with 3/6
threads, etc.
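
To make the list of configurations to try concrete, here is a minimal
sketch that enumerates rank/thread splits for 24 hardware threads (HT
off) and 48 (HT on), skipping anything above the 32-thread limit from
the error message quoted below. The helper names and the bench_*.log
file names are made up for illustration; -ntmpi, -ntomp, -s and -g are
regular mdrun options, and mdrun_d and npt.tpr are taken from the quoted
log. Not every rank count will necessarily give a valid domain
decomposition for a given system, so treat the output as candidates
only.

```python
def rank_thread_layouts(hw_threads, max_omp=32):
    """Return (ranks, omp_threads) pairs with ranks * omp_threads == hw_threads.

    Layouts needing more than max_omp OpenMP threads per rank are skipped,
    mirroring the 32-thread limit reported in the fatal error.
    """
    layouts = []
    for ranks in range(1, hw_threads + 1):
        if hw_threads % ranks != 0:
            continue  # only even splits of the hardware threads
        omp = hw_threads // ranks
        if omp <= max_omp:
            layouts.append((ranks, omp))
    return layouts


def mdrun_command(ranks, omp, tpr="npt.tpr", binary="mdrun_d"):
    """Build one benchmark command line (log file name is hypothetical)."""
    return f"{binary} -ntmpi {ranks} -ntomp {omp} -s {tpr} -g bench_{ranks}x{omp}.log"


if __name__ == "__main__":
    for hw in (24, 48):  # 24 = HT off, 48 = HT on
        for ranks, omp in rank_thread_layouts(hw):
            print(mdrun_command(ranks, omp))
```

Run each candidate for a short, equal number of steps and compare the
ns/day reported at the end of each log.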
On Thu, Sep 11, 2014 at 4:31 AM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
> tried that, and the result was:
>
> Reading file npt.tpr, VERSION 5.0.1 (double precision)
> Changing nstlist from 10 to 40, rlist from 1 to 1.028
>
> The number of OpenMP threads was set by environment variable
> OMP_NUM_THREADS to 48
> Using 1 MPI thread
> Using 48 OpenMP threads
>
> WARNING: Oversubscribing the available 24 logical CPU cores with 48 threads.
> This will cause considerable performance loss!
This suggests that HT is actually off as mdrun detects only 24 hardware threads!
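
A quick way to cross-check this outside of mdrun is to ask the OS how
many logical CPUs it exposes. The sketch below is only an illustration
(the function name is made up, and mdrun does its own hardware
detection); with 24 physical cores and HT enabled one would expect the
OS to report 48.

```python
import os


def oversubscription_factor(requested_threads, logical_cpus=None):
    """Ratio of requested threads to logical CPUs; above 1.0 means oversubscribed."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs as seen by the OS (may be None)
        logical_cpus = os.cpu_count() or 1
    return requested_threads / logical_cpus


if __name__ == "__main__":
    print("logical CPUs seen by the OS:", os.cpu_count())
    # The case from the quoted log: 48 threads requested, 24 detected
    print("oversubscription factor:", oversubscription_factor(48, 24))
```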
--
Szilárd
> -------------------------------------------------------
> Program mdrun_d, VERSION 5.0.1
> Source code file:
> /export/data1/kho/software/gromacs5.1/gromacs-5.0.1/src/gromacs/mdlib/nbnxn_search.c,
> line: 2577
>
> Fatal error:
> 48 OpenMP threads were requested. Since the non-bonded force buffer
> reduction is prohibitively slow with more than 32 threads, we do not allow
> this. Use 32 or less OpenMP threads.
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
>
> so. i guess that is not a good idea.
>
> On Wed, Sep 10, 2014 at 9:55 PM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
>
>> that 8-20% performance increase was for gromacs 4.6.5
>>
>> On Wed, Sep 10, 2014 at 9:52 PM, Johnny Lu <johnny.lu128 at gmail.com> wrote:
>>
>>> Is it a good idea to use 48 OpenMP thread, under 1 MPI thread on 24 Xeon
>>> Processors?
>>>
>>> The mail list say such practice give about 8-20% performance increase
>>>
>>> Should I try g_tune_pme when I searched for "imbalance" in the log file
>>> and found nothing (24 OMP thread under 1 MPI thread on 24 Xeon Processor)?
>>> Or is that done automatically?
>>>
>>> Does gromacs support double precision calculation on GPU if the hardware
>>> supports that?
>>>
>>> The optimize fft option is also obsolete.
>>>
>>> Thanks again.
>>>
>>
>>