[gmx-users] Limitation on the maximum number of OpenMPI threads

Mark Abraham mark.j.abraham at gmail.com
Tue Sep 9 00:20:53 CEST 2014


Hi,

Generally speaking, in the absence of accelerators, OpenMP as used in
GROMACS 4.6/5.0 is only useful as you get down to around a few hundred
atoms per core (details vary, but since you often can't get fewer than 512
cores of BG/Q the point is often moot there), and only at fairly low OpenMP
thread counts (hence the error). BG/Q has 4 hardware threads per core, but
because of the way the processor issues instructions to them, you will
observe benefit with mdrun only if you use either 2 or 3 of them. You
should start with one MPI rank per core, and vary -ntomp to observe that
(probably) 2 is best. You can then increase the number of nodes (and hence
ranks) until mdrun starts complaining that it can't find a suitable domain
decomposition (because the domains are getting too small). Then you can get
some value from spreading each MPI rank over two cores (and thus doubling
-ntomp to try 4 or 6), etc. Under such conditions, and with a good PP-PME load
balance, I have seen mdrun (admittedly NVE, and not writing a trajectory)
continue getting faster until about 30 atoms/core, but that's an
unrealistic scenario. Your mileage will be worse in the real world.
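
The rank and thread arithmetic described above can be sketched roughly as
follows. This is only an illustration, not part of GROMACS: the helper names
are hypothetical, the 16 cores and 4 hardware threads per core come from the
quoted message, and the 100000-atom system size is an invented example.

```python
# Hardware figures taken from the quoted message (BG/Q node).
CORES_PER_NODE = 16
HW_THREADS_PER_CORE = 4


def candidate_layouts(nodes, cores_per_rank=1, threads_per_core_options=(2, 3)):
    """Return candidate MPI-rank / -ntomp layouts for a given node count.

    cores_per_rank=1 is the suggested starting point (one rank per core);
    cores_per_rank=2 corresponds to spreading one rank over two cores.
    The default of 2 or 3 hardware threads per core follows the advice above.
    """
    if CORES_PER_NODE % cores_per_rank != 0:
        raise ValueError("cores_per_rank must divide the cores per node")
    total_cores = nodes * CORES_PER_NODE
    ranks = total_cores // cores_per_rank
    layouts = []
    for threads_per_core in threads_per_core_options:
        if not 1 <= threads_per_core <= HW_THREADS_PER_CORE:
            raise ValueError("threads per core must be between 1 and 4")
        layouts.append(
            {
                "ranks": ranks,
                "ranks_per_node": CORES_PER_NODE // cores_per_rank,
                "ntomp": cores_per_rank * threads_per_core,
            }
        )
    return layouts


def atoms_per_core(n_atoms, nodes):
    """Average number of atoms per physical core."""
    return n_atoms / (nodes * CORES_PER_NODE)


if __name__ == "__main__":
    nodes = 8  # node count from the quoted message
    for cores_per_rank in (1, 2):
        for layout in candidate_layouts(nodes, cores_per_rank):
            print(
                f"{layout['ranks']} ranks "
                f"({layout['ranks_per_node']} per node): "
                f"mdrun_mpi -ntomp {layout['ntomp']}"
            )
    # Hypothetical system size, only to show the atoms/core estimate.
    print(f"{atoms_per_core(100_000, nodes):.1f} atoms per core")
```

How the ranks are actually placed on the nodes depends on the site's job
launcher, so the sketch only lists the combinations worth benchmarking.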

Mark

On Mon, Sep 8, 2014 at 8:15 PM, Abhi Acharya <abhi117acharya at gmail.com>
wrote:

> Hello,
> I was trying to run a simulation on Gromacs-4.6.3 which has been compiled
> without thread MPI on a BlueGene/Q system. The configurations per node are
> as follows:
>
>  PowerPC A2, 64-bit, 1.6 GHz, 16 cores SMP, 4 threads per core
>
> For running on 8 nodes I tried:
>
> srun mdrun_mpi -ntomp 64
>
> But, this gave me an error:
>
> Program mdrun_mpi, VERSION 4.6.3
> Source code file:
> /home/staff/sheed/apps/gromacs-4.6.3/src/mdlib/nbnxn_search.c, line: 2520
>
> Fatal error:
> 64 OpenMP threads were requested. Since the non-bonded force buffer
> reduction is prohibitively slow with more than 32 threads, we do not allow
> this. Use 32 or less OpenMP threads.
>
> So, I tried using 32 and it works fine. The problem is the performance
> seems to be too low; for a 1 ns run it shows an estimated time of more
> than a day. The same run on a workstation with 6 cores and 2 GPUs gives a
> performance of 17 ns/day.
>
> I am now at a loss. Any ideas what is happening?
>
> Regards,
> Abhishek Acharya