[gmx-users] Domain decomposition
Mark Abraham
mark.j.abraham at gmail.com
Tue Jul 26 12:13:02 CEST 2016
Hi,
So you know your cell dimensions, and mdrun is reporting that it can't
decompose because you have a bonded interaction that is almost the length
of one of the cell dimensions (the log reports 3.196 nm for the pair
interaction between atoms 24 and 28, against a shortest box edge of
3.53633 nm). How big should that interaction distance be, and what might
you do about it?
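As a rough illustration of why no decomposition exists, the sketch below
divides each box edge from the post by the minimum cell size from the error
message. This is a simplified upper bound written for this thread, not the
actual mdrun grid setup, which also accounts for cut-offs, PME ranks and
other limits. The function name max_cells is hypothetical.

```python
import math

# Box edges (nm) as quoted in the post; a rectangular box is assumed.
box = (3.53633, 4.17674, 4.99285)
# Minimum cell size (nm) as printed in the fatal error message.
min_cell = 3.51565


def max_cells(box, min_cell):
    """Upper bound on the number of DD cells along each box edge.

    Simplified estimate: a cell cannot be shorter than min_cell, so at most
    floor(edge / min_cell) cells fit, and there is always at least one.
    """
    return tuple(max(1, math.floor(edge / min_cell)) for edge in box)


cells = max_cells(box, min_cell)
max_ranks = math.prod(cells)

print("cells per dimension:", cells)
print("upper bound on particle-particle ranks:", max_ranks)
```

With these numbers every edge fits only one cell, which matches the
"X 1 Y 1 Z 1" line in the log, so 48 particle-particle ranks cannot be
accommodated. Note also that 3.516 / 3.196 is about 1.10, so the minimum
cell size looks like the longest bonded distance plus a margin of roughly
10%.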
Probably mdrun should be smarter about pbc and use better periodic image
handling during DD setup, but you can fix that yourself before you call
grompp.
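To illustrate the periodic image point, here is a minimal sketch of a
minimum-image distance in a rectangular box. The coordinates are made up for
illustration and are not taken from the system in this thread; the function
name minimum_image_distance is hypothetical. It shows how two atoms that look
far apart in the input coordinates can be close once the nearest periodic
image is used.

```python
import math

# Box edges (nm) as quoted in the post; a rectangular box is assumed.
box = (3.53633, 4.17674, 4.99285)


def minimum_image_distance(r1, r2, box):
    """Distance between r1 and the nearest periodic image of r2."""
    d2 = 0.0
    for a, b, edge in zip(r1, r2, box):
        delta = a - b
        # Shift by a whole number of box lengths to the nearest image.
        delta -= edge * round(delta / edge)
        d2 += delta * delta
    return math.sqrt(d2)


# Hypothetical coordinates (nm): two bonded atoms on opposite sides of the
# box along z, as can happen when a molecule is split across the boundary.
atom_a = (1.0, 1.0, 0.5)
atom_b = (1.0, 1.0, 4.5)

direct = math.dist(atom_a, atom_b)
nearest = minimum_image_distance(atom_a, atom_b, box)

print(f"direct distance:        {direct:.5f} nm")
print(f"minimum-image distance: {nearest:.5f} nm")
```

If the 3.196 nm pair in the log turns out to be an artifact of this kind, one
possibility is to make the molecules whole in the starting structure before
running grompp, for example with gmx trjconv and its -pbc mol or -pbc whole
option. Whether that applies here depends on the actual coordinates.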
Mark
On Tue, Jul 26, 2016 at 11:46 AM Alexander Alexander <
alexanderwien2k at gmail.com> wrote:
> Dear GROMACS users,
>
> For more than a week now I have been struggling with a fatal error from
> domain decomposition, without success. It is all the more painful because
> I have to test different numbers of CPUs to see which one works, on a
> cluster with a long queuing time: a job waits two or three days in the
> queue only to hit the fatal error again within two minutes.
>
> The dimensions of the cell are "3.53633, 4.17674, 4.99285". Below is the
> log file of my test submitted on 2 nodes with 128 cores in total. I even
> reduced to 32 CPUs and changed from "gmx_mpi mdrun" to "gmx mdrun", but
> the problem persists.
>
> Please do not refer me to this link (
>
> http://www.gromacs.org/Documentation/Errors#There_is_no_domain_decomposition_for_n_nodes_that_is_compatible_with_the_given_box_and_a_minimum_cell_size_of_x_nm
> )
> as I know what the problem is, but I cannot solve it.
>
>
> Thanks,
>
> Regards,
> Alex
>
>
>
> Log file opened on Fri Jul 22 00:55:56 2016
> Host: node074 pid: 12281 rank ID: 0 number of ranks: 64
>
> GROMACS: gmx mdrun, VERSION 5.1.2
> Executable:
> /home/fb_chem/chemsoft/lx24-amd64/gromacs-5.1.2-mpi/bin/gmx_mpi
> Data prefix: /home/fb_chem/chemsoft/lx24-amd64/gromacs-5.1.2-mpi
> Command line:
> gmx_mpi mdrun -ntomp 1 -deffnm min1.6 -s min1.6
>
> GROMACS version: VERSION 5.1.2
> Precision: single
> Memory model: 64 bit
> MPI library: MPI
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support: disabled
> OpenCL support: disabled
> invsqrt routine: gmx_software_invsqrt(x)
> SIMD instructions: AVX_128_FMA
> FFT library: fftw-3.2.1
> RDTSCP usage: enabled
> C++11 compilation: disabled
> TNG support: enabled
> Tracing support: disabled
> Built on: Thu Jun 23 14:17:43 CEST 2016
> Built by: reuter at marc2-h2 [CMAKE]
> Build OS/arch: Linux 2.6.32-642.el6.x86_64 x86_64
> Build CPU vendor: AuthenticAMD
> Build CPU brand: AMD Opteron(TM) Processor 6276
> Build CPU family: 21 Model: 1 Stepping: 2
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
> misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2
> sse3 sse4a sse4.1 sse4.2 ssse3 xop
> C compiler: /usr/lib64/ccache/cc GNU 4.4.7
> C compiler flags: -mavx -mfma4 -mxop -Wundef -Wextra
> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
> -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG
> -funroll-all-loops -Wno-array-bounds
>
> C++ compiler: /usr/lib64/ccache/c++ GNU 4.4.7
> C++ compiler flags: -mavx -mfma4 -mxop -Wundef -Wextra
> -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function
> -O3 -DNDEBUG -funroll-all-loops -Wno-array-bounds
> Boost version: 1.55.0 (internal)
>
>
> Running on 2 nodes with total 128 cores, 128 logical cores
> Cores per node: 64
> Logical cores per node: 64
> Hardware detected on host node074 (the node of MPI rank 0):
> CPU info:
> Vendor: AuthenticAMD
> Brand: AMD Opteron(TM) Processor 6276
> Family: 21 model: 1 stepping: 2
> CPU features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
> misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2
> sse3 sse4a sse4.1 sse4.2 ssse3 xop
> SIMD instructions most likely to fit this hardware: AVX_128_FMA
> SIMD instructions selected at GROMACS compile time: AVX_128_FMA
> Initializing Domain Decomposition on 64 ranks
> Dynamic load balancing: off
> Will sort the charge groups at every domain (re)decomposition
> Initial maximum inter charge-group distances:
> two-body bonded interactions: 3.196 nm, LJC Pairs NB, atoms 24 28
> multi-body bonded interactions: 0.397 nm, Ryckaert-Bell., atoms 5 13
> Minimum cell size due to bonded interactions: 3.516 nm
> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.218 nm
> Estimated maximum distance required for P-LINCS: 0.218 nm
> Guess for relative PME load: 0.19
> Will use 48 particle-particle and 16 PME only ranks
> This is a guess, check the performance at the end of the log file
> Using 16 separate PME ranks, as guessed by mdrun
> Optimizing the DD grid for 48 cells with a minimum initial size of 3.516 nm
> The maximum allowed number of cells is: X 1 Y 1 Z 1
>
> -------------------------------------------------------
> Program gmx mdrun, VERSION 5.1.2
> Source code file: /home/alex/gromacs-5.1.2/src/gromacs/domdec/domdec.cpp,
> line: 6987
>
> Fatal error:
> There is no domain decomposition for 48 ranks that is compatible with the
> given box and a minimum cell size of 3.51565 nm
> Change the number of ranks or mdrun option -rdd
> Look in the log file for details on the domain decomposition
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------