[gmx-users] Domain decomposition

Justin Lemkul jalemkul at vt.edu
Tue Jul 26 14:21:32 CEST 2016



On 7/26/16 8:17 AM, Alexander Alexander wrote:
> Hi,
>
> Thanks for your response.
> I do not know which two atoms have a bonded interaction comparable to the
> cell size. However, based on this line in the log file, "two-body bonded
> interactions: 3.196 nm, LJC Pairs NB, atoms 24 28", I thought atoms 24 and 28
> are the pair in question. Their coordinates are as below:
>
> 1ARG   HH22   24   0.946   1.497   4.341
> 2CL      CL       28   1.903   0.147   0.492
>
> Indeed, their geometrical distance is large, but I think that is normal. I
> manually changed the coordinates of the CL atom to bring it closer to the
> other one, hoping to solve the problem, and tested it again, but the problem
> is still there.
>
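
One way to check a pair like this is to compare the plain Cartesian distance
with the minimum-image distance in the periodic box. The sketch below is only
an illustration: it assumes a rectangular box with the dimensions given in the
original post and uses the two coordinate lines quoted above. The function name
is hypothetical, and the result need not reproduce the 3.196 nm in the log,
since mdrun measures that distance on its own representation of the system.

```python
import math

def pair_distance(a, b, box=None):
    """Distance between points a and b in nm.

    If box (rectangular box lengths) is given, apply the minimum-image
    convention along each axis; otherwise use the raw coordinates.
    """
    total = 0.0
    for i in range(3):
        d = a[i] - b[i]
        if box is not None:
            # Shift by whole box lengths to the nearest periodic image.
            d -= box[i] * round(d / box[i])
        total += d * d
    return math.sqrt(total)

hh22 = (0.946, 1.497, 4.341)        # atom 24, ARG HH22
cl = (1.903, 0.147, 0.492)          # atom 28, CL
box = (3.53633, 4.17674, 4.99285)   # nm, from the original post

raw = pair_distance(hh22, cl)
wrapped = pair_distance(hh22, cl, box)
print(f"raw: {raw:.3f} nm, minimum image: {wrapped:.3f} nm")
```

For these coordinates the raw separation is roughly 4.19 nm, while the
minimum-image separation is roughly 2.01 nm, because the two atoms are closer
across the z boundary than within the box.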

You'll need to provide a full .mdp file for anyone to be able to tell anything.
It looks like you're doing a free energy calculation, based on the "LJC Pairs NB"
entry in your log, and depending on the settings, free energy calculations may
involve very long bonded interactions that make it difficult (or even impossible)
to use DD, in which case you must use mdrun -ntmpi 1 to disable DD and rely only
on OpenMP.
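
As a rough sketch of what a single-rank run looks like, the snippet below
assembles the command line in both flavours. Note that -ntmpi only applies to
thread-MPI builds (gmx); the log in this thread shows an MPI build (gmx_mpi),
where the number of ranks is set by the MPI launcher instead. The launcher name
"mpirun", the helper name, and the thread count of 16 are assumptions for
illustration; the log reports GMX_OPENMP_MAX_THREADS = 32, so stay at or below
that.

```python
def single_rank_mdrun(deffnm, omp_threads, real_mpi=False):
    """Build an mdrun command line that uses one rank, so no DD is set up."""
    if real_mpi:
        # Assumption: the MPI launcher is called "mpirun"; clusters differ.
        return ["mpirun", "-np", "1", "gmx_mpi", "mdrun",
                "-ntomp", str(omp_threads), "-deffnm", deffnm]
    # Thread-MPI build: one thread-MPI rank, parallelism from OpenMP only.
    return ["gmx", "mdrun", "-ntmpi", "1",
            "-ntomp", str(omp_threads), "-deffnm", deffnm]

print(" ".join(single_rank_mdrun("min1.6", 16)))
print(" ".join(single_rank_mdrun("min1.6", 16, real_mpi=True)))
```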

> It also says "minimum initial size of 3.516 nm", but all of my cell
> dimensions are larger than this as well.
>

"Cell size" refers to a DD cell, not the box vectors of your system.  Note that 
your system is nearly the same size as your limiting interactions, which may 
suggest that your box is too small to avoid periodicity problems, but that's an 
entirely separate issue.
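
To make the arithmetic concrete, here is a minimal sketch of how many DD cells
fit along each box dimension. It assumes a rectangular box and ignores every
refinement mdrun applies beyond the minimum cell size; the function name is
hypothetical. The 3.516 nm limit appears consistent with a 10% margin on the
3.196 nm pair distance, but that is an inference from the numbers in the log,
not something the log states.

```python
import math

def max_dd_cells(box, min_cell):
    """Upper bound on DD cells per dimension for a rectangular box."""
    return tuple(max(1, math.floor(length / min_cell)) for length in box)

box = (3.53633, 4.17674, 4.99285)   # nm, box dimensions from the original post
min_cell = 3.516                    # nm, "Minimum cell size" from the log

cells = max_dd_cells(box, min_cell)
print("max cells X Y Z:", cells)
print("max particle-particle ranks:", math.prod(cells))
```

Each dimension holds only one cell of that size, which matches the "X 1 Y 1 Z 1"
line in the log, so no number of ranks greater than one can be decomposed
until the limiting interaction distance shrinks or the box grows.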

-Justin

> ?
>
> Thanks,
> Regards,
> Alex
>
> On Tue, Jul 26, 2016 at 12:12 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
>> Hi,
>>
>> So you know your cell dimensions, and mdrun is reporting that it can't
>> decompose because you have a bonded interaction that is almost the length
>> of one of the cell dimensions. How big should that interaction distance
>> be, and what might you do about it?
>>
>> Probably mdrun should be smarter about pbc and use better periodic image
>> handling during DD setup, but you can fix that yourself before you call
>> grompp.
>>
>> Mark
>>
>>
>> On Tue, Jul 26, 2016 at 11:46 AM Alexander Alexander <
>> alexanderwien2k at gmail.com> wrote:
>>
>>> Dear gromacs users,
>>>
>>> For more than a week now I have been struggling with a fatal error due to
>>> domain decomposition, and I have not succeeded yet. It is all the more
>>> painful because I have to test different numbers of CPUs to see which one
>>> works on a cluster with a long queuing time, which means waiting two or
>>> three days in the queue just to see the fatal error again within two minutes.
>>>
>>> These are the dimensions of the cell: "3.53633,   4.17674,   4.99285".
>>> Below is the log file of my test submitted on 2 nodes with a total of 128
>>> cores. I even reduced it to 32 CPUs and changed from "gmx_mpi mdrun" to
>>> "gmx mdrun", but the problem persists.
>>>
>>> Please do not refer me to this link
>>> (http://www.gromacs.org/Documentation/Errors#There_is_no_domain_decomposition_for_n_nodes_that_is_compatible_with_the_given_box_and_a_minimum_cell_size_of_x_nm),
>>> as I know what the problem is but I cannot solve it:
>>>
>>>
>>> Thanks,
>>>
>>> Regards,
>>> Alex
>>>
>>>
>>>
>>> Log file opened on Fri Jul 22 00:55:56 2016
>>> Host: node074  pid: 12281  rank ID: 0  number of ranks:  64
>>>
>>> GROMACS:      gmx mdrun, VERSION 5.1.2
>>> Executable:
>>> /home/fb_chem/chemsoft/lx24-amd64/gromacs-5.1.2-mpi/bin/gmx_mpi
>>> Data prefix:  /home/fb_chem/chemsoft/lx24-amd64/gromacs-5.1.2-mpi
>>> Command line:
>>>   gmx_mpi mdrun -ntomp 1 -deffnm min1.6 -s min1.6
>>>
>>> GROMACS version:    VERSION 5.1.2
>>> Precision:          single
>>> Memory model:       64 bit
>>> MPI library:        MPI
>>> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
>>> GPU support:        disabled
>>> OpenCL support:     disabled
>>> invsqrt routine:    gmx_software_invsqrt(x)
>>> SIMD instructions:  AVX_128_FMA
>>> FFT library:        fftw-3.2.1
>>> RDTSCP usage:       enabled
>>> C++11 compilation:  disabled
>>> TNG support:        enabled
>>> Tracing support:    disabled
>>> Built on:           Thu Jun 23 14:17:43 CEST 2016
>>> Built by:           reuter at marc2-h2 [CMAKE]
>>> Build OS/arch:      Linux 2.6.32-642.el6.x86_64 x86_64
>>> Build CPU vendor:   AuthenticAMD
>>> Build CPU brand:    AMD Opteron(TM) Processor 6276
>>> Build CPU family:   21   Model: 1   Stepping: 2
>>> Build CPU features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
>>> misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2
>>> sse3 sse4a sse4.1 sse4.2 ssse3 xop
>>> C compiler:         /usr/lib64/ccache/cc GNU 4.4.7
>>> C compiler flags:    -mavx -mfma4 -mxop    -Wundef -Wextra
>>> -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall
>>> -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG
>>> -funroll-all-loops  -Wno-array-bounds
>>>
>>> C++ compiler:       /usr/lib64/ccache/c++ GNU 4.4.7
>>> C++ compiler flags:  -mavx -mfma4 -mxop    -Wundef -Wextra
>>> -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function
>>> -O3 -DNDEBUG -funroll-all-loops  -Wno-array-bounds
>>> Boost version:      1.55.0 (internal)
>>>
>>>
>>> Running on 2 nodes with total 128 cores, 128 logical cores
>>>   Cores per node:           64
>>>   Logical cores per node:   64
>>> Hardware detected on host node074 (the node of MPI rank 0):
>>>   CPU info:
>>>     Vendor: AuthenticAMD
>>>     Brand:  AMD Opteron(TM) Processor 6276
>>>     Family: 21  model:  1  stepping:  2
>>>     CPU features: aes apic avx clfsh cmov cx8 cx16 fma4 htt lahf_lm
>>> misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2
>>> sse3 sse4a sse4.1 sse4.2 ssse3 xop
>>>     SIMD instructions most likely to fit this hardware: AVX_128_FMA
>>>     SIMD instructions selected at GROMACS compile time: AVX_128_FMA
>>> Initializing Domain Decomposition on 64 ranks
>>> Dynamic load balancing: off
>>> Will sort the charge groups at every domain (re)decomposition
>>> Initial maximum inter charge-group distances:
>>>     two-body bonded interactions: 3.196 nm, LJC Pairs NB, atoms 24 28
>>>   multi-body bonded interactions: 0.397 nm, Ryckaert-Bell., atoms 5 13
>>> Minimum cell size due to bonded interactions: 3.516 nm
>>> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.218 nm
>>> Estimated maximum distance required for P-LINCS: 0.218 nm
>>> Guess for relative PME load: 0.19
>>> Will use 48 particle-particle and 16 PME only ranks
>>> This is a guess, check the performance at the end of the log file
>>> Using 16 separate PME ranks, as guessed by mdrun
>>> Optimizing the DD grid for 48 cells with a minimum initial size of 3.516 nm
>>> The maximum allowed number of cells is: X 1 Y 1 Z 1
>>>
>>> -------------------------------------------------------
>>> Program gmx mdrun, VERSION 5.1.2
>>> Source code file: /home/alex/gromacs-5.1.2/src/gromacs/domdec/domdec.cpp,
>>> line: 6987
>>>
>>> Fatal error:
>>> There is no domain decomposition for 48 ranks that is compatible with the
>>> given box and a minimum cell size of 3.51565 nm
>>> Change the number of ranks or mdrun option -rdd
>>> Look in the log file for details on the domain decomposition
>>> For more information and tips for troubleshooting, please check the GROMACS
>>> website at http://www.gromacs.org/Documentation/Errors
>>> -------------------------------------------------------
>>> --
>>> Gromacs Users mailing list
>>>
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>> posting!
>>>
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send a mail to gmx-users-request at gromacs.org.
>>>
>>

-- 
==================================================

Justin A. Lemkul, Ph.D.
Ruth L. Kirschstein NRSA Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 629
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201

jalemkul at outerbanks.umaryland.edu | (410) 706-7441
http://mackerell.umaryland.edu/~jalemkul

==================================================

