[gmx-users] domain decomposition error

Mark Abraham Mark.Abraham at anu.edu.au
Thu Nov 18 00:23:30 CET 2010


On 18/11/2010 12:07 AM, Fabio Affinito wrote:
> Hi,
> I'm trying to run a simulation with 4.5.3 (double precision) on bluegene.

Double precision comes with a big performance penalty, particularly on 
BlueGene. Don't use it unless you know you need it.

> I get this error:
>
>
>> NOTE: Turning on dynamic load balancing
>>
>> vol 0.35! imb F 54% pme/F 2.06 step 100, remaining runtime:     3 s
>> vol 0.33! imb F 48% pme/F 2.09 step 200, remaining runtime:     2 s
>> vol 0.35! imb F 28% pme/F 2.42 step 300, remaining runtime:     1 s
>> vol 0.34! imb F 26% pme/F 2.46 step 400, remaining runtime:     0 s
>>
>> -------------------------------------------------------
>> Program mdrun_mpi_bg_d, VERSION 4.5.3
>> Source code file: domdec.c, line: 3581
>>
>> Fatal error:
>> Step 490: The X-size (0.799998) times the triclinic skew factor (1.000000) is smaller than the smallest allowed cell size (0.800000) for domain decomposition grid cell 4 2 2
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>> -------------------------------------------------------
>>
>> "Everything's formed from particles" (Van der Graaf Generator)
>>
>> Error on node 188, will try to stop all the nodes
>> Halting parallel program mdrun_mpi_bg_d on CPU 188 out of 256

I guess that your system is too small for 256 processors or, if it is 
large enough, that it is blowing up. See 
www.gromacs.org/Documentation/Terminology/Blowing_Up

The nature of the interactions in your system sets a minimum domain 
decomposition cell size, and GROMACS reports how it calculated that size 
in the .log file before the simulation proper starts. Each processor must 
have at least one of those cells, so your system needs to be large enough 
to hold 256 such cells. Basically, the parallelism in GROMACS is not 
engineered to run small systems on large numbers of processors, which 
would be inefficient anyway. If your BlueGene can't offer fewer 
processors, then you need to change the simulation system or the machine.
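
To make the size argument concrete, here is a rough sketch of the 
arithmetic. It is not how GROMACS itself chooses a grid. It assumes a 
rectangular box, ignores dynamic load balancing and separate PME nodes, 
and the box lengths are made-up numbers; only the 0.8 minimum cell size 
comes from the error message above. The function names are hypothetical.

```python
import math


def max_dd_cells(box_nm, min_cell_nm):
    """Upper bound on the number of cells per dimension (rectangular box)."""
    # Small tolerance so an exact multiple is not rounded down by float error.
    return tuple(int(math.floor(length / min_cell_nm + 1e-9)) for length in box_nm)


def grid_fits(grid, box_nm, min_cell_nm):
    """True if every cell of the requested grid is at least min_cell_nm wide."""
    limits = max_dd_cells(box_nm, min_cell_nm)
    return all(n <= limit for n, limit in zip(grid, limits))


# Hypothetical box lengths in nm; substitute the values from your own system.
box = (6.0, 5.0, 5.0)
# Smallest allowed cell size, as quoted in the error message.
min_cell = 0.8

limits = max_dd_cells(box, min_cell)
print("max cells per dimension:", limits)
print("max cells in total:", math.prod(limits))
print("8 x 4 x 4 grid fits:", grid_fits((8, 4, 4), box, min_cell))
print("4 x 4 x 4 grid fits:", grid_fits((4, 4, 4), box, min_cell))
```

With these made-up lengths, fewer than 256 cells of the minimum size fit 
in the box, so no grid can give every one of 256 processors a cell.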

Mark


