Subject: Re: [gmx-users] Issue with domain decomposition between v4.5.5 and 4.6.1

Stephanie Teich-McGoldrick stephanietm at gmail.com
Mon Apr 15 21:16:02 CEST 2013


Hello Justin,

Thank you for the reply, and I am glad to hear that this is normal output.
Unfortunately, my simulations crash almost immediately when I use v4.6,
and I assumed it had something to do with the load balancing because
that is the last line in my md.log file.

I ran with the flag "mdrun -debug 1" and found the following error:
"mdrun_mpi:13106 terminated with signal 11 at PC=2abd88a03934
SP=7fff6343f170.  Backtrace:
/apps/x86_64/mpi/openmpi/intel-12.1-2011.7.256/openmpi-1.4.3_oobpr/lib/libmpi.so.0[0x2abd88a03934]"


I know this is rather vague, but do you have any suggestions on where I
should start tracking down this error? When I use particle decomposition, my
simulations run fine.

Thanks in advance!
Stephanie




Message: 3
Date: Mon, 15 Apr 2013 06:08:13 -0400
From: Justin Lemkul <jalemkul at vt.edu>
Subject: Re: [gmx-users] Issue with domain decomposition between v4.5.5 and 4.6.1
To: Discussion list for GROMACS users <gmx-users at gromacs.org>
Message-ID: <516BD18D.8000803 at vt.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed



On 4/14/13 11:23 PM, Stephanie Teich-McGoldrick wrote:
> Dear all,
>
> I am running an NPT simulation of 33,534 TIP4P waters, and I am using domain
> decomposition as the parallelization scheme. Previously, I had been using
> Gromacs version 4.5.5 but have recently installed and switched to Gromacs
> version 4.6.1. Using Gromacs 4.5.5 I can successfully run my water box
> using domain decomposition over many different processor numbers. However,
> the same simulation returns the following error when I try Gromacs 4.6.1:
>
> "The initial number of communication pulses is: X 1 Y 1 Z 1
> The initial domain decomposition cell size is: X 2.48 nm Y 2.48 nm Z 1.46 nm
>
> When dynamic load balancing gets turned on, these settings will change to:
> The maximum number of communication pulses is: X 1 Y 1 Z 1
> The minimum size for domain decomposition cells is 1.000 nm
> The requested allowed shrink of DD cells (option -dds) is: 0.80
> The allowed shrink of domain decomposition cells is: X 0.40 Y 0.40 Z 0.68
> "
> The above error occurred running over 16 nodes / 128 processors. The system
> runs with version 4.6.1 on 1, 8, and 16 processors but not on 32, 64, or 128
> processors.
>
> I have tried other systems (including NVT, Berendsen/PR barostats,
> anisotropic/isotropic) at the higher numbers of processors using both
> version 4.5.5 and 4.6.1 and get the same result: v4.5.5 runs fine while
> v4.6.1 returns the error type listed above.
>
> Is anyone else having a similar issue? Is there something I am not
> considering? Any help would be greatly appreciated! The details I have used
> to compile each code are below. My log files indicate that I am indeed
> calling the correct executable at run time.
>

Based on what you've posted, I don't see any error.  All of the above is
normal output.
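
[Editorial note] One way to see that the quoted numbers are internally consistent: the reported "allowed shrink" values appear to equal the minimum cell size divided by the initial cell size in each dimension. The short Python sketch below checks that arithmetic using only the values from the log excerpt. The relation itself is an assumption inferred from these numbers, not something stated in the thread, and the variable and function names are hypothetical.

```python
# Values copied from the quoted md.log excerpt.
initial_cell_size_nm = {"X": 2.48, "Y": 2.48, "Z": 1.46}
min_cell_size_nm = 1.000


def allowed_shrink(min_size, cell_sizes):
    """Assumed relation: shrink = minimum cell size / initial cell size."""
    return {dim: min_size / size for dim, size in cell_sizes.items()}


shrink = allowed_shrink(min_cell_size_nm, initial_cell_size_nm)

# Format the result the same way the log line does (two decimals per dimension).
summary = " ".join(f"{dim} {value:.2f}" for dim, value in shrink.items())
print(summary)  # X 0.40 Y 0.40 Z 0.68
```

The printed values match the "allowed shrink" line in the log, which fits the reading that this block is informational output. It says nothing about the cause of the later crash.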

-Justin

--
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin


