[gmx-users] replica exchange: >4 processors
Mark Abraham
Mark.Abraham at anu.edu.au
Fri Dec 7 06:43:59 CET 2007
Paul Whitford wrote:
> I am using 3.3.2 and 3.3.1 and I get the following problem with both of
> them.
>
> If I run replica exchange on >4 processors (2 and 4 are fine), the
> simulations finish, but MPI gives the following errors, so the job
> never terminates.
>
>
> This is the end of my log file:
>
> -----------------------------------------------------------------------
>
>                NODE (s)   Real (s)        (%)
>        Time: 158483.430 159636.000       99.3
>              1d20h01:23
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:     18.919    818.029      2.726      8.805
> p13_15442: p4_error: Timeout in establishing connection to remote process: 0
> p12_15407: p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> p11_2364: p4_error: Timeout in establishing connection to remote process: 0
> p9_20588: p4_error: Timeout in establishing connection to remote process: 0
> p10_2329: p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> Broken pipe
> Broken pipe
> Broken pipe
> p6_24137: p4_error: Timeout in establishing connection to remote process: 0
> p7_24172: p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> Broken pipe
These are problems with MPICH's memory management, and MPICH has a long
track record of such problems. Try LAM instead.
Mark