[gmx-users] replica exchange: >4 processors

Mark Abraham Mark.Abraham at anu.edu.au
Fri Dec 7 06:43:59 CET 2007


Paul Whitford wrote:
> I am using GROMACS 3.3.1 and 3.3.2, and I get the following problem
> with both of them.
> 
> If I run replica exchange on >4 processors (2 and 4 are fine), the 
> simulations finish, but MPI then reports the following errors and the 
> job never terminates.
> 
> 
> This is the end of my log file:
> 
> -----------------------------------------------------------------------
> 
>                NODE (s)   Real (s)      (%)
>        Time: 158483.430 159636.000     99.3
>                        1d20h01:23
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:     18.919    818.029      2.726      8.805
> p13_15442:  p4_error: Timeout in establishing connection to remote process: 0
> p12_15407:  p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> p11_2364:  p4_error: Timeout in establishing connection to remote process: 0
> p9_20588:  p4_error: Timeout in establishing connection to remote process: 0
> p10_2329:  p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> Broken pipe
> Broken pipe
> Broken pipe
> p6_24137:  p4_error: Timeout in establishing connection to remote process: 0
> p7_24172:  p4_error: Timeout in establishing connection to remote process: 0
> Broken pipe
> Broken pipe

These errors come from MPICH's memory management. MPICH has a long 
track record of such problems, so try LAM instead.
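
Before switching MPI libraries, it may help to confirm which processes 
reported the error and whether they all report the same message. The 
following is only a minimal sketch: the function name 
summarize_p4_errors is hypothetical, and it assumes the log format 
quoted above, where each line starts with p<N>_<pid> (taken here to be 
the reporting process number and its PID) and ends with a numeric value. 
It is not part of GROMACS or MPICH.

```python
import re
from collections import Counter

# Assumed line format, taken from the log quoted in this thread:
#   p13_15442:  p4_error: Timeout in establishing connection to remote process: 0
P4_ERROR = re.compile(r"\bp(\d+)_(\d+):\s+p4_error:\s+(.*?):\s+(-?\d+)")


def summarize_p4_errors(log_text):
    """Return (sorted process numbers, Counter of (message, value) pairs)."""
    # Drop mail-quote prefixes ("> ") and undo line wrapping, so that
    # messages split across two lines by a mail client still match.
    unquoted = re.sub(r"^>\s?", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())

    ranks = []
    messages = Counter()
    for rank, _pid, message, value in P4_ERROR.findall(flat):
        ranks.append(int(rank))
        messages[(message, int(value))] += 1
    return sorted(ranks), messages
```

Calling summarize_p4_errors(open("md0.log").read()) (the file name is a 
placeholder) returns the list of reporting process numbers and a tally 
of the distinct error messages. If every failing process reports the 
same timeout, that is consistent with a problem in the MPI layer rather 
than in any single replica.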

Mark


