[gmx-developers] Replica exchange deadlock

Berk Hess hessb at mpip-mainz.mpg.de
Wed Feb 4 10:19:50 CET 2009


Mark Abraham wrote:
> Roland Schulz wrote:
>> Hi,
>>
>> I think the function replica_exchange is using the wrong MPI 
>> communicator. I think it should be changed according to this diff:
>>
>> --- repl_ex.c   16 Jan 2009 13:05:34 -0000      1.22.2.1
>> +++ repl_ex.c   3 Feb 2009 22:28:09 -0000
>> @@ -567,7 +567,7 @@
>>    if (PAR(cr)) {
>>  #ifdef GMX_MPI
>>      MPI_Bcast(&bExchanged,sizeof(bool),MPI_BYTE,MASTERRANK(cr),
>> -             cr->mpi_comm_mysim);
>> +             cr->mpi_comm_mygroup);
>>  #endif
>>    }
>>
>> For me this removes a deadlock I get with certain numbers of PME nodes.
>
> Yes, that looks correct. The old form would provoke an MPI deadlock 
> because this function is called from do_md, which is only called by PP 
> nodes, so any PME-only nodes in mpi_comm_mysim never reach the 
> MPI_Bcast. mpi_comm_mysim == mpi_comm_mygroup except under certain DD 
> conditions with separate PME nodes.
>
> Mark
Indeed.
I fixed it for 4.0.4.

Berk
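
The deadlock condition discussed above can be sketched with a small toy
model. This is not GROMACS or MPI code: the types rank_role and comm_kind
and the function bcast_completes are hypothetical names made up for
illustration. It assumes only what the thread states (replica_exchange is
reached from do_md, which PP nodes alone run) plus the general MPI rule
that a collective such as MPI_Bcast must be entered by every rank of the
communicator it is called on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not GROMACS or MPI API. */
typedef enum { ROLE_PP, ROLE_PME } rank_role;

/* The two communicators from the diff, as seen from a PP rank. */
typedef enum { COMM_MYSIM, COMM_MYGROUP_PP } comm_kind;

/* mysim spans every rank of the simulation; the PP group spans PP ranks only. */
static bool in_comm(rank_role role, comm_kind comm)
{
    return comm == COMM_MYSIM || role == ROLE_PP;
}

/* Per the thread: only PP ranks run do_md, so only they reach the broadcast. */
static bool calls_bcast(rank_role role)
{
    return role == ROLE_PP;
}

/* A collective can complete only if every member of the communicator enters it. */
bool bcast_completes(const rank_role *roles, size_t nranks, comm_kind comm)
{
    assert(roles != NULL || nranks == 0);
    for (size_t i = 0; i < nranks; i++) {
        if (in_comm(roles[i], comm) && !calls_bcast(roles[i])) {
            return false; /* a member never arrives, so the others keep waiting */
        }
    }
    return true;
}
```

With PP ranks only, both communicators have the same members and the
broadcast completes either way. As soon as one rank is a separate PME
rank, the model reports that a broadcast on the whole-simulation
communicator cannot complete, while one restricted to the PP group can.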



