[gmx-developers] MPI_ERR_COMM on 4.5.5-patches

Berk Hess hess at kth.se
Tue Aug 28 22:41:23 CEST 2012


Hi,

I think I might have found it already.
Could you try the fix below and report back whether it solves the problem?

Cheers,

Berk


index 735c0e8..e00fa6f 100644
--- a/src/mdlib/pme.c
+++ b/src/mdlib/pme.c
@@ -1814,8 +1814,11 @@ static void init_atomcomm(gmx_pme_t pme,pme_atomcomm_t *atc, t_commrec *cr,
      if (pme->nnodes > 1)
      {
          atc->mpi_comm = pme->mpi_comm_d[dimind];
-        MPI_Comm_size(atc->mpi_comm,&atc->nslab);
-        MPI_Comm_rank(atc->mpi_comm,&atc->nodeid);
+        if (atc->mpi_comm != MPI_COMM_NULL)
+        {
+            MPI_Comm_size(atc->mpi_comm,&atc->nslab);
+            MPI_Comm_rank(atc->mpi_comm,&atc->nodeid);
+        }
      }
      if (debug)
      {



On 08/28/2012 10:34 PM, Berk Hess wrote:
> Hi,
>
> This seems to be a bug in Gromacs.
> As this is not in a Gromacs release yet, we could resolve this without 
> a bug report.
>
> Are you skilled enough that you can run this in a debugger and tell me
> which MPI_Comm_size call in Gromacs is causing this?
>
> Cheers,
>
> Berk
>
> On 08/28/2012 07:39 PM, Alexander Schlaich wrote:
>> Dear Gromacs team,
>>
>> I just tried to install the release-4.5.5_patches branch with 
>> --enable-mpi on our cluster (OpenMPI-1.4.2), resulting in an error 
>> when calling mdrun with PME enabled:
>>
>> Reading file topol.tpr, VERSION 4.5.5-dev-20120810-2859895 (single 
>> precision)
>> [sheldon:22663] *** An error occurred in MPI_comm_size
>> [sheldon:22663] *** on communicator MPI_COMM_WORLD
>> [sheldon:22663] *** MPI_ERR_COMM: invalid communicator
>> [sheldon:22663] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>
>> This seems to be related to a recent post on the list, however I 
>> could not find a solution:
>> http://lists.gromacs.org/pipermail/gmx-users/2012-July/073316.html
>> However, the 4.5.5 release version works fine.
>>
>> Taking a closer look I found commit 
>> dcf8b67e2801f994dae56374382b9e330833de30, "changed PME MPI_Comm 
>> comparisions to MPI_COMM_NULL, fixes #931" (Berk Hess). Apparently 
>> here the communicators were changed such that the initialization 
>> fails on my system. Reverting this single commit on the head of the 
>> release-4.5.5 branch solved the issue for me.
>>
>> As I am no MPI expert I would like to know if my MPI implementation 
>> is misbehaving here, if I made a configuration mistake or if I should 
>> file a bug report?
>>
>> Thanks for your help,
>>
>> Alex
>>
>>
>>
>



