[gmx-developers] MPI_ERR_COMM on 4.5.5-patches

Alexander Schlaich alexander.schlaich at fu-berlin.de
Wed Aug 29 12:46:58 CEST 2012


Just an addition: 
I just realized that only running the MPI version on a single core seems to be affected, which would correspond to the invalid-communicator error in my previous mail.
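
For reference, the affected case is presumably a single-rank launch of the MPI build, something along the lines of (the exact binary name depends on the build/suffix configuration):

    mpirun -np 1 mdrun_mpi -s topol.tpr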

On 29.08.2012, at 12:17, Alexander Schlaich wrote:

> Hi Berk,
> 
> Your patch didn't fix the problem.
> Stepping through the execution in a debugger, I found that the MPI error
> is thrown at src/mdlib/fft5d.c, line 196:
> 
> fft5d_plan fft5d_plan_3d(int NG, int MG, int KG, MPI_Comm comm[2], int flags, t_complex** rlin, t_complex** rlout)
> 
> 192     /* comm, prank and P are in the order of the decomposition (plan->cart is in the order of transposes) */
> 193 #ifdef GMX_MPI
> 194     if (GMX_PARALLEL_ENV_INITIALIZED && comm[0] != MPI_COMM_NULL)
> 195     {
> 196  ->     MPI_Comm_size(comm[0],&P[0]);
> 197         MPI_Comm_rank(comm[0],&prank[0]);
> 198     }
> 199     else
> 
> It seems to me that the symbol MPI_COMM_NULL is not initialized (at
> least it is not zero, as I had expected). Adding a #define MPI_COMM_NULL 0
> (see the original discussion #931) makes all the problems go away, although I
> know this is not the proper solution.
> I think MPI_COMM_NULL should be provided by #include <mpi.h>, with its
> type and value handled by the MPI implementation, so I don't have an
> obvious fix...
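> 
> For illustration only: the MPI standard just guarantees that MPI_COMM_NULL
> is defined in <mpi.h>; its actual representation is implementation-defined
> (an integer handle in MPICH, a pointer in OpenMPI), so hard-coding 0 is not
> portable. A minimal standalone sketch of the intended guard (names and the
> program itself are illustrative, not Gromacs code):
> 
>     #include <mpi.h>
>     #include <stdio.h>
> 
>     int main(int argc, char *argv[])
>     {
>         MPI_Comm comm = MPI_COMM_NULL;   /* e.g. a not-yet-assigned communicator */
>         int      size, rank;
> 
>         MPI_Init(&argc, &argv);
>         /* Only query size/rank on a valid communicator; comparing against
>          * MPI_COMM_NULL is portable, comparing against 0 is not. */
>         if (comm != MPI_COMM_NULL)
>         {
>             MPI_Comm_size(comm, &size);
>             MPI_Comm_rank(comm, &rank);
>             printf("rank %d of %d\n", rank, size);
>         }
>         MPI_Finalize();
>         return 0;
>     }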
> 
> 
> Thanks for your help,
> 
> Alex
> 
> 
> 
> On Tuesday, 28.08.2012, at 22:41 +0200, Berk Hess wrote:
>> Hi,
>> 
>> I think I might have found it already.
>> Could you try the fix below and report back if this solved the problem?
>> 
>> Cheers,
>> 
>> Berk
>> 
>> 
>> index 735c0e8..e00fa6f 100644
>> --- a/src/mdlib/pme.c
>> +++ b/src/mdlib/pme.c
>> @@ -1814,8 +1814,11 @@ static void init_atomcomm(gmx_pme_t pme,pme_atomcomm_t *atc, t_commrec *cr,
>>      if (pme->nnodes > 1)
>>      {
>>          atc->mpi_comm = pme->mpi_comm_d[dimind];
>> -        MPI_Comm_size(atc->mpi_comm,&atc->nslab);
>> -        MPI_Comm_rank(atc->mpi_comm,&atc->nodeid);
>> +        if (atc->mpi_comm != MPI_COMM_NULL)
>> +        {
>> +            MPI_Comm_size(atc->mpi_comm,&atc->nslab);
>> +            MPI_Comm_rank(atc->mpi_comm,&atc->nodeid);
>> +        }
>>      }
>>      if (debug)
>>      {
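>> 
>> (The hunk above should apply on top of the 4.5.5-patches branch with e.g.
>> "git apply" or "patch -p1" run from the top of the source tree; editing
>> src/mdlib/pme.c by hand works just as well.)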
>> 
>> 
>> 
>> On 08/28/2012 10:34 PM, Berk Hess wrote:
>>> Hi,
>>> 
>>> This seems to be a bug in Gromacs.
>>> As this is not in a Gromacs release yet, we could resolve this without 
>>> a bug report.
>>> 
>>> Are you skilled enough to run this in a debugger and tell me which
>>> MPI_Comm_size call in Gromacs is causing this?
>>> 
>>> Cheers,
>>> 
>>> Berk
>>> 
>>> On 08/28/2012 07:39 PM, Alexander Schlaich wrote:
>>>> Dear Gromacs team,
>>>> 
>>>> I just tried to install the release-4.5.5_patches branch with 
>>>> --enable-mpi on our cluster (OpenMPI-1.4.2), resulting in an error 
>>>> when calling mdrun with PME enabled:
>>>> 
>>>> Reading file topol.tpr, VERSION 4.5.5-dev-20120810-2859895 (single 
>>>> precision)
>>>> [sheldon:22663] *** An error occurred in MPI_comm_size
>>>> [sheldon:22663] *** on communicator MPI_COMM_WORLD
>>>> [sheldon:22663] *** MPI_ERR_COMM: invalid communicator
>>>> [sheldon:22663] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> 
>>>> This seems to be related to a recent post on the list, for which I 
>>>> could not find a solution either:
>>>> http://lists.gromacs.org/pipermail/gmx-users/2012-July/073316.html
>>>> The 4.5.5 release version, however, works fine.
>>>> 
>>>> Taking a closer look I found commit 
>>>> dcf8b67e2801f994dae56374382b9e330833de30, "changed PME MPI_Comm 
>>>> comparisions to MPI_COMM_NULL, fixes #931" (Berk Hess). Apparently 
>>>> this commit changed the communicator handling in a way that makes the 
>>>> initialization fail on my system. Reverting this single commit on the 
>>>> head of the release-4.5.5 branch solved the issue for me.
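>>>> 
>>>> (For reference, reverting that single commit amounts to something like
>>>> 
>>>>     git revert dcf8b67e2801f994dae56374382b9e330833de30
>>>> 
>>>> on a checkout of the branch, followed by a rebuild.)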
>>>> 
>>>> As I am no MPI expert, I would like to know whether my MPI implementation 
>>>> is misbehaving here, whether I made a configuration mistake, or whether I 
>>>> should file a bug report.
>>>> 
>>>> Thanks for your help,
>>>> 
>>>> Alex
> 
> 



