[gmx-developers] Re: broadcast of zero-length arrays

Mark Abraham Mark.Abraham at anu.edu.au
Wed Nov 25 04:28:30 CET 2009


Mark Abraham wrote:
> Mathias PUETZ wrote:
>> Hi,
>> 
>> on BG/L, MPI_Bcast was buggy for zero-length broadcasts. I reported
>> a defect to IBM BG development about a year ago
>> and it was fixed shortly thereafter. By the MPI standard null
>> pointers are legal for zero length buffers. The bug is in the BG
>> MPI argument checking, not in the actual routine that does the
>> broadcast.
>> 
>> I suggest upgrading your BG/L driver software to the latest level.
>> This should take care of it. If the problem really should persist,
>> please contact your IBM customer support and open a new defect
>> report (perhaps - I hope not - the original fix was lost in a newer
>> driver version).
> 
> Thanks - I'll look into that.

My sysadmin replies:

> unfortunately, we _are_ running the latest BlueGene MPI code.  It's
> V1R3M4, dating from July 29 2008, and it's the last major release
> there will be of the BlueGene/L code.  The inference I draw is that
> this bug might not have been fixed on the BlueGene/L  :-(   I have
> written to our IBM BG/L contact to ask about this.

So, assuming there will be no BG/L updates that incorporate a fix for 
this issue, then prior to 4.0.6/4.1, GROMACS will require suitable 
patching to run on BG/L.
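
For anyone patching an older tree by hand, the kind of guard discussed in this thread (skip the broadcast when the length is zero) might look roughly like the sketch below. It is an illustration only, not the actual GROMACS commit: guarded_bcast, bcast_fn and recording_bcast are hypothetical names, and recording_bcast merely stands in for the real broadcast call so that the snippet compiles without MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real broadcast routine (gmx_bcast or
 * MPI_Bcast in this thread). It only records how it was called. */
typedef void (*bcast_fn)(size_t nbytes, void *buf);

static int    calls_made = 0;
static size_t bytes_sent = 0;

static void recording_bcast(size_t nbytes, void *buf)
{
    (void) buf; /* a real broadcast would read or write this buffer */
    calls_made++;
    bytes_sent += nbytes;
}

/* Skip the broadcast when there is nothing to send, so that a NULL
 * pointer for an empty array never reaches the underlying library.
 * Returns 1 if the broadcast was issued and 0 if it was skipped. */
static int guarded_bcast(bcast_fn bcast, size_t nelem, size_t elem_size,
                         void *buf)
{
    assert(bcast != NULL);
    if (nelem == 0)
    {
        return 0;
    }
    bcast(nelem * elem_size, buf);
    return 1;
}
```

Calling guarded_bcast(recording_bcast, 0, sizeof(double), NULL) returns 0 without touching the stand-in, while a non-empty array is passed through unchanged. Since a broadcast is a collective operation, a guard like this is only safe when every rank already knows the same element count, so that all ranks skip the call together.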

Regards,

Mark

> Brian Smith of IBM suggested off-list that setting BGLMPI_BCAST=MPICH
>  would be a useful diagnostic. Setting that allowed GROMACS to
> continue past the previous crash point, indicating that the issue is
> probably in the optimized version.
> 
> Per Brian's request, here's a stack trace with the above variable not
>  set. Core was dumped on many of the 64 MPI processes.
> 
> [12:54][bgfen1-c:timing_prebugfix]$ tail -n 20 core.0 | addr2line -f -e /hpc/home/mja163/builds/gromacs_builds/git/pre-bugfix/mpi_debug/src/kernel/mdrun
> 
> ??
> ??:0
> ??
> ??:0
> ??
> ??:0
> ??
> ??:0
> ??
> ??:0
> BGLMP_TreeBcastPacketDispatch
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/util/BGLMLVNMutil.h:135
> BGLML_Messager_tree_advance
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/base/advance/BGLML_advance.h:295
> BGLMP_TreeBcast
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/proto/collectives/TreeBcast/BGLMP_TreeBcast.c:161
> MPIDI_BGLTR_Bcast
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpidi_bgltr/mpidi_bgltr_bcast.c:62
> MPIDI_Coll_Comm_Bcast_wrapper
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpid_collectives.c:1022
> PMPI_Bcast
> /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpi/coll/bcast.c:767
> gmx_bcast
> ../../../src/gmxlib/network.c:363
> bc_grpopts
> ../../../src/gmxlib/mvdata.c:341
> bc_inputrec
> ../../../src/gmxlib/mvdata.c:402
> bcast_ir_mtop
> ../../../src/gmxlib/mvdata.c:449
> init_parallel
> ../../../src/mdlib/init.c:166
> mdrunner
> ../../../src/kernel/md.c:165
> main
> ../../../src/kernel/mdrun.c:496
> _start_blrts
> ../sysdeps/blrts/start.c:107
> ??
> ??:0
> 
> That looks very much like Mathias's bugfix is required on my system.
> 
> Thanks to all for the prompt discussion.
> 
> Mark
> 
> 
>> Kind regards,
>> Dr. Mathias Puetz
>> 
>> Application Performance Specialist, IBM Sales & Distribution, STG
>> Sales / Industries Deep Computing FTSS
>> Today's Topics:
>> 
>> 1. broadcast of zero-length arrays (Mark Abraham)
>> 2. Re: broadcast of zero-length arrays (Roland Schulz)
>> 3. Re: broadcast of zero-length arrays (hess at sbc.su.se)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> 
>> Message: 1
>> Date: Mon, 23 Nov 2009 14:11:32 +1100
>> From: Mark Abraham <Mark.Abraham at anu.edu.au>
>> Subject: [gmx-developers] broadcast of zero-length arrays
>> To: Gromacs Developers <gmx-developers at gromacs.org>
>> Message-ID: <4B09FD64.1020407 at anu.edu.au>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>> 
>> Hi,
>> 
>> During src/gmxlib/mvdata.c bc_grpopts(), my BlueGene/L segfaults
>> during the broadcasts of the QMMM stuff. The lines that break are
>> attempts to broadcast arrays of zero length. Adding a check for
>> non-zero length into the definition of nblock_bc fixes the problem.
>> Presumably a null pointer is being dereferenced inside the MPI
>> library.
>> 
>> I'm not sure whether this observation is indicative of (this
>> version of) IBM's MPI library not having implemented the full
>> standard, the standard not specifying behaviour in this case, or
>> GROMACS not being sufficiently defensive. I haven't found anything
>> useful in the MPI documentation I have to hand. You could argue
>> cases either way - the implementors of the library want to avoid
>> such checks to speed performance, and the users of the library
>> expect it either to take care of such housekeeping for them, or not
>> dereference pointers unnecessarily (think buffering)...
>> 
>> Does anyone know what expected behaviour is here?
>> 
>> Cheers,
>> 
>> Mark
>> 
>> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Mon, 23 Nov 2009 00:24:30 -0500
>> From: Roland Schulz <roland at utk.edu>
>> Subject: Re: [gmx-developers] broadcast of zero-length arrays
>> To: Discussion list for GROMACS development <gmx-developers at gromacs.org>, Brian Smith <smithbr at us.ibm.com>
>> Message-ID: <c93c21390911222124n717712fkaa683b0d6a40226 at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> Hi Brian,
>> 
>> you helped us before with a segfault in the BlueGene MPI layer (in
>> scatterv). Do you consider the segfault Mark describes in this
>> thread a bug in the MPI layer or in GROMACS?
>> 
>> Roland
>> 
>> -- 
>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>> 865-241-1537, ORNL PO BOX 2008 MS6309
>> 
>> 
>> 
>> ------------------------------
>> 
>> Message: 3
>> Date: Mon, 23 Nov 2009 09:20:41 +0100 (CET)
>> From: hess at sbc.su.se
>> Subject: Re: [gmx-developers] broadcast of zero-length arrays
>> To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
>> Cc: Brian Smith <smithbr at us.ibm.com>
>> Message-ID: <37050.90.163.29.105.1258964441.squirrel at mail.sbc.su.se>
>> Content-Type: text/plain;charset=iso-8859-1
>> 
>> Hi,
>> 
>> I remember having such issues before, I think also on an IBM machine,
>> and discussing this with somebody from IBM. I thought I had removed
>> all MPI calls with NULL pointers, but apparently this is not the case.
>> I committed fixes for nblock_bc in mvdata for 4.0.6 and git master.
>> 
>> 
>> Berk
>> 


