[gmx-developers] Re: broadcast of zero-length arrays
Mark Abraham
Mark.Abraham at anu.edu.au
Wed Nov 25 04:28:30 CET 2009
Mark Abraham wrote:
> Mathias PUETZ wrote:
>> Hi,
>>
>> on BG/L, MPI_Bcast was buggy for zero-length broadcasts. I reported
>> a defect to IBM BG development about a year ago and it was fixed
>> shortly thereafter. By the MPI standard, null pointers are legal for
>> zero-length buffers. The bug is in the BG MPI argument checking, not
>> in the actual routine that does the broadcast.
>>
>> I suggest upgrading your BG/L driver software to the latest level.
>> This should take care of it. If the problem still persists, please
>> contact your IBM customer support and open a new defect report
>> (perhaps, though I hope not, the original fix was lost in a newer
>> driver version).
>
> Thanks - I'll look into that.
My sysadmin replies:
> unfortunately, we _are_ running the latest BlueGene MPI code. It's
> V1R3M4, dating from July 29 2008, and it's the last major release
> there will be of the BlueGene/L code. The inference I draw is that
> this bug might not have been fixed on the BlueGene/L :-( I have
> written to our IBM BG/L contact to ask about this.
So, assuming there will be no BG/L updates that incorporate a fix for
this issue, GROMACS versions prior to 4.0.6/4.1 will require suitable
patching to run on BG/L.
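
A minimal sketch of the kind of caller-side patch meant here, assuming
the fix amounts to skipping the broadcast whenever the element count is
zero (as described for nblock_bc in the original report). The names
fake_bcast, guarded_block_bc and demo_guarded_bcast are hypothetical and
exist only for illustration; this is not the GROMACS source. The
stand-in transport replaces the real MPI call so the example runs
without MPI, and its assertion only mimics a library that cannot cope
with a null buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real broadcast routine. It mimics a
 * library that cannot handle a NULL buffer and counts how many calls
 * actually reach it. */
static int transport_calls = 0;

static void fake_bcast(size_t nbytes, void *buf)
{
    (void) nbytes;
    assert(buf != NULL);   /* models the crash seen with a null buffer */
    transport_calls++;
}

/* Broadcast nr elements of array d, but skip the call entirely when nr
 * is zero, so a NULL pointer of zero length never reaches the library. */
#define guarded_block_bc(nr, d)                                 \
    do {                                                        \
        if ((nr) > 0) {                                         \
            fake_bcast((size_t)(nr) * sizeof((d)[0]), (d));     \
        }                                                       \
    } while (0)

/* Returns the number of broadcasts that reached the transport. */
static int demo_guarded_bcast(void)
{
    double  values[3] = { 1.0, 2.0, 3.0 };
    double *empty     = NULL;     /* e.g. an unused, unallocated array */

    transport_calls = 0;
    guarded_block_bc(3, values);  /* forwarded to the transport */
    guarded_block_bc(0, empty);   /* skipped by the guard */
    return transport_calls;
}
```

Putting the guard in the calling macro keeps the workaround in one
place rather than at every call site; the cost is one extra comparison
per broadcast.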
Regards,
Mark
> Brian Smith of IBM suggested off-list that setting BGLMPI_BCAST=MPICH
> would be a useful diagnostic. Setting that allowed GROMACS to
> continue past the previous crash point, indicating that the issue is
> probably in the optimized version.
>
> Per Brian's request, here's a stack trace with the above variable not
> set. Core was dumped on many of the 64 MPI processes.
>
> [12:54][bgfen1-c:timing_prebugfix]$ tail -n 20 core.0 | addr2line -f -e /hpc/home/mja163/builds/gromacs_builds/git/pre-bugfix/mpi_debug/src/kernel/mdrun
>
> ?? ??:0
> ?? ??:0
> ?? ??:0
> ?? ??:0
> ?? ??:0
> BGLMP_TreeBcastPacketDispatch /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/util/BGLMLVNMutil.h:135
> BGLML_Messager_tree_advance /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/base/advance/BGLML_advance.h:295
> BGLMP_TreeBcast /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/proto/collectives/TreeBcast/BGLMP_TreeBcast.c:161
> MPIDI_BGLTR_Bcast /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpidi_bgltr/mpidi_bgltr_bcast.c:62
> MPIDI_Coll_Comm_Bcast_wrapper /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpid_collectives.c:1022
> PMPI_Bcast /bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpi/coll/bcast.c:767
> gmx_bcast ../../../src/gmxlib/network.c:363
> bc_grpopts ../../../src/gmxlib/mvdata.c:341
> bc_inputrec ../../../src/gmxlib/mvdata.c:402
> bcast_ir_mtop ../../../src/gmxlib/mvdata.c:449
> init_parallel ../../../src/mdlib/init.c:166
> mdrunner ../../../src/kernel/md.c:165
> main ../../../src/kernel/mdrun.c:496
> _start_blrts ../sysdeps/blrts/start.c:107
> ?? ??:0
>
> That looks very much like Mathias's bugfix is required on my system.
>
> Thanks to all for the prompt discussion.
>
> Mark
>
>
>> Kind regards,
>> Dr. Mathias Puetz
>>
>> Application Performance Specialist
>> IBM Sales & Distribution, STG Sales / Industries, Deep Computing FTSS
>> E-Mail: mpuetz at de.ibm.com
>>
>> gmx-developers-request at gromacs.org
>> Sent by: gmx-developers-bounces at gromacs.org
>> 11/23/2009 12:00 PM
>> Please respond to: gmx-developers at gromacs.org
>> To: gmx-developers at gromacs.org
>> Subject: gmx-developers Digest, Vol 67, Issue 16
>>
>>
>> Today's Topics:
>>
>> 1. broadcast of zero-length arrays (Mark Abraham)
>> 2. Re: broadcast of zero-length arrays (Roland Schulz)
>> 3. Re: broadcast of zero-length arrays (hess at sbc.su.se)
>>
>>
>> ----------------------------------------------------------------------
>>
>>
>> Message: 1
>> Date: Mon, 23 Nov 2009 14:11:32 +1100
>> From: Mark Abraham <Mark.Abraham at anu.edu.au>
>> Subject: [gmx-developers] broadcast of zero-length arrays
>> To: Gromacs Developers <gmx-developers at gromacs.org>
>> Message-ID: <4B09FD64.1020407 at anu.edu.au>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Hi,
>>
>> During src/gmxlib/mvdata.c bc_grpopts(), my BlueGene/L segfaults
>> during the broadcasts of the QMMM stuff. The lines that break are
>> attempts to broadcast arrays of zero length. Adding a check for
>> non-zero length into the definition of nblock_bc fixes the problem.
>> Presumably a null pointer is being dereferenced inside the MPI
>> library.
>>
>> I'm not sure whether this observation is indicative of (this
>> version of) IBM's MPI library not having implemented the full
>> standard, the standard not specifying behaviour in this case, or
>> GROMACS not being sufficiently defensive. I haven't found anything
>> useful in the MPI documentation I have to hand. You could argue
>> cases either way - the implementors of the library want to avoid
>> such checks to speed performance, and the users of the library
>> expect it either to take care of such housekeeping for them, or not
>> dereference pointers unnecessarily (think buffering)...
>>
>> Does anyone know what expected behaviour is here?
>>
>> Cheers,
>>
>> Mark
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 23 Nov 2009 00:24:30 -0500
>> From: Roland Schulz <roland at utk.edu>
>> Subject: Re: [gmx-developers] broadcast of zero-length arrays
>> To: Discussion list for GROMACS development
>>     <gmx-developers at gromacs.org>, Brian Smith <smithbr at us.ibm.com>
>> Message-ID: <c93c21390911222124n717712fkaa683b0d6a40226 at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi Brian,
>>
>> you helped before with a segfault in the BlueGene MPI layer (in
>> scatterv). Do you consider the segfault described below a bug in
>> the MPI layer or in GROMACS?
>>
>> Roland
>>
>> ---------- Forwarded message ----------
>> From: Mark Abraham <Mark.Abraham at anu.edu.au>
>> Date: Sun, Nov 22, 2009 at 10:11 PM
>> Subject: [gmx-developers] broadcast of zero-length arrays
>> To: Gromacs Developers <gmx-developers at gromacs.org>
>>
>>
>>
>>
>> -- ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>> 865-241-1537, ORNL PO BOX 2008 MS6309
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 23 Nov 2009 09:20:41 +0100 (CET)
>> From: hess at sbc.su.se
>> Subject: Re: [gmx-developers] broadcast of zero-length arrays
>> To: "Discussion list for GROMACS development"
>>     <gmx-developers at gromacs.org>
>> Cc: Brian Smith <smithbr at us.ibm.com>
>> Message-ID: <37050.90.163.29.105.1258964441.squirrel at mail.sbc.su.se>
>> Content-Type: text/plain;charset=iso-8859-1
>>
>> Hi,
>>
>> I remember having such issues before, I think also on an IBM
>> machine, and discussing this with somebody from IBM. I thought I had
>> removed all MPI calls with NULL pointers, but apparently this is not
>> the case. I committed fixes for nblock_bc in mvdata for 4.0.6 and
>> git master.
>>
>>
>> Berk
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> End of gmx-developers Digest, Vol 67, Issue 16
>> **********************************************
>>
>>