[gmx-developers] Re: broadcast of zero-length arrays

Mark Abraham Mark.Abraham at anu.edu.au
Tue Nov 24 01:12:14 CET 2009


Mathias PUETZ wrote:
> Hi,
> 
> On BG/L, MPI_Bcast with a zero-length buffer was buggy.
> I reported a defect to IBM BG development about a year ago and it was fixed
> shortly thereafter.
> By the MPI standard, null pointers are legal for zero-length buffers.
> The bug is in the BG MPI argument checking, not in the actual routine that
> does the broadcast.
> 
> I suggest upgrading your BG/L driver software to the latest level. This
> should take care of it.
> If the problem really does persist, please contact your IBM customer
> support and open a new defect report (perhaps, though I hope not, the
> original fix was lost in a newer driver version).

Thanks - I'll look into that.

Brian Smith of IBM suggested off-list that setting BGLMPI_BCAST=MPICH 
would be a useful diagnostic. Setting that allowed GROMACS to continue 
past the previous crash point, indicating that the issue is probably in 
the optimized broadcast code path rather than the MPICH fallback.

Per Brian's request, here's a stack trace with the above variable not 
set. Core was dumped on many of the 64 MPI processes.

[12:54][bgfen1-c:timing_prebugfix]$ tail -n 20 core.0 | addr2line -f -e /hpc/home/mja163/builds/gromacs_builds/git/pre-bugfix/mpi_debug/src/kernel/mdrun
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
BGLMP_TreeBcastPacketDispatch
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/util/BGLMLVNMutil.h:135
BGLML_Messager_tree_advance
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/base/advance/BGLML_advance.h:295
BGLMP_TreeBcast
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/sys/msglayer/proto/collectives/TreeBcast/BGLMP_TreeBcast.c:161
MPIDI_BGLTR_Bcast
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpidi_bgltr/mpidi_bgltr_bcast.c:62
MPIDI_Coll_Comm_Bcast_wrapper
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpid/bgltorus5/src/coll/mpid_collectives.c:1022
PMPI_Bcast
/bglhome/usr6/bgbuild/V1R3M4_300_2008-080728/ppc/src/bglsw/comm/lib/mpi/mpich2/src/mpi/coll/bcast.c:767
gmx_bcast
../../../src/gmxlib/network.c:363
bc_grpopts
../../../src/gmxlib/mvdata.c:341
bc_inputrec
../../../src/gmxlib/mvdata.c:402
bcast_ir_mtop
../../../src/gmxlib/mvdata.c:449
init_parallel
../../../src/mdlib/init.c:166
mdrunner
../../../src/kernel/md.c:165
main
../../../src/kernel/mdrun.c:496
_start_blrts
../sysdeps/blrts/start.c:107
??
??:0

That looks very much like Mathias's bugfix is required on my system.

Thanks to all for the prompt discussion.

Mark


> Mit freundlichen Grüßen / Kind regards
> Dr. Mathias Puetz
> 
> Application Performance Specialist
> IBM Sales & Distribution, STG Sales / Industries Deep Computing FTSS
> From:     gmx-developers-request at gromacs.org
> Sent by:  gmx-developers-bounces at gromacs.org
> Date:     11/23/2009 12:00 PM
> To:       gmx-developers at gromacs.org
> Subject:  gmx-developers Digest, Vol 67, Issue 16
> Please respond to: gmx-developers at gromacs.org
> 
> 
> 
> Today's Topics:
> 
>    1. broadcast of zero-length arrays (Mark Abraham)
>    2. Re: broadcast of zero-length arrays (Roland Schulz)
>    3. Re: broadcast of zero-length arrays (hess at sbc.su.se)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 23 Nov 2009 14:11:32 +1100
> From: Mark Abraham <Mark.Abraham at anu.edu.au>
> Subject: [gmx-developers] broadcast of zero-length arrays
> To: Gromacs Developers <gmx-developers at gromacs.org>
> 
> Hi,
> 
> During src/gmxlib/mvdata.c bc_grpopts(), my BlueGene/L segfaults during
> the broadcasts of the QMMM stuff. The lines that break are attempts to
> broadcast arrays of zero length. Adding a check for non-zero length into
> the definition of nblock_bc fixes the problem. Presumably a null pointer
> is being dereferenced inside the MPI library.
> 
> I'm not sure whether this observation is indicative of (this version of)
> IBM's MPI library not having implemented the full standard, the standard
> not specifying behaviour in this case, or GROMACS not being sufficiently
> defensive. I haven't found anything useful in the MPI documentation I
> have to hand. You could argue either way: the implementors of the
> library want to avoid such checks to improve performance, while users of
> the library expect it either to take care of such housekeeping for them
> or not to dereference pointers unnecessarily (think buffering)...
> 
> Does anyone know what expected behaviour is here?
> 
> Cheers,
> 
> Mark
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 23 Nov 2009 00:24:30 -0500
> From: Roland Schulz <roland at utk.edu>
> Subject: Re: [gmx-developers] broadcast of zero-length arrays
> To: Discussion list for GROMACS development
>              <gmx-developers at gromacs.org>,             Brian Smith
> <smithbr at us.ibm.com>
> 
> Hi Brian,
> 
> You helped us before with a segfault in the BlueGene MPI layer (in
> scatterv). Do you consider the segfault described below a bug in the MPI
> layer or in Gromacs?
> 
> Roland
> 
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 23 Nov 2009 09:20:41 +0100 (CET)
> From: hess at sbc.su.se
> Subject: Re: [gmx-developers] broadcast of zero-length arrays
> To: "Discussion list for GROMACS development"
>              <gmx-developers at gromacs.org>
> Cc: Brian Smith <smithbr at us.ibm.com>
> 
> Hi,
> 
> I remember having such issues before, I think also on an IBM machine,
> and discussing this with somebody from IBM.
> I thought I had removed all MPI calls with NULL pointers,
> but apparently this is not the case.
> I committed fixes for nblock_bc in mvdata for 4.0.6 and git master.
> 
> Berk
> 
> ------------------------------
> 
> --
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> 
> 
> End of gmx-developers Digest, Vol 67, Issue 16
> **********************************************
> 
> 



More information about the gromacs.org_gmx-developers mailing list