[gmx-developers] MPI and broadcasting state->x

Vedran Miletić vedran at miletic.net
Thu Aug 31 19:03:14 CEST 2017


Hi,

first off, I'm aware that the global state vectors aren't broadcast by
default and that this probably shouldn't change in general. However, let's
assume for the moment that a particular prototype code requires
broadcasting the global state inside the main MD loop, i.e. that I'm doing
something like:

while (!bLastStep)
{
    ...

    if (DOMAINDECOMP(cr))
    {
        if (!MASTER(cr))
        {
            /* make sure the receive buffer is allocated on non-master ranks */
            srenew(state_global->x, state_global->natoms);
        }
        /* broadcast the global coordinates from the master rank */
        gmx_bcast(state_global->natoms * sizeof(rvec), state_global->x, cr);
    }

    ...
}

Two questions:

1) The code above works fine with 16, 32, and 48 MPI processes, but with
64 processes the following happens:

[fiji:24559] *** Process received signal ***
[fiji:24559] Signal: Segmentation fault (11)
[fiji:24559] Signal code: Address not mapped (1)
[fiji:24559] Failing at address: 0x4
[fiji:24559] [ 0] /lib64/libpthread.so.0(+0x122c0)[0x7f923e0c52c0]
[fiji:24559] [ 1]
/home/miletivn/software/lib64/libgromacs_mpi.so.2(_Z21check_stop_conditionsP8_IO_FILEP9t_commrecP10t_condstopPK7t_stateS7_+0x4ee)[0x7f9241913e6e]
[fiji:24559] [ 2]
gmx_mpi(_ZN3gmx5do_mdEP8_IO_FILEP9t_commreciPK8t_filenmPK16gmx_output_env_tiiP11gmx_vsite_tP10gmx_constriP10t_inputrecP10gmx_mtop_tP8t_fcdataP7t_stateP9t_mdatomsP6t_nrnbP13gmx_wallcycleP9gmx_edsamP10t_forcereciiiP12gmx_membed_tffimP23gmx_walltime_accounting+0x1272)[0x40e922]
[fiji:24559] [ 3]
gmx_mpi(_ZN3gmx8mdrunnerEP12gmx_hw_opt_tP8_IO_FILEP9t_commreciPK8t_filenmPK16gmx_output_env_tiiPiiiffPKcfSE_SE_SE_SE_iliiiiiifffim+0x151e)[0x42475e]
[fiji:24559] [ 4] gmx_mpi(_Z9gmx_mdruniPPc+0x15e1)[0x4154e1]
[fiji:24559] [ 5]
/home/miletivn/software/lib64/libgromacs_mpi.so.2(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x237)[0x7f924150bb37]
[fiji:24559] [ 6] gmx_mpi(main+0x7c)[0x40c47c]
[fiji:24559] [ 7] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f923d21e50a]
[fiji:24559] [ 8] gmx_mpi(_start+0x2a)[0x40c54a]
[fiji:24559] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 40 with PID 0 on node fiji exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Why does this happen? I don't see any obvious mistake that would only kick in at 64 MPI processes.
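
In case it helps narrow this down, here is a minimal sanity check I'm
considering adding right before the broadcast. It assumes that
state_global->natoms should match the master's value on every rank, which
may of course be the wrong assumption:

    /* Sketch: verify that every rank agrees with the master on the
     * broadcast size before touching state_global->x; gmx_fatal() aborts
     * with a message instead of letting a mismatched MPI_Bcast run. */
    int natoms_master = state_global->natoms;
    gmx_bcast(sizeof(int), &natoms_master, cr);
    if (natoms_master != state_global->natoms)
    {
        gmx_fatal(FARGS, "natoms mismatch on rank %d: local %d vs master %d",
                  cr->nodeid, state_global->natoms, natoms_master);
    }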

2) Should I use gmx_bcast or gmx_bcast_sim? In general, how does one
decide between mpi_comm_mygroup and mpi_comm_mysim?
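
For reference, my current understanding (which may well be wrong) is that
the two wrappers differ only in the communicator they hand to MPI_Bcast,
so the choice would look roughly like this:

    /* my reading of the two wrappers, to be confirmed:
     * gmx_bcast()     broadcasts over cr->mpi_comm_mygroup (my PP or PME group)
     * gmx_bcast_sim() broadcasts over cr->mpi_comm_mysim   (all ranks of this simulation)
     */

    /* alternative 1: broadcast only within my group */
    gmx_bcast(state_global->natoms * sizeof(rvec), state_global->x, cr);

    /* alternative 2: broadcast across the whole simulation */
    gmx_bcast_sim(state_global->natoms * sizeof(rvec), state_global->x, cr);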

Thanks,
Vedran

-- 
Vedran Miletić
vedran.miletic.net

