[gmx-developers] mdrun_mpi not able to reach "rank"

Mark Abraham mark.j.abraham at gmail.com
Wed Jun 5 08:31:39 CEST 2019


Hi,

In your two cases, the form of parallelism is different. In the latter, if
you are using two ranks with thread-MPI, then you cannot be using multisim,
so both ranks belong to the single simulation in use.

The PAR(cr) macro (sadly misnamed, for historical reasons) reflects whether
there is more than one rank per simulation, so you should check that before
using e.g. the functions in gromacs/gmxlib/network.h to gather information
to the ranks that are masters of each simulation. There are other functions
for communicating between the master ranks of a multi-simulation (e.g.
see the REMD code)

Mark

On Wed, 5 Jun 2019 at 07:54, 1004753465 <1004753465 at qq.com> wrote:

> Hi everyone,
>
> I am currently trying to run two Gromacs 2018 parallel processes by using
>
> mpirun -np 2 ...(some path)/mdrun_mpi -v -multidir sim[01]
>
> During the simulation, I need to collect some information to the two
> master nodes, just like the function "dd_gather". Therefore, I need to
> reach (cr->dd) for each rank. However, whenever I try to print
> "cr->dd->rank" or "cr->dd->nnodes" or something like that, it just shows
>
> [c15:31936] *** Process received signal ***
> [c15:31936] Signal: Segmentation fault (11)
> [c15:31936] Signal code: Address not mapped (1)
> [c15:31936] Failing at address: 0x30
> [c15:31936] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)
> [0x7f7f9e374340]
> [c15:31936] [ 1]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x468cfb]
> [c15:31936] [ 2]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x40dd65]
> [c15:31936] [ 3]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x42ca93]
> [c15:31936] [ 4]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x416f7d]
> [c15:31936] [ 5]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x41792c]
> [c15:31936] [ 6]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x438756]
> [c15:31936] [ 7]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x438b3e]
> [c15:31936] [ 8]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x439a97]
> [c15:31936] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)
> [0x7f7f9d591ec5]
> [c15:31936] [10]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x40b93e]
> [c15:31936] *** End of error message ***
> step 0[c15:31935] *** Process received signal ***
> [c15:31935] Signal: Segmentation fault (11)
> [c15:31935] Signal code: Address not mapped (1)
> [c15:31935] Failing at address: 0x30
> [c15:31935] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)
> [0x7fb64892e340]
> [c15:31935] [ 1]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x468cfb]
> [c15:31935] [ 2]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x40dd65]
> [c15:31935] [ 3]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x42ca93]
> [c15:31935] [ 4]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x416f7d]
> [c15:31935] [ 5]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x41792c]
> [c15:31935] [ 6]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x438756]
> [c15:31935] [ 7]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x438b3e]
> [c15:31935] [ 8]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x439a97]
> [c15:31935] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)
> [0x7fb647b4bec5]
> [c15:31935] [10]
> /home/hudan/wow/ngromacs-2018/gromacs-2018/build/bin/mdrun_mpi() [0x40b93e]
> [c15:31935] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 31935 on node c15.dynstar
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> However, if I build the package without the flag -DGMX_MPI=on, the
> single-process program (mdrun) runs smoothly, and all the domain
> decomposition ranks can be printed out and used conveniently.
>
> It is pretty weird to me that, with mdrun_mpi, although domain
> decomposition can be done, the ranks can neither be printed out nor
> accessed through cr->dd. I wonder whether they are stored in some other
> form, but I do not know what it is.
>
> I would appreciate it if someone could help. Thank you very much!!!
> Best,
> Huan
> --
> Gromacs Developers mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> or send a mail to gmx-developers-request at gromacs.org.

