[gmx-users] Replica exchange problem on IBM BlueGene/Q

Mark Abraham mark.j.abraham at gmail.com
Thu Nov 28 00:14:57 CET 2013


Hi,

A full stack trace might be instructive, but probably will not lead to a
fix. As a guess, an old GROMACS assumption about how MPI hostnames
typically work is violated by BG/Q. This is fixed only in 4.6.4.
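As a purely hypothetical illustration (not the actual GROMACS source), suppose a program identifies physical nodes by keeping only the digits of the MPI processor name before the first dot. Host names that end in unique numbers, as on many Intel clusters, map to distinct node numbers, but names with a different layout can collapse onto the same number. The helper names `hostname_to_node_number` and `find_collisions` and all example host names are invented for this sketch.

```python
# Hypothetical sketch: derive a node number from an MPI processor name by
# keeping only the ASCII digits that appear before the first '.'.
# It illustrates the *kind* of hostname assumption described above; it is
# not taken from GROMACS.


def hostname_to_node_number(processor_name: str) -> int:
    short_name = processor_name.split(".", 1)[0]
    digits = "".join(ch for ch in short_name if ch in "0123456789")
    # Names with no digits at all fall back to node number 0.
    return int(digits) if digits else 0


def find_collisions(processor_names: list[str]) -> dict[int, list[str]]:
    """Return node numbers that two or more distinct names collapse onto."""
    groups: dict[int, set[str]] = {}
    for name in processor_names:
        groups.setdefault(hostname_to_node_number(name), set()).add(name)
    return {num: sorted(names) for num, names in groups.items() if len(names) > 1}


# Typical cluster names with unique trailing numbers: no collisions.
print(find_collisions(["node017.cluster", "node018.cluster"]))  # {}
# Hypothetical names with another layout: distinct hosts share a number.
print(find_collisions(["rack1-node12", "rack11-node2", "login-a", "login-b"]))
# {112: ['rack1-node12', 'rack11-node2'], 0: ['login-a', 'login-b']}
```

If ranks on different nodes are wrongly treated as sharing a node (or vice versa), any communicators built from that grouping will not have the layout the rest of the code expects.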

Even without that, I would strongly encourage the use of 4.6.4 - you should
get a factor of about 3 higher throughput just for using 4 OpenMP threads
per rank with the Verlet scheme, and probably a further factor of 2-3 from
the new SIMD Verlet kernels. And if there's still an REMD or MPI bug, it
has a chance to get fixed in the 4.6 branch! :-)

Mark


On Wed, Nov 27, 2013 at 11:19 PM, Ippoliti, Emiliano
<e.ippoliti at grs-sim.de> wrote:

> Dear all,
>
> we are trying to run a replica exchange simulation on an IBM BlueGene/Q
> system with gromacs 4.5.5.
>
> We have already run the same simulation on a more traditional Intel-based
> cluster without problems.
>
> We have tested the version of gromacs 4.5.5 installed on our IBM
> BlueGene/Q machine by running a standard simulation with the same system as
> the one in the replica exchange simulation.
>
> When we switch to the replica exchange calculation, gromacs stops almost
> immediately after the first messages, where the options for mdrun are
> echoed, with the following generic MPI error:
>
> Abort(1) on node 19 (rank 19 in comm 1140850688): Fatal error in
> MPI_Allreduce: Invalid communicator, error stack:
> MPI_Allreduce(855): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x19c5818d20,
> count=32, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
> MPI_Allreduce(780): Null communicator
>
> Has anyone experienced the same problem on an IBM BlueGene/Q machine in
> combination with replica exchange? Do you have any suggestions to
> overcome the issue?
>
> The execution line in the batch script we used is:
>
> runjob --ranks-per-node 1 --exe <path to backend node version of
> mdrun>/mdrun_mpi_dh_bg --args "-s" --args "rex04_.tpr" --args "-v" --args
> "-cpi" --args "rex03_.cpt" --args "-multi" --args "32" --args "-replex"
> --args "500"
>
> Thanks in advance for any suggestion.
>
> Best regards,
> Emiliano and Francesco

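The quoted error reports that MPI_Allreduce was called with MPI_COMM_NULL. In MPI, a process receives MPI_COMM_NULL from MPI_Comm_split when it passes MPI_UNDEFINED as its color (and from MPI_Comm_create when it is not a member of the group), and using that handle in a collective is an error. The pure-Python model below uses no MPI library; `comm_split`, `allreduce_sum` and the sentinels `UNDEFINED` and `COMM_NULL` are hypothetical stand-ins. It shows how a rank left out of a "masters only" communicator ends up holding a null handle. It is not meant to reproduce GROMACS's actual code path.

```python
from collections import defaultdict

# Sentinels for this model only (not the real MPI constants).
UNDEFINED = object()  # plays the role of MPI_UNDEFINED as a color
COMM_NULL = None      # plays the role of MPI_COMM_NULL


def comm_split(colors, keys=None):
    """Model MPI_Comm_split semantics for one collective call.

    colors[r] and keys[r] are what rank r passes. Each rank gets back the
    ordered list of old ranks in its new communicator (sorted by key, ties
    broken by old rank), or COMM_NULL if it passed UNDEFINED.
    """
    if keys is None:
        keys = [0] * len(colors)
    members = defaultdict(list)
    for rank, color in enumerate(colors):
        if color is not UNDEFINED:
            members[color].append(rank)
    return [
        COMM_NULL if color is UNDEFINED
        else sorted(members[color], key=lambda r: (keys[r], r))
        for color in colors
    ]


def allreduce_sum(comm, values):
    """Sum values[r] over the ranks in comm, as MPI_SUM would."""
    if comm is COMM_NULL:
        # A real MPI library aborts here with a "Null communicator" error.
        raise RuntimeError("Invalid communicator: null communicator")
    return sum(values[r] for r in comm)


# Two simulations of two ranks each; only each simulation's first rank
# (ranks 0 and 2) joins a "masters" communicator, the others pass UNDEFINED.
comms = comm_split([0, UNDEFINED, 0, UNDEFINED])
print(comms)                                      # [[0, 2], None, [0, 2], None]
print(allreduce_sum(comms[0], [10, 11, 12, 13]))  # 22
try:
    allreduce_sum(comms[1], [10, 11, 12, 13])
except RuntimeError as err:
    print("rank 1:", err)
```

In a correct program, only ranks holding a non-null handle take part in the collective; the failure mode in the quoted message corresponds to a rank calling it anyway.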

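The quoted runjob line starts 32 replicas with `-multi 32`. The mdrun help for `-multi` states that the system number is appended to the run input and output file names (topol.tpr becomes topol0.tpr, topol1.tpr, and so on) and that each system gets the total number of ranks divided by the number of systems. The sketch below models that bookkeeping. The helper names `multi_filenames` and `ranks_by_system` are hypothetical, and the assignment of ranks to systems in contiguous blocks is an assumption.

```python
import os

# Hypothetical helpers modelling the documented -multi behaviour: the system
# number is inserted before the file extension, and ranks are divided evenly
# among the systems (contiguous blocks are an assumption of this sketch).


def multi_filenames(filename: str, nsim: int) -> list[str]:
    base, ext = os.path.splitext(filename)
    return [f"{base}{sim}{ext}" for sim in range(nsim)]


def ranks_by_system(nranks: int, nsim: int) -> list[list[int]]:
    if nranks % nsim != 0:
        raise ValueError(f"{nranks} ranks cannot be split evenly over {nsim} systems")
    per_sim = nranks // nsim
    return [list(range(sim * per_sim, (sim + 1) * per_sim)) for sim in range(nsim)]


tpr_files = multi_filenames("rex04_.tpr", 32)
print(tpr_files[0], tpr_files[-1])  # rex04_0.tpr rex04_31.tpr
# A hypothetical 64-rank job: two ranks per replica.
print(ranks_by_system(64, 32)[9])   # [18, 19]
```

This makes it easy to check that every expected input file exists and that the rank count of the job is a multiple of the replica count before submitting.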