[gmx-users] Replica exchange problem on IBM BlueGene/Q

Ippoliti, Emiliano e.ippoliti at grs-sim.de
Wed Nov 27 23:20:04 CET 2013


Dear all,

We are trying to run a replica exchange simulation on an IBM BlueGene/Q system with GROMACS 4.5.5.

We have already run the same simulation on a more traditional Intel-based cluster without problems.

We have also tested the GROMACS 4.5.5 build installed on our IBM BlueGene/Q machine by running a standard (non-replica-exchange) simulation of the same system used in the replica exchange run.

When we switch to the replica exchange calculation, mdrun stops almost immediately, right after echoing its command-line options, with the following generic MPI error:

Abort(1) on node 19 (rank 19 in comm 1140850688): Fatal error in MPI_Allreduce: Invalid communicator, error stack: 
MPI_Allreduce(855): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x19c5818d20, count=32, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
MPI_Allreduce(780): Null communicator

Has anyone experienced the same problem with replica exchange on an IBM BlueGene/Q machine? Do you have any suggestions for overcoming this issue?

The execution line in the batch script we used is:

runjob --ranks-per-node 1 --exe <path to backend node version of mdrun>/mdrun_mpi_dh_bg --args "-s" --args "rex04_.tpr" --args "-v" --args "-cpi" --args "rex03_.cpt" --args "-multi" --args "32" --args "-replex" --args "500"
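For anyone debugging a similar setup, a quick pre-flight check can rule out two common `-multi` mistakes before a job is submitted. The sketch below assumes GROMACS 4.5-style `-multi` behaviour, where mdrun inserts the replica index before the file extension (so `rex04_.tpr` becomes `rex04_0.tpr` through `rex04_31.tpr`), and where the total number of MPI ranks must be a multiple of the number of replicas. The helper names `multi_file_names` and `check_multi_setup` are hypothetical. It only checks the inputs; it does not diagnose the `MPI_COMM_NULL` error itself.

```python
import os


def multi_file_names(base: str, nsim: int) -> list[str]:
    """Per-replica file names, assuming mdrun -multi inserts the index
    before the extension, e.g. rex04_.tpr -> rex04_0.tpr (assumption)."""
    root, ext = os.path.splitext(base)
    return [f"{root}{i}{ext}" for i in range(nsim)]


def check_multi_setup(tpr: str, cpt: str, nsim: int, total_ranks: int) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: checks that the rank count divides evenly among
    the replicas and that every per-replica .tpr/.cpt file exists.
    """
    problems = []
    # Each replica needs the same number of ranks.
    if total_ranks % nsim != 0:
        problems.append(f"{total_ranks} ranks is not a multiple of {nsim} replicas")
    # Every replica needs its own run input and checkpoint file.
    for name in multi_file_names(tpr, nsim) + multi_file_names(cpt, nsim):
        if not os.path.exists(name):
            problems.append(f"missing {name}")
    return problems


if __name__ == "__main__":
    # 32 replicas, as in the runjob line; set total_ranks to the job's real count.
    for problem in check_multi_setup("rex04_.tpr", "rex03_.cpt", nsim=32, total_ranks=32):
        print(problem)
```

Here `total_ranks` should be the number of nodes in the block times the `--ranks-per-node` value; with `--ranks-per-node 1` that is simply the node count.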

Thanks in advance for any suggestions.

Best regards,
Emiliano and Francesco





More information about the gromacs.org_gmx-users mailing list