[gmx-users] Multi-node Replica Exchange Segfault

jkrieger at mrc-lmb.cam.ac.uk jkrieger at mrc-lmb.cam.ac.uk
Fri Oct 30 09:59:10 CET 2015


You could try using a mixture of OpenMPI and thread-MPI. I have found with
linked replicas in multi-walker metadynamics that it only works if each
replica gets the same allocation on the cluster. In Sun Grid Engine, I'd
have the following in my submit scripts:

#$ -pe openmpi 2
#$ -l dedicated=20
export OMP_NUM_THREADS=20
mpirun -np $NSLOTS mdrun_mpi -deffnm metad -cpi metad -plumed plumed.dat -multi 2

It's probably slightly different with PBS, but you could try the equivalent,
first without plumed and then with -replex.
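For reference, a rough PBS equivalent might look like the sketch below. The
resource names, walltime, exchange interval, and deffnm are assumptions to
illustrate the layout, not something tested on your cluster, so adjust them to
your site:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=20
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR

# One MPI rank per core, no OpenMP threading on top
export OMP_NUM_THREADS=1

# 2 nodes x 20 cores = 40 ranks; -multi 2 splits them evenly,
# so ranks 0-19 run replica 0 and ranks 20-39 run replica 1.
# -nb cpu matches the GPU-disabled reproduction case.
mpirun -np 40 mdrun_mpi -deffnm remd -multi 2 -replex 500 -nb cpu
```

If the scheduler fills nodes in order, each replica should then land entirely
on one node, which is the condition I found I needed for linked replicas.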

Best wishes
James

> Hi,
>
> I've never heard of such a problem. You could try a multisim without
> -replex, to help diagnose.
>
> Mark
>
> On Fri, 30 Oct 2015 03:33 Barnett, James W <jbarnet4 at tulane.edu> wrote:
>
>> Good evening here,
>>
>> I get a segmentation fault with my GROMACS 5.1 install only for replica
>> exchange simulations, right at the first successful exchange on a
>> multi-node run. Normal simulations across multiple nodes work fine, and
>> replica exchange simulations on one node work fine.
>>
>> I've reproduced the problem with just 2 replicas on 2 nodes with GPUs
>> disabled (-nb cpu). Each node has 20 CPUs, so I'm using 20 MPI ranks on
>> each (OpenMPI).
>>
>> I get a segfault right when the first exchange is successful.
>>
>> The only other error I sometimes get is that the InfiniBand connection
>> times out retrying the communication between nodes at the exact moment of
>> the segfault, but I don't get that every time, and it's usually with all
>> replicas going (my goal is to run 30 replicas on 120 CPUs). There are no
>> other error logs, and mdrun's log does not indicate an error.
>>
>> PBS log: http://bit.ly/1P8Vs49
>> mdrun log: http://bit.ly/1RD0ViQ
>>
>> I'm currently troubleshooting this with the sysadmin, but I wanted to
>> check whether anyone has had a similar issue or can suggest further
>> troubleshooting steps. I've also searched the mailing list and used my
>> Google-fu, but it has failed me so far.
>>
>> Thanks for your help.
>>
>> --
>> James "Wes" Barnett, Ph.D. Candidate
>> Louisiana Board of Regents Fellow
>>
>> Chemical and Biomolecular Engineering
>> Tulane University
>> 341-B Lindy Boggs Center for Energy and Biotechnology
>> 6823 St. Charles Ave
>> New Orleans, Louisiana 70118-5674
>> jbarnet4 at tulane.edu
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-request at gromacs.org.



