[gmx-users] REMD stall out

Daniel Burns dburns at iastate.edu
Thu Feb 20 22:10:33 CET 2020


Hi again,

It seems that loading our OpenMP module was responsible for the issue the
whole time.  When I submit the job with only the pmix and GROMACS modules
loaded, replica exchange proceeds.
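
In case it helps anyone searching the archive, here is a rough sketch of the
kind of Slurm submission that works for us now.  The module names, replica
counts, directory names, and srun options below are illustrative guesses
rather than our exact setup, so adjust them for your cluster:

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4      # 4 replicas per node, 8 replicas total
    #SBATCH --cpus-per-task=6        # 6 OpenMP threads per replica

    # Load only the PMIx and GROMACS modules; the separate OpenMP module is
    # what was causing the stall for us, so it is deliberately left out.
    module purge
    module load pmix gromacs

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # One directory per replica (replica_0 ... replica_7), each containing
    # its own remd.tpr produced by grompp.
    srun --mpi=pmix gmx_mpi mdrun -multidir replica_{0..7} \
         -deffnm remd -replex 500 -ntomp $SLURM_CPUS_PER_TASK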

Thank you,

Dan

On Mon, Feb 17, 2020 at 9:09 AM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> That could be caused by the configuration of the parallel file system or
> MPI on your cluster. If only one file descriptor is available per node to
> an MPI job, that would explain your symptoms. Some kinds of compute jobs
> follow such a model, so maybe someone optimized something for that.
>
> Mark
>
> On Mon, 17 Feb 2020 at 15:56, Daniel Burns <dburns at iastate.edu> wrote:
>
> > Hi Szilard,
> >
> > I've deleted all my output, but all writing to the log and console
> > stops around the step noting the domain decomposition (or another
> > preliminary task).  It is the same with or without Plumed - the TREMD
> > run with GROMACS alone was the first thing to present this issue.
> >
> > I've discovered that if each replica is assigned its own node, the
> > simulations proceed.  If I try to run several replicas on each node
> > (divided evenly), the simulations stall out before any trajectories get
> > written.
> >
> > I have tried many different -np and -ntomp options as well as several
> > Slurm job submission scripts with different node/thread configurations,
> > but multiple simulations per node will not work.  I need to be able to
> > run several replicas on the same node to get enough data, since it's
> > hard to get more than 8 nodes (and, as a result, replicas).
> >
> > Thanks for your reply.
> >
> > -Dan
> >
> > On Tue, Feb 11, 2020 at 12:56 PM Daniel Burns <dburns at iastate.edu>
> > wrote:
> >
> > > Hi,
> > >
> > > I continue to have trouble getting an REMD job to run.  It never
> > > makes it to the point of generating trajectory files, but it never
> > > gives any error either.
> > >
> > > I have switched from a large TREMD with 72 replicas to the Plumed
> > > Hamiltonian replica exchange method with only 6 replicas.  Everything
> > > is now on one node and each replica has 6 cores.  I've turned off
> > > dynamic load balancing on this attempt, per the recommendation on the
> > > Plumed site.
> > >
> > > Any ideas on how to troubleshoot?
> > >
> > > Thank you,
> > >
> > > Dan
> > >