[gmx-users] REMD stall out

Daniel Burns dburns at iastate.edu
Fri Feb 21 16:48:25 CET 2020


This was not actually the solution; I wanted to follow up in case
someone else is experiencing this problem.  We are reinstalling the
OpenMP-enabled build.

On Thu, Feb 20, 2020 at 3:10 PM Daniel Burns <dburns at iastate.edu> wrote:

> Hi again,
>
> It seems loading our OpenMP module was responsible for the issue the
> whole time.  When I submit the job with only the pmix and gromacs modules
> loaded, replica exchange proceeds.
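>
> For anyone hitting the same thing, a minimal sketch of the working
> submission (the module names match our cluster; the GROMACS version and
> the replica directory names are placeholders):
>
>     module purge
>     module load pmix gromacs
>     # one MPI rank per replica; -multidir and -replex drive the exchange
>     mpirun -np 8 gmx_mpi mdrun -multidir rep{0..7} -replex 1000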
>
> Thank you,
>
> Dan
>
> On Mon, Feb 17, 2020 at 9:09 AM Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
>> Hi,
>>
>> That could be caused by configuration of the parallel file system or MPI
>> on
>> your cluster. If only one file descriptor is available per node to an MPI
>> job, then your symptoms are explained. Some kinds of compute jobs follow
>> such a model, so maybe someone optimized something for that.
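>>
>> (One way to test that, as a rough sketch: query the per-process
>> open-file limit on each node of the allocation, e.g.
>>
>>     srun --ntasks-per-node=1 bash -c 'hostname; ulimit -n'
>>
>> and compare it with the number of files a multi-replica run opens.)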
>>
>> Mark
>>
>> On Mon, 17 Feb 2020 at 15:56, Daniel Burns <dburns at iastate.edu> wrote:
>>
>> > Hi Szilard,
>> >
>> > I've deleted all my output, but all writing to the log and console
>> > stops around the step reporting the domain decomposition (or another
>> > preliminary task).  It is the same with or without Plumed; plain TREMD
>> > with GROMACS was the first setup to present this issue.
>> >
>> > I've discovered that if each replica is assigned its own node, the
>> > simulations proceed.  If I try to run several replicas on each node
>> > (divided evenly), the simulations stall out before any trajectories get
>> > written.
>> >
>> > I have tried many different -np and -ntomp options as well as several
>> > Slurm job submission scripts with different node/thread configurations,
>> > but multiple simulations per node will not work.  I need to be able to
>> > run several replicas on the same node to get enough data, since it's
>> > hard to get more than 8 nodes (and, as a result, more than 8 replicas).
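>> >
>> > (To illustrate the two layouts with hypothetical numbers for an
>> > 8-replica run on 36-core nodes: the working one-replica-per-node
>> > layout is
>> >
>> >     #SBATCH --nodes=8
>> >     #SBATCH --ntasks-per-node=1
>> >     #SBATCH --cpus-per-task=36
>> >     mpirun -np 8 gmx_mpi mdrun -multidir rep{0..7} -replex 1000 -ntomp 36
>> >
>> > while the packed layout that stalls is
>> >
>> >     #SBATCH --nodes=2
>> >     #SBATCH --ntasks-per-node=4
>> >     #SBATCH --cpus-per-task=9
>> >     mpirun -np 8 gmx_mpi mdrun -multidir rep{0..7} -replex 1000 -ntomp 9
>> >
>> > The directory names and core counts are placeholders.)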
>> >
>> > Thanks for your reply.
>> >
>> > -Dan
>> >
>> > On Tue, Feb 11, 2020 at 12:56 PM Daniel Burns <dburns at iastate.edu>
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I continue to have trouble getting an REMD job to run.  It never makes
>> > > it to the point of writing trajectory files, but it never gives any
>> > > error either.
>> > >
>> > > I have switched from a large TREMD with 72 replicas to the Plumed
>> > > Hamiltonian method with only 6 replicas.  Everything is now on one
>> > > node and each replica has 6 cores.  I've turned off the dynamic load
>> > > balancing on this attempt per the recommendation from the Plumed site.
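>> > >
>> > > (Roughly the current invocation, assuming a PLUMED-patched gmx_mpi;
>> > > -dlb no turns off dynamic load balancing, and the directory and
>> > > plumed.dat names are placeholders:
>> > >
>> > >     mpirun -np 6 gmx_mpi mdrun -multidir rep{0..5} -plumed plumed.dat \
>> > >         -replex 200 -ntomp 6 -dlb no
>> > > )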
>> > >
>> > > Any ideas on how to troubleshoot?
>> > >
>> > > Thank you,
>> > >
>> > > Dan
>> > >
>> > --
>> > Gromacs Users mailing list
>> >
>> > * Please search the archive at
>> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> > posting!
>> >
>> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> >
>> > * For (un)subscribe requests visit
>> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> > send a mail to gmx-users-request at gromacs.org.
>> >
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-request at gromacs.org.
>>
>


More information about the gromacs.org_gmx-users mailing list