[gmx-users] REMD stall out

Daniel Burns dburns at iastate.edu
Mon Feb 17 15:55:42 CET 2020


Hi Szilard,

I've deleted all my output, but all writing to the log and console stops
around the step noting the domain decomposition (or another preliminary
task).  It is the same with or without Plumed; the plain GROMACS TREMD run
was the first thing to present this issue.

I've discovered that if each replica is assigned its own node, the
simulations proceed.  If I try to run several replicas on each node
(divided evenly), the simulations stall out before any trajectories get
written.

I have tried many different -np and -ntomp options as well as several Slurm
job submission scripts with different node/thread configurations, but
multiple simulations per node will not work.  I need to be able to run
several replicas on the same node to get enough data, since it's hard to
get more than 8 nodes (and, as a result, replicas).
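For reference, a minimal Slurm sketch of the layout I'm aiming for (several
replicas per node, one MPI rank per replica, using mdrun's -multidir
mechanism) is below.  The node size, replica directory names, and module
name are assumptions for illustration, not my actual cluster setup:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4   # 4 replicas per node -> 8 replicas total
#SBATCH --cpus-per-task=9     # assumes 36-core nodes; 9 OpenMP threads/replica

# Hypothetical module name; replace with whatever your cluster provides.
module load gromacs/2019.4-plumed

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# One MPI rank per replica; -multidir expects one directory per replica,
# each containing its own topol.tpr.  -dlb no disables dynamic load
# balancing, as recommended on the Plumed site.
srun -n 8 gmx_mpi mdrun -multidir equil{0..7} \
     -replex 500 -ntomp $SLURM_CPUS_PER_TASK -dlb no
```

With this layout each node hosts four ranks, so the stall (if it recurs)
would point at intra-node rank placement rather than the exchange setup.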

Thanks for your reply.

-Dan

On Tue, Feb 11, 2020 at 12:56 PM Daniel Burns <dburns at iastate.edu> wrote:

> Hi,
>
> I continue to have trouble getting an REMD job to run.  It never makes it
> to the point of generating trajectory files, but it never gives any error
> either.
>
> I have switched from a large TREMD with 72 replicas to the Plumed
> Hamiltonian method with only 6 replicas.  Everything is now on one node and
> each replica has 6 cores.  I've turned off the dynamic load balancing on
> this attempt per the recommendation from the Plumed site.
>
> Any ideas on how to troubleshoot?
>
> Thank you,
>
> Dan
>


More information about the gromacs.org_gmx-users mailing list