[gmx-users] REMD and distance restraints problem in gmx 4.6.7

Christopher Neale chris.neale at alum.utoronto.ca
Fri Sep 18 06:27:05 CEST 2015


Dear Users:

I have a system with many distance restraints, designed to maintain helical character, e.g.:
[ distance_restraints ]
90 33 1 1 2 2.745541e-01 3.122595e-01 999 1.0
97 57 1 2 2 2.876300e-01 2.892921e-01 999 1.0
114 73 1 3 2 2.704403e-01 2.929642e-01 999 1.0
...

Distance restraints are properly turned on in the .mdp file with:
disre=simple
disre-fc=1000

The run works fine on a single node (gmx 4.6.7 here and for all that follows):
mdrun -nt 24 ...

The run also works fine on two nodes:
ibrun -np 48 mdrun_mpi ...

However, if I try to do temperature replica exchange (REMD), with two replicas and two nodes like this:
ibrun -np 48 mdrun_mpi -multi 2 -replex 200 ...

then I get the error message:
Fatal error:
Time or ensemble averaged or multiple pair distance restraints do not work (yet) with domain decomposition, use particle decomposition (mdrun option -pd)

Aside: I tried particle decomposition, but it fails even without REMD. When I simply add -pd to the 48-core job that worked fine with domain decomposition, I get LINCS errors and a crash almost immediately (note that without -pd I have 25 ns and counting of run without error):
Step 0, time 0 (ps)  LINCS WARNING in simulation 0
relative constraint deviation after LINCS:
rms 5.774043, max 48.082966 (between atoms 21554 and 21555)
bonds that rotated more than 30 degrees:
 atom 1 atom 2  angle  previous, current, constraint length
...

So I am stuck with an error message that is not entirely helpful because (a) the -pd option does not solve the issue, even without REMD, and (b) the issue seems to be related to REMD (without REMD I can run on multiple nodes), though that is not mentioned in the error message.

I note that Mark Abraham mentioned here: http://redmine.gromacs.org/issues/1117 that:
"You can use MPI, you just can't have more than one domain (= MPI rank) per simulation. For a multi-simulation with distance restraints and not replica-exchange, you thus must have as many MPI ranks as simulations, so that each simulation has one rank and thus one domain."

I have trouble interpreting this, as I have always thought that running MPI across multiple nodes requires multiple domains (apparently = MPI ranks), so I am confused as to why that is possible without REMD but gets messy with REMD.
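
A minimal sketch of the arithmetic behind that quote, in Python rather than anything GROMACS-specific. It assumes that -multi splits the MPI ranks evenly across the simulations and that each rank is one domain, as the quote states; the function name ranks_per_simulation is made up for illustration:

```python
def ranks_per_simulation(total_ranks, n_simulations):
    """Ranks available to each simulation, assuming an even split under -multi."""
    if n_simulations < 1 or total_ranks < 1:
        raise ValueError("need at least one rank and one simulation")
    if total_ranks % n_simulations != 0:
        raise ValueError("total ranks must be a multiple of the number of simulations")
    return total_ranks // n_simulations


# (total MPI ranks, number of simulations): my failing command, then the
# one-rank-per-simulation layout described in the quote.
for total, nsim in [(48, 2), (2, 2)]:
    per_sim = ranks_per_simulation(total, nsim)
    print(f"-np {total} -multi {nsim}: {per_sim} rank(s) = {per_sim} domain(s) per simulation")
```

Under these assumptions my command gives each replica 24 ranks, and therefore 24 domains, whereas the quote seems to call for exactly one rank per simulation.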

Final note: I am not trying to do "Time or ensemble averaged" distance restraints, and I do not think that I am trying to do "multiple pair distance restraints", unless that simply means having more than one simple distance restraint. So at the very least I think that the error message that I get is confusing.
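
One possible reading of "multiple pair" is that several atom pairs share the same restraint index. The sketch below checks the three example lines shown earlier under that reading. Both the reading itself and the assumption that the fourth column is the restraint index are guesses on my part, not something the error message confirms; pairs_per_index is a hypothetical helper name:

```python
from collections import Counter

# The three example lines from the [ distance_restraints ] section above.
RESTRAINTS = """\
90 33 1 1 2 2.745541e-01 3.122595e-01 999 1.0
97 57 1 2 2 2.876300e-01 2.892921e-01 999 1.0
114 73 1 3 2 2.704403e-01 2.929642e-01 999 1.0
"""


def pairs_per_index(text):
    """Count atom pairs per restraint index (assumed to be the 4th column)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split(";")[0].split()  # drop any trailing ';' comment
        if len(fields) < 4:
            continue  # skip blank lines, section headers and elided lines
        counts[int(fields[3])] += 1
    return counts


counts = pairs_per_index(RESTRAINTS)
# Indices that are used by more than one atom pair.
shared = {idx: n for idx, n in counts.items() if n > 1}
print("restraint indices shared by more than one pair:", shared)
```

If no index is shared in the full restraint list either, then under this reading none of my restraints is a multiple-pair restraint.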

If the solution or the source of the error is obvious, then I apologize; maybe I just don't understand MPI well enough.

Thank you for your suggestions,
Chris.


