[gmx-users] REMD and distance restraints problem in gmx 4.6.7
Christopher Neale
chris.neale at alum.utoronto.ca
Fri Sep 18 06:27:05 CEST 2015
Dear Users:
I have a system with many distance restraints, designed to maintain helical character, e.g.:
[ distance_restraints ]
90 33 1 1 2 2.745541e-01 3.122595e-01 999 1.0
97 57 1 2 2 2.876300e-01 2.892921e-01 999 1.0
114 73 1 3 2 2.704403e-01 2.929642e-01 999 1.0
...
Distance restraints are properly turned on in the .mdp file with:
disre=simple
disre-fc=1000
The run works fine on a single node (gmx 4.6.7 here and for all that follows):
mdrun -nt 24 ...
The run also works fine on two nodes:
ibrun -np 48 mdrun_mpi ...
However, if I try to do temperature replica exchange (REMD), with two replicas and two nodes like this:
ibrun -np 48 mdrun_mpi -multi 2 -replex 200 ...
then I get the error message:
Fatal error:
Time or ensemble averaged or multiple pair distance restraints do not work (yet) with domain decomposition, use particle decomposition (mdrun option -pd)
Aside: I tried particle decomposition. When I use -pd without REMD, on the same 48-core job that runs fine with domain decomposition, I get LINCS warnings and the run quickly crashes (without -pd that job has run for 25 ns and counting without error):
Step 0, time 0 (ps) LINCS WARNING in simulation 0
relative constraint deviation after LINCS:
rms 5.774043, max 48.082966 (between atoms 21554 and 21555)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
...
So I am stuck with an error message that is not entirely helpful, because (a) the -pd option does not solve the issue even without REMD, and (b) the issue seems to be related to REMD (without REMD I can run on multiple nodes), though REMD is not mentioned in the error message.
I note that Mark Abraham wrote the following at http://redmine.gromacs.org/issues/1117:
"You can use MPI, you just can't have more than one domain (= MPI rank) per simulation. For a multi-simulation with distance restraints and not replica-exchange, you thus must have as many MPI ranks as simulations, so that each simulation has one rank and thus one domain."
I have trouble interpreting this. I have always thought that running MPI across multiple nodes requires multiple domains (apparently = MPI ranks), so I am confused as to why that works without REMD but fails with REMD.
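To make the arithmetic in that quote concrete, here is a minimal sketch. It assumes that mdrun divides the MPI ranks evenly among the -multi simulations; that is my assumption and is not stated in the error message. The function name is hypothetical and only for illustration.

```python
def ranks_per_simulation(total_ranks: int, n_simulations: int) -> int:
    """Return the MPI ranks each simulation gets, ASSUMING an even split."""
    if total_ranks < 1 or n_simulations < 1:
        raise ValueError("rank and simulation counts must be positive")
    if total_ranks % n_simulations != 0:
        raise ValueError("total ranks must be a multiple of the number of simulations")
    return total_ranks // n_simulations


# The failing REMD command: ibrun -np 48 mdrun_mpi -multi 2 -replex 200 ...
print(ranks_per_simulation(48, 2))  # 24 ranks per replica, so more than one domain each

# The layout described in the quote: as many ranks as simulations
print(ranks_per_simulation(2, 2))   # 1 rank per replica, so one domain each
```

If that reading is right, my REMD command gives each replica 24 ranks, whereas the quote seems to call for one rank per replica.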
Final note: I am not trying to do "Time or ensemble averaged" distance restraints, and I think that I am not trying to do "multiple pair distance restraints", unless that simply means having more than one simple distance restraint. So at the very least I think that the error message that I get is confusing.
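One way to check the "multiple pair" question is sketched below. It assumes that the term means several atom pairs sharing the same restraint index, and that the index is the fourth column of each line (columns read as ai, aj, type, index, type', low, up1, up2, fac). Both points are my reading, not something the error message confirms. The helper names are hypothetical, and the check only covers the three lines quoted above, not the elided remainder of the section.

```python
from collections import defaultdict

# The three restraint lines quoted above.
# ASSUMED column order: ai aj type index type' low up1 up2 fac
RESTRAINT_LINES = [
    "90 33 1 1 2 2.745541e-01 3.122595e-01 999 1.0",
    "97 57 1 2 2 2.876300e-01 2.892921e-01 999 1.0",
    "114 73 1 3 2 2.704403e-01 2.929642e-01 999 1.0",
]


def pairs_by_index(lines):
    """Group (ai, aj) atom pairs by the restraint index in the fourth column."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        ai, aj, index = int(fields[0]), int(fields[1]), int(fields[3])
        groups[index].append((ai, aj))
    return dict(groups)


def shared_indices(lines):
    """Return only the restraint indices used by more than one atom pair."""
    return {idx: pairs for idx, pairs in pairs_by_index(lines).items() if len(pairs) > 1}


print(shared_indices(RESTRAINT_LINES))  # {} : each quoted line has its own index
```

For the three lines shown, every pair has a distinct index, which is why I think these are simple single-pair restraints.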
If the solution or source of the error is obvious, then I apologize; maybe I just don't understand MPI well enough.
Thank you for your suggestions,
Chris.