[gmx-users] crystal waters crash parallel but not serial
chris.neale at utoronto.ca
Mon Sep 25 15:24:32 CEST 2006
My run crashes in parallel but not in serial. I have narrowed it down
to the inclusion of crystal waters, but I can't imagine what problem
would occur only in parallel. I have done three replicates of each
situation to be sure that it is not a fluke. It's easy enough to avoid
parallel runs for now, but I am worried about my system, so my
question is: does anybody know whether something like this would
happen only if a system is otherwise incorrectly built, or could it be
a quirk of multiple water groups / ordering of groups / pressure
coupling / something else across multiple processors?
My system here is an OPLS-AA protein in a POPE/DMPE membrane with
TIP4P waters, run in single precision. I am using the double-pairlist
inclusion method to scale the lipid 1-4 interactions. It runs fine
through initial protein heavy atom position restraints (0.5 ns) and
then fine again with no topological restraints (3.5 ns).
When I repeat this setup procedure, this time also including the
crystal waters as a separate moleculetype (xtip4p.itp, residue XSOL)
so that I can restrain their positions as well, it runs fine in serial
for 0.5 ns.
However, if I try to repeat this in parallel it crashes after ~40 ps.
I can extend this time by increasing the number of EM steps or by not
restraining the crystal waters (worrisome), but it still eventually
crashes. I have repeated this 3 times in parallel on 4 procs and 3
times on single processors. I also did one additional replica in
double precision, which crashed in parallel (after 8 ps) and ran fine
for 500 ps in serial.
All of my crystal waters and my protein (the only things that are ever
restrained) are located on the first processor. I don't use shuffle
since I use position restraints. I have ordered the topology as
protein, crystal water, membrane, bulk water (with the includes in the
same order) and have temperature coupling groups that are 1) protein,
2) lipid, 3) crystal + bulk water.
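
For anyone wanting to check a similar layout, here is a minimal sketch
of which processor each topology block would land on. It assumes that
atoms are divided into equal contiguous ranges in topology order, which
is a simplification; the split GROMACS actually makes may differ. The
atom counts and block names are hypothetical placeholders, not the
numbers from this system.

```python
def node_ranges(n_atoms, n_nodes):
    """Split n_atoms into n_nodes contiguous, near-equal index ranges.

    Returns a list of (first, last) 0-based inclusive atom indices.
    ASSUMPTION: equal contiguous split; the real partitioning may differ.
    """
    base, extra = divmod(n_atoms, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def nodes_for_blocks(blocks, n_nodes):
    """Map each topology block to the node indices it overlaps.

    blocks: list of (name, n_atoms) tuples in topology order.
    """
    total = sum(n for _, n in blocks)
    ranges = node_ranges(total, n_nodes)
    result = {}
    start = 0
    for name, n in blocks:
        end = start + n - 1
        result[name] = [i for i, (lo, hi) in enumerate(ranges)
                        if lo <= end and hi >= start]
        start += n
    return result


# Hypothetical atom counts, in the topology order described above.
blocks = [
    ("protein", 3000),
    ("crystal_water", 200),
    ("membrane", 12000),
    ("bulk_water", 24800),
]
placement = nodes_for_blocks(blocks, 4)
for name, nodes in placement.items():
    print(name, nodes)
```

With these placeholder counts the protein and crystal water fall
entirely on the first node, while the membrane and bulk water span
several nodes.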
I use LINCS.
When the run crashes in parallel, the frame prior to the crash shows a
single exploding water molecule (bulk water, not crystal water).
However, I could only see this in 2 of the crashes (the others hung
silently, computing forever, even after kill -HUP, so I didn't get to
see the frame), and the large number of bulk waters means that this
might just be by chance.
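
Rather than spotting such a molecule by eye, one could scan a saved
frame for waters with distorted geometry. The sketch below flags any
water whose O-H distance deviates from the TIP4P value of 0.09572 nm
by more than a tolerance. The tolerance, the function name, the labels
and the coordinates are hypothetical, the input is assumed to be
already parsed into tuples in nm, and periodic boundaries are ignored,
so a molecule split across the box edge would be flagged falsely.

```python
import math

OH_BOND_NM = 0.09572   # TIP4P O-H bond length in nm
TOLERANCE_NM = 0.05    # hypothetical threshold, chosen for illustration


def stretched_waters(waters, tolerance=TOLERANCE_NM):
    """Return labels of waters with an O-H distance far from ideal.

    waters: list of (label, O, H1, H2), each position an (x, y, z)
    tuple in nm. No periodic boundary correction is applied.
    """
    flagged = []
    for label, o, h1, h2 in waters:
        for h in (h1, h2):
            if abs(math.dist(o, h) - OH_BOND_NM) > tolerance:
                flagged.append(label)
                break  # one bad bond is enough to flag the molecule
    return flagged


# Made-up frame: the first water is intact, the second has one
# hydrogen 0.9 nm from its oxygen.
frame = [
    ("SOL 1", (0.0, 0.0, 0.0), (0.09572, 0.0, 0.0), (-0.024, 0.0927, 0.0)),
    ("SOL 2", (1.0, 1.0, 1.0), (1.09572, 1.0, 1.0), (1.9, 1.0, 1.0)),
]
flagged = stretched_waters(frame)
print(flagged)
```

Running the same check over every saved frame before the crash would
show whether the first distorted molecule is consistently a bulk water.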
Thanks for any comments.
Chris.