[gmx-users] Re: parallel job crash for large system

Dr. Vitaly V. Chaban vvchaban at gmail.com
Tue Aug 23 03:02:21 CEST 2011

> Your density seems to be about 70% of what I would expect. Are you
> sure that this is not just a normal case of a poorly equilibrated
> system crashing? That matches with what you say about the density
> growing (although perhaps it has more to do with poor equilibration
> than with mixing, as you suggest)?

The density should grow; that is the subject of the study. The subsystems
were equilibrated separately. Again, the crash occurs only after
several hundred thousand steps. Bad geometries die within the
first ps.
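
For reference, a rough back-of-the-envelope sketch of the numbers
discussed in this thread. It only uses the box size (6x6x33 nm), the
atom count (84000) and the "about 70%" estimate quoted here. The
function names are made up for illustration, and the assumption that
only the z edge changes reflects the semi-isotropic coupling along the
"long" direction. A mass density cannot be computed from what is posted,
because the composition is not given.

```python
# Values quoted in this thread; nothing else is known about the system.
BOX_NM = (6.0, 6.0, 33.0)   # box edge lengths in nm
N_ATOMS = 84000             # total atom count
DENSITY_FRACTION = 0.70     # "about 70%" of the expected density (estimate)


def box_volume(box):
    """Volume of a rectangular box in nm^3."""
    lx, ly, lz = box
    return lx * ly * lz


def number_density(n_atoms, box):
    """Number density in atoms per nm^3."""
    return n_atoms / box_volume(box)


def z_length_at_target(box, fraction):
    """Box length along z once the expected density is reached.

    Assumes x and y stay fixed and the atom count is constant, so the
    density scales as 1/Lz and Lz_target = Lz_start * fraction.
    """
    return box[2] * fraction


if __name__ == "__main__":
    print(f"volume         : {box_volume(BOX_NM):.0f} nm^3")
    print(f"number density : {number_density(N_ATOMS, BOX_NM):.1f} atoms/nm^3")
    print(f"z at target    : {z_length_at_target(BOX_NM, DENSITY_FRACTION):.1f} nm")
```

Under these assumptions the box would have to contract from 33 nm to
roughly 23 nm along z before the expected density is reached. This is
only an estimate of how far the box may still move, not a diagnosis of
the crash.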

> In any event, I'd suggest simplifying your system and making it
> smaller to see if you can reproduce the problem with a system that
> will run quickly in serial.

The bad thing is that it works with a smaller, though not so "long",
system... Actually, I am sure that the problem is in my input molecular
configuration. But it would be interesting to know where it is anyway.

Interestingly, if I retrieve the last configuration before the crash from
TRAJ.XTC and start from it, initializing the velocities again, it
passes the old crash point. If I start from the checkpoint, it crashes.


> On 23/08/2011 8:44 AM, Dr. Vitaly V. Chaban wrote:
>> In the issue below, the barostat is set up semi-isotropically and works
>> only along the "long" direction. The density of the system slowly
>> grows due to mixing. In case this is useful....
> Does a different barostat work?
> Mark
>> On Mon, Aug 22, 2011 at 5:32 PM, Dr. Vitaly V. Chaban
>> <vvchaban at gmail.com>  wrote:
>>> We are running a system consisting of 84000 atoms in a
>>> parallelepiped box, 6x6x33 nm. The starting geometry, etc. are OK and
>>> the evolution of the trajectory is reasonable, but after several hundred
>>> thousand steps it suddenly crashes. Mysteriously, it crashes at a
>>> different time-step each time, but it always crashes. The parts of
>>> this system were equilibrated separately and did not crash. The system
>>> is not in equilibrium, but there are no external forces. The
>>> Parrinello-Rahman barostat is turned on. The md.log does not show any
>>> problems, the PDB configurations are not written out before the crash,
>>> there are no constraints, and the time-step is 1 fs, which is OK for
>>> the separate components (in separate boxes).
>>> With serial GROMACS, the error has not yet been observed, but given the
>>> size the run is very slow.
>>> What can it be? Can it be somehow connected with the very elongated box?

