[gmx-users] parallel run stops writing in output files
Mark Abraham
Mark.Abraham at anu.edu.au
Sat Nov 7 11:28:25 CET 2009
elena.mol at mail.com wrote:
>
> Dear gmx users,
> I'm trying to run GROMACS 4.0.5 (mdrun with replica exchange) in
> parallel on a cluster, with the command:
>
> mpirun -np n mdrun_mpi -multi m (.........other mdrun options....)
>
> (where n is the number of processes and m is the number of replicas)
>
> After some time it stops writing to the output files (.log, .edr,
> .xtc, ...), although according to “top” it is still running. It stops
> writing to all of these files simultaneously (after the same number of
> steps); I checked this by plotting energies from the .edr file and RMSD
> from the .xtc file.
> I have run many tests, changing the number of processes, their
> distribution among nodes, and the nst.... options (how frequently output
> is written, to check whether the problem was due to file size), but the
> behaviour is always the same.
> The number of steps performed before it stops is not constant, but it
> corresponds to a few minutes of CPU time, while the same calculations
> run on a single processor work fine (for much longer times).
>
> Does anyone have ideas on how to solve this problem, or which input
> parameters to change?
Simplify your problem in order to troubleshoot. Can you run "m by 1"
replica exchange (m replicas, one process each)? A non-REMD n-process
mdrun? Any parallel mdrun? Any non-parallel mdrun?
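
The four checks above could be lined up roughly as follows. This is only a
sketch under assumptions: the values of N and M are placeholders, REMD_OPTS
and MD_OPTS are hypothetical variables standing in for the other mdrun options
(which were not given in the post), and the serial binary is assumed to be
called "mdrun". The script only prints the commands, so nothing is launched
until a line is copied and run by hand.

```shell
#!/bin/sh
# Dry run: prints one command per troubleshooting step, most complex first.
# N, M, REMD_OPTS and MD_OPTS are illustrative placeholders, not values
# taken from the original post.
N=${N:-8}                   # processes used in the failing run (assumed)
M=${M:-4}                   # replicas used in the failing run (assumed)
REMD_OPTS=${REMD_OPTS:-}    # your remaining options for the replica-exchange run
MD_OPTS=${MD_OPTS:-}        # the same options with the replica-exchange ones removed

ladder() {
    # 1. "m by 1" replica exchange: m replicas, one process per replica
    echo "mpirun -np $M mdrun_mpi -multi $M${REMD_OPTS:+ $REMD_OPTS}"
    # 2. non-REMD run on all n processes
    echo "mpirun -np $N mdrun_mpi${MD_OPTS:+ $MD_OPTS}"
    # 3. smallest possible parallel run
    echo "mpirun -np 2 mdrun_mpi${MD_OPTS:+ $MD_OPTS}"
    # 4. non-parallel run (serial binary name "mdrun" is an assumption)
    echo "mdrun${MD_OPTS:+ $MD_OPTS}"
}

ladder
```

The first step in the list that also stops writing output narrows down
whether the problem lies with replica exchange, with the process count, or
with the MPI setup itself.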
Mark