[gmx-users] parallel run stops writing in output files

Mark Abraham Mark.Abraham at anu.edu.au
Sat Nov 7 11:28:25 CET 2009


elena.mol at mail.com wrote:
> 
> Dear gmx users
> I'm trying to run GROMACS 4.0.5 (mdrun with replica exchange) in 
> parallel on a cluster, with the command:
> 
> mpirun -np n mdrun_mpi -multi m (.........other mdrun options....)
> 
> (where n is the number of “processes” and m the number of replicas)
> 
> and after some time it stops writing to the output files (.log, .edr, 
> .xtc, ...), although from “top” it seems it's still running. It stops 
> writing “simultaneously” (after the same number of steps) to all these 
> files; I checked this by plotting energies etc. from .edr, and RMSD from .xtc.
> I have done many tests changing the number of processes, their 
> distribution among nodes, the nst.... options (how frequently to write 
> in output files, in order to check if the problem was due to file 
> size..), but it always shows the same kind of behaviour.
> The number of steps performed before stopping is not constant, but it 
> corresponds to a cpu time of some minutes, while the same calculations 
> run on a single processor work fine (for much longer times).
> 
> Does anyone have ideas on how to solve this problem? ...which input 
> parameters to change...?

Simplify your problem in order to troubleshoot it. Can you run "m by 1" 
replica exchange (m replicas, one process each)? A non-REMD n-process 
mdrun? Any parallel mdrun? Any non-parallel mdrun?
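
As a rough sketch, those four tests could be written out as commands like the ones below. The script only prints the commands and does not run them. The values of N and M, the -replex interval, the file name topol0.tpr (assuming -multi appends the replica index to input file names), the serial binary name "mdrun" and the -deffnm output names are all placeholders chosen for illustration, not taken from the original setup; OPTS stands in for the "other mdrun options" that were not shown.

```shell
#!/bin/sh
# Hypothetical troubleshooting ladder: prints one candidate command per test.
# Nothing here is executed against GROMACS; copy and adapt a line to try it.

N=8          # total number of MPI processes (assumed value)
M=4          # number of replicas (assumed value)
REPLEX=1000  # exchange attempt interval in steps (assumed value)
OPTS=""      # placeholder for the other mdrun options not shown in the post

ladder() {
    # 1. "m by 1" replica exchange: M replicas, one process each
    echo "mpirun -np $M mdrun_mpi -multi $M -replex $REPLEX $OPTS"
    # 2. Non-REMD run on all N processes, using one replica's input
    echo "mpirun -np $N mdrun_mpi -s topol0.tpr -deffnm test_np$N $OPTS"
    # 3. Smallest possible parallel run: two processes
    echo "mpirun -np 2 mdrun_mpi -s topol0.tpr -deffnm test_np2 $OPTS"
    # 4. Non-parallel run with the serial binary
    echo "mdrun -s topol0.tpr -deffnm test_serial $OPTS"
}

ladder
```

Whichever is the simplest test that still stops writing output narrows down where the problem lies (replica exchange, the MPI setup, or the simulation itself).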

Mark


