[gmx-users] problem with running mdrun in parallel
map110+ at pitt.edu
Thu Jan 8 21:03:37 CET 2009
Hi there,
I am trying to run an MD simulation of a 13-residue peptide with distance
restraints. Earlier I ran into a problem with this system: mdrun stopped with
an error involving distance restraints and domain decomposition. That turned
out to be a bug in mshift.c, which has since been fixed. At this point,
however, the simulation runs properly only with the serial version of mdrun.
When I try to run it in parallel, I get the error shown below.
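For context, the parallel job is launched along these lines (the exact MPI
launcher and argument order may differ slightly; the file names are the ones
that appear in the output):

  mpirun -np 4 mdrunmpi -s md.tpr -o md.trr -x md.xtc -c md.gro -e md.edr -g md.log

The output is: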
NNODES=4, MYRANK=0, HOSTNAME=chong06.chem.pitt.edu
NNODES=4, MYRANK=2, HOSTNAME=chong06.chem.pitt.edu
NNODES=4, MYRANK=3, HOSTNAME=chong06.chem.pitt.edu
NNODES=4, MYRANK=1, HOSTNAME=chong06.chem.pitt.edu
NODEID=0 argc=13
NODEID=3 argc=13
NODEID=2 argc=13
NODEID=1 argc=13
:-) G R O M A C S (-:
Grunge ROck MAChoS
:-) VERSION 4.0.2 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /home/map110/gromacs.4.0.2patched/bin/mdrunmpi (-:
Option Filename Type Description
------------------------------------------------------------
-s md.tpr Input Run input file: tpr tpb tpa
-o md.trr Output Full precision trajectory: trr trj cpt
-x md.xtc Output, Opt! Compressed trajectory (portable xdr format)
-cpi state.cpt Input, Opt. Checkpoint file
-cpo state.cpt Output, Opt. Checkpoint file
-c md.gro Output Structure file: gro g96 pdb
-e md.edr Output Energy file: edr ene
-g md.log Output Log file
-dgdl dgdl.xvg Output, Opt. xvgr/xmgr file
-field field.xvg Output, Opt. xvgr/xmgr file
-table table.xvg Input, Opt. xvgr/xmgr file
-tablep tablep.xvg Input, Opt. xvgr/xmgr file
-tableb table.xvg Input, Opt. xvgr/xmgr file
-rerun rerun.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-tpi tpi.xvg Output, Opt. xvgr/xmgr file
-tpid tpidist.xvg Output, Opt. xvgr/xmgr file
-ei sam.edi Input, Opt. ED sampling input
-eo sam.edo Output, Opt. ED sampling output
-j wham.gct Input, Opt. General coupling stuff
-jo bam.gct Output, Opt. General coupling stuff
-ffout gct.xvg Output, Opt. xvgr/xmgr file
-devout deviatie.xvg Output, Opt. xvgr/xmgr file
-runav runaver.xvg Output, Opt. xvgr/xmgr file
-px pullx.xvg Output, Opt. xvgr/xmgr file
-pf pullf.xvg Output, Opt. xvgr/xmgr file
-mtx nm.mtx Output, Opt. Hessian matrix
-dn dipole.ndx Output, Opt. Index file
Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-nice int 0 Set the nicelevel
-deffnm string Set the default filename for all file options
-[no]xvgr bool yes Add specific codes (legends etc.) in the output
xvg files for the xmgrace program
-[no]pd bool no Use particle decompostion
-dd vector 0 0 0 Domain decomposition grid, 0 is optimize
-npme int -1 Number of separate nodes to be used for PME, -1
is guess
-ddorder enum interleave DD node order: interleave, pp_pme or
cartesian
-[no]ddcheck bool yes Check for all bonded interactions with DD
-rdd real 0 The maximum distance for bonded interactions with
DD (nm), 0 is determine from initial coordinates
-rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
-dlb enum auto Dynamic load balancing (with DD): auto, no or yes
-dds real 0.8 Minimum allowed dlb scaling of the DD cell size
-[no]sum bool yes Sum the energies at every step
-[no]v bool no Be loud and noisy
-[no]compact bool yes Write a compact log file
-[no]seppot bool no Write separate V and dVdl terms for each
interaction type and node to the log file(s)
-pforce real -1 Print all forces larger than this (kJ/mol nm)
-[no]reprod bool no Try to avoid optimizations that affect binary
reproducibility
-cpt real 15 Checkpoint interval (minutes)
-[no]append bool no Append to previous output files when restarting
from checkpoint
-maxh real -1 Terminate after 0.99 times this time (hours)
-multi int 0 Do multiple simulations in parallel
-replex int 0 Attempt replica exchange every # steps
-reseed int -1 Seed for replica exchange, -1 is generate a seed
-[no]glas bool no Do glass simulation with special long range
corrections
-[no]ionize bool no Do a simulation including the effect of an X-Ray
bombardment on your system
Reading file md.tpr, VERSION 4.0 (single precision)
NOTE: atoms involved in distance restraints should be within the longest
cut-off distance, if this is not the case mdrun generates a fatal error,
in that case use particle decomposition (mdrun option -pd)
WARNING: Can not write distance restraint data to energy file with domain
decomposition
-------------------------------------------------------
Program mdrunmpi, VERSION 4.0.2
Source code file: domdec.c, line: 5842
Fatal error:
There is no domain decomposition for 4 nodes that is compatible with the
given box and a minimum cell size of 3.03524 nm
Change the number of nodes or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
-------------------------------------------------------
"It's Bicycle Repair Man !" (Monty Python)
Error on node 0, will try to stop all the nodes
Halting parallel program mdrunmpi on CPU 0 out of 4
-------------------------------------------------------
Program mdrunmpi, VERSION 4.0.2
Source code file: domdec.c, line: 5860
Fatal error:
The size of the domain decomposition grid (0) does not match the number of
nodes (4). The total number of nodes is 4
-------------------------------------------------------
"It's Bicycle Repair Man !" (Monty Python)
Error on node 1, will try to stop all the nodes
Halting parallel program mdrunmpi on CPU 1 out of 4
gcq#205: "It's Bicycle Repair Man !" (Monty Python)
[cli_0]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
gcq#205: "It's Bicycle Repair Man !" (Monty Python)
[cli_1]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
rank 1 in job 66 chong06.chem.pitt.edu_35438 caused collective abort of
all ranks
exit status of rank 1: killed by signal 9
rank 0 in job 66 chong06.chem.pitt.edu_35438 caused collective abort of
all ranks
exit status of rank 0: killed by signal 9
Can anybody explain what the problem might be? I was wondering whether this
could be another bug.
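In case it helps the discussion, these are the workarounds I read the messages
above as suggesting; I have not verified that any of them is appropriate for
this system, and -deffnm md is just shorthand here for the file options listed
above:

  # 1) fall back to particle decomposition, as the distance-restraint NOTE suggests
  mpirun -np 4 mdrunmpi -pd -deffnm md

  # 2) keep domain decomposition but force a 4x1x1 grid; each cell must still be
  #    at least 3.03524 nm, so this only works if the box is longer than about
  #    12.1 nm in x, which may not hold for a small peptide box
  mpirun -np 4 mdrunmpi -dd 4 1 1 -deffnm md

  # 3) lower the bonded/restraint communication distance (1.5 nm is only a guess
  #    and may be too short for the restraints)
  mpirun -np 4 mdrunmpi -rdd 1.5 -deffnm md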
Thanks in advance!
Maria