[gmx-users] MPICH or LAM/MPI
Arneh Babakhani
ababakha at mccammon.ucsd.edu
Tue Jun 27 07:30:49 CEST 2006
Hi All,
Ok, I've successfully built the MPI version of mdrun and am now trying to
run my simulation on 32 processors. After preprocessing with grompp and
the option -np 32, I launch mdrun with the following script (where CONF
is the input file name and NPROC is the number of processors):
/opt/mpich/intel/bin/mpirun -v -np $NPROC -machinefile \$TMPDIR/machines
~/gromacs-mpi/bin/mdrun -np $NPROC -s $CONF -o $CONF -c After$CONF -e
$CONF -g $CONF >& $CONF.job
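In case it helps to see exactly what gets executed: a dry-run sketch of the launch line, echoing the command instead of running it so the variable expansion (including $TMPDIR, which I escape as \$TMPDIR in the submission script so the batch system expands it later) can be inspected. Paths and file names below are just the ones from this post; adjust for your site, and note this assumes an interactive shell where $TMPDIR is already set.

```shell
# Dry-run sketch of the mpirun invocation (paths/names from this post).
# Echoing the fully expanded command is a quick sanity check when the
# p4 device reports a connection timeout at startup.
NPROC=32
CONF=FullMD7
MPIRUN=/opt/mpich/intel/bin/mpirun
MDRUN="$HOME/gromacs-mpi/bin/mdrun"
CMD="$MPIRUN -v -np $NPROC -machinefile $TMPDIR/machines \
$MDRUN -np $NPROC -s $CONF -o $CONF -c After$CONF -e $CONF -g $CONF"
echo "$CMD"
```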
Everything seems to start up ok, but then GROMACS stalls: it never
actually starts the simulation, and after about 7 minutes it aborts
completely. I've pasted the log file below, which shows the simulation
stalling at Step 0 with no discernible error (only a note that AMD
3DNow support is not available, which makes sense because I'm not
running on AMD hardware).
Further down I've also pasted the job file, FullMD7.job, which is
normally empty when everything is running smoothly. There are some
errors at the end, but they're rather cryptic to me, and I'm not sure
whether they're a cause or an effect. If anyone has any suggestions,
I'd love to hear them.
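One small thing I did decode (assuming a stock Linux box with Python available, which these nodes appear to be): the repeated "errno = 32" in the job file is EPIPE, i.e. a broken pipe, which looks like an effect of the startup timeout rather than its cause.

```shell
# "errno = 32" in FullMD7.job is EPIPE ("Broken pipe") on Linux: the
# MPICH p4 helper wrote to a socket whose peer had already exited.
python3 -c 'import os; print(os.strerror(32))'   # prints: Broken pipe
```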
Thanks,
Arneh
*****FullMD70.log******
Log file opened on Mon Jun 26 21:51:55 2006
Host: compute-0-1.local pid: 13353 nodeid: 0 nnodes: 32
The Gromacs distribution was built Wed Jun 21 16:01:01 PDT 2006 by
ababakha at chemcca40.ucsd.edu (Linux 2.6.9-22.ELsmp i686)
:-) G R O M A C S (-:
Groningen Machine for Chemical Simulation
:-) VERSION 3.3.1 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2006, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /home/ababakha/gromacs-mpi/bin/mdrun (-:
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
CPU= 0, lastcg= 515, targetcg= 5799, myshift= 14
CPU= 1, lastcg= 1055, targetcg= 6339, myshift= 15
CPU= 2, lastcg= 1595, targetcg= 6879, myshift= 16
CPU= 3, lastcg= 2135, targetcg= 7419, myshift= 17
CPU= 4, lastcg= 2675, targetcg= 7959, myshift= 18
CPU= 5, lastcg= 3215, targetcg= 8499, myshift= 19
CPU= 6, lastcg= 3755, targetcg= 9039, myshift= 20
CPU= 7, lastcg= 4112, targetcg= 9396, myshift= 20
CPU= 8, lastcg= 4381, targetcg= 9665, myshift= 20
CPU= 9, lastcg= 4650, targetcg= 9934, myshift= 20
CPU= 10, lastcg= 4919, targetcg=10203, myshift= 20
CPU= 11, lastcg= 5188, targetcg=10472, myshift= 20
CPU= 12, lastcg= 5457, targetcg= 174, myshift= 20
CPU= 13, lastcg= 5726, targetcg= 443, myshift= 19
CPU= 14, lastcg= 5995, targetcg= 712, myshift= 19
CPU= 15, lastcg= 6264, targetcg= 981, myshift= 18
CPU= 16, lastcg= 6533, targetcg= 1250, myshift= 18
CPU= 17, lastcg= 6802, targetcg= 1519, myshift= 17
CPU= 18, lastcg= 7071, targetcg= 1788, myshift= 17
CPU= 19, lastcg= 7340, targetcg= 2057, myshift= 16
CPU= 20, lastcg= 7609, targetcg= 2326, myshift= 16
CPU= 21, lastcg= 7878, targetcg= 2595, myshift= 15
CPU= 22, lastcg= 8147, targetcg= 2864, myshift= 15
CPU= 23, lastcg= 8416, targetcg= 3133, myshift= 14
CPU= 24, lastcg= 8685, targetcg= 3402, myshift= 14
CPU= 25, lastcg= 8954, targetcg= 3671, myshift= 13
CPU= 26, lastcg= 9223, targetcg= 3940, myshift= 13
CPU= 27, lastcg= 9492, targetcg= 4209, myshift= 13
CPU= 28, lastcg= 9761, targetcg= 4478, myshift= 13
CPU= 29, lastcg=10029, targetcg= 4746, myshift= 13
CPU= 30, lastcg=10298, targetcg= 5015, myshift= 13
CPU= 31, lastcg=10566, targetcg= 5283, myshift= 13
nsb->shift = 20, nsb->bshift= 0
Listing Scalars
nsb->nodeid: 0
nsb->nnodes: 32
nsb->cgtotal: 10567
nsb->natoms: 25925
nsb->shift: 20
nsb->bshift: 0
Nodeid index homenr cgload workload
0 0 788 516 516
1 788 828 1056 1056
2 1616 828 1596 1596
3 2444 828 2136 2136
4 3272 828 2676 2676
5 4100 828 3216 3216
6 4928 828 3756 3756
7 5756 807 4113 4113
8 6563 807 4382 4382
9 7370 807 4651 4651
10 8177 807 4920 4920
11 8984 807 5189 5189
12 9791 807 5458 5458
13 10598 807 5727 5727
14 11405 807 5996 5996
15 12212 807 6265 6265
16 13019 807 6534 6534
17 13826 807 6803 6803
18 14633 807 7072 7072
19 15440 807 7341 7341
20 16247 807 7610 7610
21 17054 807 7879 7879
22 17861 807 8148 8148
23 18668 807 8417 8417
24 19475 807 8686 8686
25 20282 807 8955 8955
26 21089 807 9224 9224
27 21896 807 9493 9493
28 22703 807 9762 9762
29 23510 804 10030 10030
30 24314 807 10299 10299
31 25121 804 10567 10567
parameters of the run:
integrator = md
nsteps = 1500000
init_step = 0
ns_type = Grid
nstlist = 10
ndelta = 2
bDomDecomp = FALSE
decomp_dir = 0
nstcomm = 1
comm_mode = Linear
nstcheckpoint = 1000
nstlog = 10
nstxout = 500
nstvout = 1000
nstfout = 0
nstenergy = 10
nstxtcout = 0
init_t = 0
delta_t = 0.002
xtcprec = 1000
nkx = 64
nky = 64
nkz = 80
pme_order = 6
ewald_rtol = 1e-05
ewald_geometry = 0
epsilon_surface = 0
optimize_fft = TRUE
ePBC = xyz
bUncStart = FALSE
bShakeSOR = FALSE
etc = Berendsen
epc = Berendsen
epctype = Semiisotropic
tau_p = 1
ref_p (3x3):
ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
compress (3x3):
compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compress[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e-30}
andersen_seed = 815131
rlist = 0.9
coulombtype = PME
rcoulomb_switch = 0
rcoulomb = 0.9
vdwtype = Cut-off
rvdw_switch = 0
rvdw = 1.4
epsilon_r = 1
epsilon_rf = 1
tabext = 1
gb_algorithm = Still
nstgbradii = 1
rgbradii = 2
gb_saltconc = 0
implicit_solvent = No
DispCorr = No
fudgeQQ = 1
free_energy = no
init_lambda = 0
sc_alpha = 0
sc_power = 0
sc_sigma = 0.3
delta_lambda = 0
disre_weighting = Conservative
disre_mixed = FALSE
dr_fc = 1000
dr_tau = 0
nstdisreout = 100
orires_fc = 0
orires_tau = 0
nstorireout = 100
dihre-fc = 1000
dihre-tau = 0
nstdihreout = 100
em_stepsize = 0.01
em_tol = 10
niter = 20
fc_stepsize = 0
nstcgsteep = 1000
nbfgscorr = 10
ConstAlg = Lincs
shake_tol = 1e-04
lincs_order = 4
lincs_warnangle = 30
lincs_iter = 1
bd_fric = 0
ld_seed = 1993
cos_accel = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 11903.3 39783.7 285.983
ref_t: 310 310 310
tau_t: 0.1 0.1 0.1
anneal: No No No
ann_npoints: 0 0 0
acc: 0 0 0
nfreeze: N N N
energygrp_flags[ 0]: 0
efield-x:
n = 0
efield-xt:
n = 0
efield-y:
n = 0
efield-yt:
n = 0
efield-z:
n = 0
efield-zt:
n = 0
bQMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
qm_opts:
ngQM = 0
Max number of graph edges per atom is 4
Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's: NS: 0.9 Coulomb: 0.9 LJ: 1.4
System total charge: 0.000
Generated table with 1200 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Enabling SPC water optimization for 6631 molecules.
Will do PME sum in reciprocal space.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Parallelized PME sum used.
PARALLEL FFT DATA:
local_nx: 2 local_x_start: 0
local_ny_after_transpose: 2 local_y_start_after_transpose 0
Removing pbc first time
Done rmpbc
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest, initial mass: 207860
There are: 788 Atoms
Constraining the starting coordinates (step -2)
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
Initializing LINear Constraint Solver
number of constraints is 776
average number of constraints coupled to one constraint is 2.5
Rel. Constraint Deviation: Max between atoms RMS
Before LINCS 0.008664 87 88 0.003001
After LINCS 0.000036 95 96 0.000005
Constraining the coordinates at t0-dt (step -1)
Rel. Constraint Deviation: Max between atoms RMS
Before LINCS 0.093829 12 13 0.009919
After LINCS 0.000131 11 14 0.000021
Started mdrun on node 0 Mon Jun 26 21:52:34 2006
Initial temperature: 310.388 K
Step Time Lambda
0 0.00000 0.00000
Grid: 8 x 8 x 13 cells
Configuring nonbonded kernels...
Testing AMD 3DNow support... not present.
Testing ia32 SSE support... present.
********FullMD7.job***************
*running /home/ababakha/gromacs-mpi/bin/mdrun on 32 LINUX ch_p4 processors
Created /home/ababakha/SMDPeptideSimulation/CapParSMD/FullMD/PI12637
NNODES=32, MYRANK=0, HOSTNAME=compute-0-1.local
NNODES=32, MYRANK=1, HOSTNAME=compute-0-1.local
NNODES=32, MYRANK=30, HOSTNAME=compute-0-29.local
NNODES=32, MYRANK=24, HOSTNAME=compute-0-12.local
NNODES=32, MYRANK=28, HOSTNAME=compute-0-30.local
NNODES=32, MYRANK=3, HOSTNAME=compute-0-26.local
NNODES=32, MYRANK=14, HOSTNAME=compute-0-22.local
NNODES=32, MYRANK=6, HOSTNAME=compute-0-31.local
NNODES=32, MYRANK=8, HOSTNAME=compute-0-20.local
NNODES=32, MYRANK=7, HOSTNAME=compute-0-31.local
NNODES=32, MYRANK=18, HOSTNAME=compute-0-27.local
NNODES=32, MYRANK=2, HOSTNAME=compute-0-26.local
NNODES=32, MYRANK=23, HOSTNAME=compute-0-4.local
NNODES=32, MYRANK=31, HOSTNAME=compute-0-29.local
NNODES=32, MYRANK=5, HOSTNAME=compute-0-21.local
NNODES=32, MYRANK=27, HOSTNAME=compute-0-3.local
NNODES=32, MYRANK=4, HOSTNAME=compute-0-21.local
NNODES=32, MYRANK=20, HOSTNAME=compute-0-8.local
NNODES=32, MYRANK=11, HOSTNAME=compute-0-7.local
NNODES=32, MYRANK=9, HOSTNAME=compute-0-20.local
NNODES=32, MYRANK=12, HOSTNAME=compute-0-19.local
NNODES=32, MYRANK=13, HOSTNAME=compute-0-19.local
NNODES=32, MYRANK=21, HOSTNAME=compute-0-8.local
NNODES=32, MYRANK=22, HOSTNAME=compute-0-4.local
NNODES=32, MYRANK=10, HOSTNAME=compute-0-7.local
NNODES=32, MYRANK=17, HOSTNAME=compute-0-25.local
NNODES=32, MYRANK=25, HOSTNAME=compute-0-12.local
NNODES=32, MYRANK=15, HOSTNAME=compute-0-22.local
NNODES=32, MYRANK=29, HOSTNAME=compute-0-30.local
NNODES=32, MYRANK=19, HOSTNAME=compute-0-27.local
NNODES=32, MYRANK=26, HOSTNAME=compute-0-3.local
NNODES=32, MYRANK=16, HOSTNAME=compute-0-25.local
NODEID=26 argc=13
NODEID=25 argc=13
NODEID=24 argc=13
NODEID=23 argc=13
NODEID=22 argc=13
NODEID=21 argc=13
NODEID=20 argc=13
NODEID=19 argc=13
NODEID=18 argc=13
NODEID=13 argc=13
NODEID=17 argc=13
NODEID=15 argc=13
NODEID=14 argc=13
NODEID=16 argc=13
NODEID=0 argc=13
NODEID=12 argc=13
NODEID=6 argc=13
NODEID=11 argc=13
NODEID=1 argc=13
NODEID=10 argc=13
NODEID=5 argc=13
NODEID=30 argc=13
NODEID=7 argc=13
NODEID=27 argc=13
NODEID=31 argc=13
NODEID=2 argc=13
NODEID=9 argc=13
NODEID=28 argc=13
NODEID=4 argc=13
NODEID=29 argc=13
NODEID=8 argc=13
NODEID=3 argc=13
:-) G R O M A C S (-:
Groningen Machine for Chemical Simulation
:-) VERSION 3.3.1 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2006, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /home/ababakha/gromacs-mpi/bin/mdrun (-:
Option Filename Type Description
------------------------------------------------------------
-s FullMD7.tpr Input Generic run input: tpr tpb tpa xml
-o FullMD7.trr Output Full precision trajectory: trr trj
-x traj.xtc Output, Opt. Compressed trajectory (portable xdr
format)
-c AfterFullMD7.gro Output Generic structure: gro g96 pdb xml
-e FullMD7.edr Output Generic energy: edr ene
-g FullMD7.log Output Log file
-dgdl dgdl.xvg Output, Opt. xvgr/xmgr file
-field field.xvg Output, Opt. xvgr/xmgr file
-table table.xvg Input, Opt. xvgr/xmgr file
-tablep tablep.xvg Input, Opt. xvgr/xmgr file
-rerun rerun.xtc Input, Opt. Generic trajectory: xtc trr trj gro
g96 pdb
-tpi tpi.xvg Output, Opt. xvgr/xmgr file
-ei sam.edi Input, Opt. ED sampling input
-eo sam.edo Output, Opt. ED sampling output
-j wham.gct Input, Opt. General coupling stuff
-jo bam.gct Output, Opt. General coupling stuff
-ffout gct.xvg Output, Opt. xvgr/xmgr file
-devout deviatie.xvg Output, Opt. xvgr/xmgr file
-runav runaver.xvg Output, Opt. xvgr/xmgr file
-pi pull.ppa Input, Opt. Pull parameters
-po pullout.ppa Output, Opt. Pull parameters
-pd pull.pdo Output, Opt. Pull data output
-pn pull.ndx Input, Opt. Index file
-mtx nm.mtx Output, Opt. Hessian matrix
-dn dipole.ndx Output, Opt. Index file
Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-[no]X bool no Use dialog box GUI to edit command line options
-nice int 19 Set the nicelevel
-deffnm string Set the default filename for all file options
-[no]xvgr bool yes Add specific codes (legends etc.) in the output
xvg files for the xmgrace program
-np int 32 Number of nodes, must be the same as used for
grompp
-nt int 1 Number of threads to start on each node
-[no]v bool no Be loud and noisy
-[no]compact bool yes Write a compact log file
-[no]sepdvdl bool no Write separate V and dVdl terms for each
interaction type and node to the log file(s)
-[no]multi bool no Do multiple simulations in parallel (only with
-np > 1)
-replex int 0 Attempt replica exchange every # steps
-reseed int -1 Seed for replica exchange, -1 is generate a seed
-[no]glas bool no Do glass simulation with special long range
corrections
-[no]ionize bool no Do a simulation including the effect of an X-Ray
bombardment on your system
Reading file FullMD7.tpr, VERSION 3.3.1 (single precision)
starting mdrun 'My membrane with peptides in water'
1500000 steps, 3000.0 ps.
p30_10831: p4_error: Timeout in establishing connection to remote
process: 0
rm_l_30_10832: (341.608281) net_send: could not write to fd=5, errno = 32
rm_l_31_10896: (341.269706) net_send: could not write to fd=5, errno = 32
p30_10831: (343.634411) net_send: could not write to fd=5, errno = 32
p31_10895: (343.296105) net_send: could not write to fd=5, errno = 32
p0_13353: p4_error: net_recv read: probable EOF on socket: 1
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_13353: (389.926083) net_send: could not write to fd=4, errno = 32