[gmx-users] MPICH or LAM/MPI
Carsten Kutzner
ckutzne at gwdg.de
Tue Jun 27 10:19:44 CEST 2006
Hi Arneh,
Do you have the same problem on fewer processors? Can you run on 1, 2 and 4
procs?
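
For a quick test you can regenerate the run input for a small number of
processors and run only a few steps (reduce nsteps in the .mdp first). A
minimal sketch (the file names are only placeholders for your own setup;
use the same machinefile as in your job script):

  grompp -np 2 -f run.mdp -c conf.gro -p topol.top -o test2.tpr
  /opt/mpich/intel/bin/mpirun -np 2 -machinefile $TMPDIR/machines \
      ~/gromacs-mpi/bin/mdrun -np 2 -s test2.tpr -g test2.log

If that already hangs as soon as the two processes end up on different
nodes, the problem is most likely in the MPI setup and not in Gromacs.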
Carsten
Arneh Babakhani wrote:
> Hi All,
>
> Ok, I've successfully created the MPI version of mdrun and am now trying to
> run my simulation on 32 processors. After preprocessing with grompp and the
> option -np 32, I launch mdrun with the following script (where CONF is the
> base file name and NPROC is the number of processors):
>
>
> /opt/mpich/intel/bin/mpirun -v -np $NPROC -machinefile \$TMPDIR/machines
> ~/gromacs-mpi/bin/mdrun -np $NPROC -s $CONF -o $CONF -c After$CONF -e
> $CONF -g $CONF >& $CONF.job
>
>
> Everything seems to start up OK, but then GROMACS stalls: it never actually
> starts the simulation, hangs for about 7 minutes, and then aborts completely.
> I've pasted the log file below, which shows that the simulation stalls at
> step 0, but there's no discernible error (only a note that AMD 3DNow support
> is not present, which makes sense because I'm not running on AMD hardware).
>
> Further down I've also pasted the job file, FullMD7.job, which is normally
> empty when everything is running smoothly. There seem to be some errors at
> the end, but they're rather cryptic to me, and I'm not sure whether they are
> a cause or an effect. If anyone has any suggestions, I'd love to hear them.
>
> Thanks,
>
> Arneh
>
>
> *****FullMD7.log******
>
> Log file opened on Mon Jun 26 21:51:55 2006
> Host: compute-0-1.local pid: 13353 nodeid: 0 nnodes: 32
> The Gromacs distribution was built Wed Jun 21 16:01:01 PDT 2006 by
> ababakha at chemcca40.ucsd.edu (Linux 2.6.9-22.ELsmp i686)
>
>
> :-) G R O M A C S (-:
>
> Groningen Machine for Chemical Simulation
>
> :-) VERSION 3.3.1 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2006, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /home/ababakha/gromacs-mpi/bin/mdrun (-:
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> CPU= 0, lastcg= 515, targetcg= 5799, myshift= 14
> CPU= 1, lastcg= 1055, targetcg= 6339, myshift= 15
> CPU= 2, lastcg= 1595, targetcg= 6879, myshift= 16
> CPU= 3, lastcg= 2135, targetcg= 7419, myshift= 17
> CPU= 4, lastcg= 2675, targetcg= 7959, myshift= 18
> CPU= 5, lastcg= 3215, targetcg= 8499, myshift= 19
> CPU= 6, lastcg= 3755, targetcg= 9039, myshift= 20
> CPU= 7, lastcg= 4112, targetcg= 9396, myshift= 20
> CPU= 8, lastcg= 4381, targetcg= 9665, myshift= 20
> CPU= 9, lastcg= 4650, targetcg= 9934, myshift= 20
> CPU= 10, lastcg= 4919, targetcg=10203, myshift= 20
> CPU= 11, lastcg= 5188, targetcg=10472, myshift= 20
> CPU= 12, lastcg= 5457, targetcg= 174, myshift= 20
> CPU= 13, lastcg= 5726, targetcg= 443, myshift= 19
> CPU= 14, lastcg= 5995, targetcg= 712, myshift= 19
> CPU= 15, lastcg= 6264, targetcg= 981, myshift= 18
> CPU= 16, lastcg= 6533, targetcg= 1250, myshift= 18
> CPU= 17, lastcg= 6802, targetcg= 1519, myshift= 17
> CPU= 18, lastcg= 7071, targetcg= 1788, myshift= 17
> CPU= 19, lastcg= 7340, targetcg= 2057, myshift= 16
> CPU= 20, lastcg= 7609, targetcg= 2326, myshift= 16
> CPU= 21, lastcg= 7878, targetcg= 2595, myshift= 15
> CPU= 22, lastcg= 8147, targetcg= 2864, myshift= 15
> CPU= 23, lastcg= 8416, targetcg= 3133, myshift= 14
> CPU= 24, lastcg= 8685, targetcg= 3402, myshift= 14
> CPU= 25, lastcg= 8954, targetcg= 3671, myshift= 13
> CPU= 26, lastcg= 9223, targetcg= 3940, myshift= 13
> CPU= 27, lastcg= 9492, targetcg= 4209, myshift= 13
> CPU= 28, lastcg= 9761, targetcg= 4478, myshift= 13
> CPU= 29, lastcg=10029, targetcg= 4746, myshift= 13
> CPU= 30, lastcg=10298, targetcg= 5015, myshift= 13
> CPU= 31, lastcg=10566, targetcg= 5283, myshift= 13
> nsb->shift = 20, nsb->bshift= 0
> Listing Scalars
> nsb->nodeid: 0
> nsb->nnodes: 32
> nsb->cgtotal: 10567
> nsb->natoms: 25925
> nsb->shift: 20
> nsb->bshift: 0
> Nodeid index homenr cgload workload
> 0 0 788 516 516
> 1 788 828 1056 1056
> 2 1616 828 1596 1596
> 3 2444 828 2136 2136
> 4 3272 828 2676 2676
> 5 4100 828 3216 3216
> 6 4928 828 3756 3756
> 7 5756 807 4113 4113
> 8 6563 807 4382 4382
> 9 7370 807 4651 4651
> 10 8177 807 4920 4920
> 11 8984 807 5189 5189
> 12 9791 807 5458 5458
> 13 10598 807 5727 5727
> 14 11405 807 5996 5996
> 15 12212 807 6265 6265
> 16 13019 807 6534 6534
> 17 13826 807 6803 6803
> 18 14633 807 7072 7072
> 19 15440 807 7341 7341
> 20 16247 807 7610 7610
> 21 17054 807 7879 7879
> 22 17861 807 8148 8148
> 23 18668 807 8417 8417
> 24 19475 807 8686 8686
> 25 20282 807 8955 8955
> 26 21089 807 9224 9224
> 27 21896 807 9493 9493
> 28 22703 807 9762 9762
> 29 23510 804 10030 10030
> 30 24314 807 10299 10299
> 31 25121 804 10567 10567
>
> parameters of the run:
> integrator = md
> nsteps = 1500000
> init_step = 0
> ns_type = Grid
> nstlist = 10
> ndelta = 2
> bDomDecomp = FALSE
> decomp_dir = 0
> nstcomm = 1
> comm_mode = Linear
> nstcheckpoint = 1000
> nstlog = 10
> nstxout = 500
> nstvout = 1000
> nstfout = 0
> nstenergy = 10
> nstxtcout = 0
> init_t = 0
> delta_t = 0.002
> xtcprec = 1000
> nkx = 64
> nky = 64
> nkz = 80
> pme_order = 6
> ewald_rtol = 1e-05
> ewald_geometry = 0
> epsilon_surface = 0
> optimize_fft = TRUE
> ePBC = xyz
> bUncStart = FALSE
> bShakeSOR = FALSE
> etc = Berendsen
> epc = Berendsen
> epctype = Semiisotropic
> tau_p = 1
> ref_p (3x3):
> ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> compress (3x3):
> compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
> compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
> compress[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e-30}
> andersen_seed = 815131
> rlist = 0.9
> coulombtype = PME
> rcoulomb_switch = 0
> rcoulomb = 0.9
> vdwtype = Cut-off
> rvdw_switch = 0
> rvdw = 1.4
> epsilon_r = 1
> epsilon_rf = 1
> tabext = 1
> gb_algorithm = Still
> nstgbradii = 1
> rgbradii = 2
> gb_saltconc = 0
> implicit_solvent = No
> DispCorr = No
> fudgeQQ = 1
> free_energy = no
> init_lambda = 0
> sc_alpha = 0
> sc_power = 0
> sc_sigma = 0.3
> delta_lambda = 0
> disre_weighting = Conservative
> disre_mixed = FALSE
> dr_fc = 1000
> dr_tau = 0
> nstdisreout = 100
> orires_fc = 0
> orires_tau = 0
> nstorireout = 100
> dihre-fc = 1000
> dihre-tau = 0
> nstdihreout = 100
> em_stepsize = 0.01
> em_tol = 10
> niter = 20
> fc_stepsize = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> ConstAlg = Lincs
> shake_tol = 1e-04
> lincs_order = 4
> lincs_warnangle = 30
> lincs_iter = 1
> bd_fric = 0
> ld_seed = 1993
> cos_accel = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> grpopts:
> nrdf: 11903.3 39783.7 285.983
> ref_t: 310 310 310
> tau_t: 0.1 0.1 0.1
> anneal: No No No
> ann_npoints: 0 0 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp_flags[ 0]: 0
> efield-x:
> n = 0
> efield-xt:
> n = 0
> efield-y:
> n = 0
> efield-yt:
> n = 0
> efield-z:
> n = 0
> efield-zt:
> n = 0
> bQMMM = FALSE
> QMconstraints = 0
> QMMMscheme = 0
> scalefactor = 1
> qm_opts:
> ngQM = 0
> Max number of graph edges per atom is 4
> Table routines are used for coulomb: TRUE
> Table routines are used for vdw: FALSE
> Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
> Cut-off's: NS: 0.9 Coulomb: 0.9 LJ: 1.4
> System total charge: 0.000
> Generated table with 1200 data points for Ewald.
> Tabscale = 500 points/nm
> Generated table with 1200 data points for LJ6.
> Tabscale = 500 points/nm
> Generated table with 1200 data points for LJ12.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Enabling SPC water optimization for 6631 molecules.
>
> Will do PME sum in reciprocal space.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
> A smooth particle mesh Ewald method
> J. Chem. Phys. 103 (1995) pp. 8577-8592
> -------- -------- --- Thank You --- -------- --------
>
> Parallelized PME sum used.
> PARALLEL FFT DATA:
> local_nx: 2 local_x_start: 0
> local_ny_after_transpose: 2 local_y_start_after_transpose 0
> Removing pbc first time
> Done rmpbc
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
> 0: rest, initial mass: 207860
> There are: 788 Atoms
>
> Constraining the starting coordinates (step -2)
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
> Molecular dynamics with coupling to an external bath
> J. Chem. Phys. 81 (1984) pp. 3684-3690
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
> LINCS: A Linear Constraint Solver for molecular simulations
> J. Comp. Chem. 18 (1997) pp. 1463-1472
> -------- -------- --- Thank You --- -------- --------
>
>
> Initializing LINear Constraint Solver
> number of constraints is 776
> average number of constraints coupled to one constraint is 2.5
>
> Rel. Constraint Deviation: Max between atoms RMS
> Before LINCS 0.008664 87 88 0.003001
> After LINCS 0.000036 95 96 0.000005
>
>
> Constraining the coordinates at t0-dt (step -1)
> Rel. Constraint Deviation: Max between atoms RMS
> Before LINCS 0.093829 12 13 0.009919
> After LINCS 0.000131 11 14 0.000021
>
> Started mdrun on node 0 Mon Jun 26 21:52:34 2006
> Initial temperature: 310.388 K
> Step Time Lambda
> 0 0.00000 0.00000
>
> Grid: 8 x 8 x 13 cells
> Configuring nonbonded kernels...
> Testing AMD 3DNow support... not present.
> Testing ia32 SSE support... present.
>
>
>
>
>
>
> ********FullMD7.job***************
>
> *running /home/ababakha/gromacs-mpi/bin/mdrun on 32 LINUX ch_p4 processors
> Created /home/ababakha/SMDPeptideSimulation/CapParSMD/FullMD/PI12637
> NNODES=32, MYRANK=0, HOSTNAME=compute-0-1.local
> NNODES=32, MYRANK=1, HOSTNAME=compute-0-1.local
> NNODES=32, MYRANK=30, HOSTNAME=compute-0-29.local
> NNODES=32, MYRANK=24, HOSTNAME=compute-0-12.local
> NNODES=32, MYRANK=28, HOSTNAME=compute-0-30.local
> NNODES=32, MYRANK=3, HOSTNAME=compute-0-26.local
> NNODES=32, MYRANK=14, HOSTNAME=compute-0-22.local
> NNODES=32, MYRANK=6, HOSTNAME=compute-0-31.local
> NNODES=32, MYRANK=8, HOSTNAME=compute-0-20.local
> NNODES=32, MYRANK=7, HOSTNAME=compute-0-31.local
> NNODES=32, MYRANK=18, HOSTNAME=compute-0-27.local
> NNODES=32, MYRANK=2, HOSTNAME=compute-0-26.local
> NNODES=32, MYRANK=23, HOSTNAME=compute-0-4.local
> NNODES=32, MYRANK=31, HOSTNAME=compute-0-29.local
> NNODES=32, MYRANK=5, HOSTNAME=compute-0-21.local
> NNODES=32, MYRANK=27, HOSTNAME=compute-0-3.local
> NNODES=32, MYRANK=4, HOSTNAME=compute-0-21.local
> NNODES=32, MYRANK=20, HOSTNAME=compute-0-8.local
> NNODES=32, MYRANK=11, HOSTNAME=compute-0-7.local
> NNODES=32, MYRANK=9, HOSTNAME=compute-0-20.local
> NNODES=32, MYRANK=12, HOSTNAME=compute-0-19.local
> NNODES=32, MYRANK=13, HOSTNAME=compute-0-19.local
> NNODES=32, MYRANK=21, HOSTNAME=compute-0-8.local
> NNODES=32, MYRANK=22, HOSTNAME=compute-0-4.local
> NNODES=32, MYRANK=10, HOSTNAME=compute-0-7.local
> NNODES=32, MYRANK=17, HOSTNAME=compute-0-25.local
> NNODES=32, MYRANK=25, HOSTNAME=compute-0-12.local
> NNODES=32, MYRANK=15, HOSTNAME=compute-0-22.local
> NNODES=32, MYRANK=29, HOSTNAME=compute-0-30.local
> NNODES=32, MYRANK=19, HOSTNAME=compute-0-27.local
> NNODES=32, MYRANK=26, HOSTNAME=compute-0-3.local
> NNODES=32, MYRANK=16, HOSTNAME=compute-0-25.local
> NODEID=26 argc=13
> NODEID=25 argc=13
> NODEID=24 argc=13
> NODEID=23 argc=13
> NODEID=22 argc=13
> NODEID=21 argc=13
> NODEID=20 argc=13
> NODEID=19 argc=13
> NODEID=18 argc=13
> NODEID=13 argc=13
> NODEID=17 argc=13
> NODEID=15 argc=13
> NODEID=14 argc=13
> NODEID=16 argc=13
> NODEID=0 argc=13
> NODEID=12 argc=13
> NODEID=6 argc=13
> NODEID=11 argc=13
> NODEID=1 argc=13
> NODEID=10 argc=13
> NODEID=5 argc=13
> NODEID=30 argc=13
> NODEID=7 argc=13
> NODEID=27 argc=13
> NODEID=31 argc=13
> NODEID=2 argc=13
> NODEID=9 argc=13
> NODEID=28 argc=13
> NODEID=4 argc=13
> NODEID=29 argc=13
> NODEID=8 argc=13
> NODEID=3 argc=13
> :-) G R O M A C S (-:
>
> Groningen Machine for Chemical Simulation
>
> :-) VERSION 3.3.1 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2006, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /home/ababakha/gromacs-mpi/bin/mdrun (-:
>
> Option Filename Type Description
> ------------------------------------------------------------
> -s FullMD7.tpr Input Generic run input: tpr tpb tpa xml
> -o FullMD7.trr Output Full precision trajectory: trr trj
> -x traj.xtc Output, Opt. Compressed trajectory (portable xdr
> format)
> -c AfterFullMD7.gro Output Generic structure: gro g96 pdb xml
> -e FullMD7.edr Output Generic energy: edr ene
> -g FullMD7.log Output Log file
> -dgdl dgdl.xvg Output, Opt. xvgr/xmgr file
> -field field.xvg Output, Opt. xvgr/xmgr file
> -table table.xvg Input, Opt. xvgr/xmgr file
> -tablep tablep.xvg Input, Opt. xvgr/xmgr file
> -rerun rerun.xtc Input, Opt. Generic trajectory: xtc trr trj gro
> g96 pdb
> -tpi tpi.xvg Output, Opt. xvgr/xmgr file
> -ei sam.edi Input, Opt. ED sampling input
> -eo sam.edo Output, Opt. ED sampling output
> -j wham.gct Input, Opt. General coupling stuff
> -jo bam.gct Output, Opt. General coupling stuff
> -ffout gct.xvg Output, Opt. xvgr/xmgr file
> -devout deviatie.xvg Output, Opt. xvgr/xmgr file
> -runav runaver.xvg Output, Opt. xvgr/xmgr file
> -pi pull.ppa Input, Opt. Pull parameters
> -po pullout.ppa Output, Opt. Pull parameters
> -pd pull.pdo Output, Opt. Pull data output
> -pn pull.ndx Input, Opt. Index file
> -mtx nm.mtx Output, Opt. Hessian matrix
> -dn dipole.ndx Output, Opt. Index file
>
> Option Type Value Description
> ------------------------------------------------------
> -[no]h bool no Print help info and quit
> -[no]X bool no Use dialog box GUI to edit command line options
> -nice int 19 Set the nicelevel
> -deffnm string Set the default filename for all file options
> -[no]xvgr bool yes Add specific codes (legends etc.) in the output
> xvg files for the xmgrace program
> -np int 32 Number of nodes, must be the same as used for
> grompp
> -nt int 1 Number of threads to start on each node
> -[no]v bool no Be loud and noisy
> -[no]compact bool yes Write a compact log file
> -[no]sepdvdl bool no Write separate V and dVdl terms for each
> interaction type and node to the log file(s)
> -[no]multi bool no Do multiple simulations in parallel (only with
> -np > 1)
> -replex int 0 Attempt replica exchange every # steps
> -reseed int -1 Seed for replica exchange, -1 is generate a seed
> -[no]glas bool no Do glass simulation with special long range
> corrections
> -[no]ionize bool no Do a simulation including the effect of an X-Ray
> bombardment on your system
>
> Reading file FullMD7.tpr, VERSION 3.3.1 (single precision)
> starting mdrun 'My membrane with peptides in water'
> 1500000 steps, 3000.0 ps.
>
> p30_10831: p4_error: Timeout in establishing connection to remote
> process: 0
> rm_l_30_10832: (341.608281) net_send: could not write to fd=5, errno = 32
> rm_l_31_10896: (341.269706) net_send: could not write to fd=5, errno = 32
> p30_10831: (343.634411) net_send: could not write to fd=5, errno = 32
> p31_10895: (343.296105) net_send: could not write to fd=5, errno = 32
> p0_13353: p4_error: net_recv read: probable EOF on socket: 1
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> p0_13353: (389.926083) net_send: could not write to fd=4, errno = 32
>
>
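The p4_error timeout right after the .tpr has been read usually means that
some of the remote processes never managed to open their connection to the
others; in your job it is the two processes on compute-0-29 (ranks 30 and
31) that time out, and node 0 then gives up. That points to the MPICH ch_p4
startup (rsh/ssh access, firewall or hostname resolution between the compute
nodes) rather than to Gromacs. A quick check you could run from the first
node of the job, assuming your MPICH starts the remote processes with rsh
(substitute ssh if it was built that way):

  rsh compute-0-29 hostname
  rsh compute-0-29 date

This has to come back immediately and without a password prompt for every
host in the machinefile.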
--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/research/dep/grubmueller/
http://www.gwdg.de/~ckutzne