[gmx-users] MPICH or LAM/MPI

Carsten Kutzner ckutzne at gwdg.de
Tue Jun 27 10:19:44 CEST 2006


Hi Arneh,

do you have the same problem on fewer processors? Can you run on 1, 2, and
4 procs?
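
For example, with a quick loop like the following (an untested sketch: the
input file names grompp.mdp, conf.gro, and topol.top are placeholders for
your own files, and it assumes $TMPDIR/machines is a valid machinefile in
the shell you run this from):

  for NPROC in 1 2 4; do
      # regenerate the run input for this processor count (in 3.3 the
      # -np given to grompp must match the -np given to mdrun)
      grompp -np $NPROC -f grompp.mdp -c conf.gro -p topol.top -o test$NPROC.tpr
      # short test run; check test$NPROC.log to see whether it gets past step 0
      /opt/mpich/intel/bin/mpirun -np $NPROC -machinefile $TMPDIR/machines \
          ~/gromacs-mpi/bin/mdrun -np $NPROC -s test$NPROC.tpr -g test$NPROC.log
  done

If it already hangs on 2 processors, the problem is more likely in the
MPICH setup than in GROMACS itself.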

Carsten


Arneh Babakhani wrote:
> Hi All,
> 
> OK, I've successfully created the MPI version of mdrun and am now trying
> to run my simulation on 32 processors. After preprocessing with grompp
> and the option -np 32, I run mdrun with the following script (where CONF
> is the input file name and NPROC is the number of processors):
> 
> 
> /opt/mpich/intel/bin/mpirun -v -np $NPROC -machinefile \$TMPDIR/machines \
>     ~/gromacs-mpi/bin/mdrun -np $NPROC -s $CONF -o $CONF -c After$CONF \
>     -e $CONF -g $CONF >& $CONF.job
> 
> 
> Everything seems to start up OK, but then GROMACS stalls: it never
> actually starts the simulation, hangs for about 7 minutes, and then
> aborts completely. I've pasted the log file below, which shows that the
> simulation stalls at Step 0 with no discernible error (the only notable
> message is that AMD 3DNow support is not available, which makes sense
> because I'm not running on AMD hardware).
> 
> If you scroll further down, I've also pasted the job file, FullMD7.job,
> which is normally empty when everything runs smoothly. There seem to be
> some errors at the end, but they're rather cryptic to me, and I'm not
> sure whether they're a cause or an effect. If anyone has any suggestions,
> I'd love to hear them.
> 
> Thanks,
> 
> Arneh
> 
> 
> *****FullMD7.log******
> 
> Log file opened on Mon Jun 26 21:51:55 2006
> Host: compute-0-1.local  pid: 13353  nodeid: 0  nnodes:  32
> The Gromacs distribution was built Wed Jun 21 16:01:01 PDT 2006 by
> ababakha at chemcca40.ucsd.edu (Linux 2.6.9-22.ELsmp i686)
> 
> 
>                         :-)  G  R  O  M  A  C  S  (-:
> 
>                   Groningen Machine for Chemical Simulation
> 
>                            :-)  VERSION 3.3.1  (-:
> 
> 
>      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
>       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>             Copyright (c) 2001-2006, The GROMACS development team,
>            check out http://www.gromacs.org for more information.
> 
>         This program is free software; you can redistribute it and/or
>          modify it under the terms of the GNU General Public License
>         as published by the Free Software Foundation; either version 2
>             of the License, or (at your option) any later version.
> 
>                 :-)  /home/ababakha/gromacs-mpi/bin/mdrun  (-:
> 
> 
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
> 
> 
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
> 
> CPU=  0, lastcg=  515, targetcg= 5799, myshift=   14
> CPU=  1, lastcg= 1055, targetcg= 6339, myshift=   15
> CPU=  2, lastcg= 1595, targetcg= 6879, myshift=   16
> CPU=  3, lastcg= 2135, targetcg= 7419, myshift=   17
> CPU=  4, lastcg= 2675, targetcg= 7959, myshift=   18
> CPU=  5, lastcg= 3215, targetcg= 8499, myshift=   19
> CPU=  6, lastcg= 3755, targetcg= 9039, myshift=   20
> CPU=  7, lastcg= 4112, targetcg= 9396, myshift=   20
> CPU=  8, lastcg= 4381, targetcg= 9665, myshift=   20
> CPU=  9, lastcg= 4650, targetcg= 9934, myshift=   20
> CPU= 10, lastcg= 4919, targetcg=10203, myshift=   20
> CPU= 11, lastcg= 5188, targetcg=10472, myshift=   20
> CPU= 12, lastcg= 5457, targetcg=  174, myshift=   20
> CPU= 13, lastcg= 5726, targetcg=  443, myshift=   19
> CPU= 14, lastcg= 5995, targetcg=  712, myshift=   19
> CPU= 15, lastcg= 6264, targetcg=  981, myshift=   18
> CPU= 16, lastcg= 6533, targetcg= 1250, myshift=   18
> CPU= 17, lastcg= 6802, targetcg= 1519, myshift=   17
> CPU= 18, lastcg= 7071, targetcg= 1788, myshift=   17
> CPU= 19, lastcg= 7340, targetcg= 2057, myshift=   16
> CPU= 20, lastcg= 7609, targetcg= 2326, myshift=   16
> CPU= 21, lastcg= 7878, targetcg= 2595, myshift=   15
> CPU= 22, lastcg= 8147, targetcg= 2864, myshift=   15
> CPU= 23, lastcg= 8416, targetcg= 3133, myshift=   14
> CPU= 24, lastcg= 8685, targetcg= 3402, myshift=   14
> CPU= 25, lastcg= 8954, targetcg= 3671, myshift=   13
> CPU= 26, lastcg= 9223, targetcg= 3940, myshift=   13
> CPU= 27, lastcg= 9492, targetcg= 4209, myshift=   13
> CPU= 28, lastcg= 9761, targetcg= 4478, myshift=   13
> CPU= 29, lastcg=10029, targetcg= 4746, myshift=   13
> CPU= 30, lastcg=10298, targetcg= 5015, myshift=   13
> CPU= 31, lastcg=10566, targetcg= 5283, myshift=   13
> nsb->shift =  20, nsb->bshift=  0
> Listing Scalars
> nsb->nodeid:         0
> nsb->nnodes:     32
> nsb->cgtotal: 10567
> nsb->natoms:  25925
> nsb->shift:      20
> nsb->bshift:      0
> Nodeid   index  homenr  cgload  workload
>     0       0     788     516       516
>     1     788     828    1056      1056
>     2    1616     828    1596      1596
>     3    2444     828    2136      2136
>     4    3272     828    2676      2676
>     5    4100     828    3216      3216
>     6    4928     828    3756      3756
>     7    5756     807    4113      4113
>     8    6563     807    4382      4382
>     9    7370     807    4651      4651
>    10    8177     807    4920      4920
>    11    8984     807    5189      5189
>    12    9791     807    5458      5458
>    13   10598     807    5727      5727
>    14   11405     807    5996      5996
>    15   12212     807    6265      6265
>    16   13019     807    6534      6534
>    17   13826     807    6803      6803
>    18   14633     807    7072      7072
>    19   15440     807    7341      7341
>    20   16247     807    7610      7610
>    21   17054     807    7879      7879
>    22   17861     807    8148      8148
>    23   18668     807    8417      8417
>    24   19475     807    8686      8686
>    25   20282     807    8955      8955
>    26   21089     807    9224      9224
>    27   21896     807    9493      9493
>    28   22703     807    9762      9762
>    29   23510     804   10030     10030
>    30   24314     807   10299     10299
>    31   25121     804   10567     10567
> 
> parameters of the run:
>   integrator           = md
>   nsteps               = 1500000
>   init_step            = 0
>   ns_type              = Grid
>   nstlist              = 10
>   ndelta               = 2
>   bDomDecomp           = FALSE
>   decomp_dir           = 0
>   nstcomm              = 1
>   comm_mode            = Linear
>   nstcheckpoint        = 1000
>   nstlog               = 10
>   nstxout              = 500
>   nstvout              = 1000
>   nstfout              = 0
>   nstenergy            = 10
>   nstxtcout            = 0
>   init_t               = 0
>   delta_t              = 0.002
>   xtcprec              = 1000
>   nkx                  = 64
>   nky                  = 64
>   nkz                  = 80
>   pme_order            = 6
>   ewald_rtol           = 1e-05
>   ewald_geometry       = 0
>   epsilon_surface      = 0
>   optimize_fft         = TRUE
>   ePBC                 = xyz
>   bUncStart            = FALSE
>   bShakeSOR            = FALSE
>   etc                  = Berendsen
>   epc                  = Berendsen
>   epctype              = Semiisotropic
>   tau_p                = 1
>   ref_p (3x3):
>      ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
>      ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
>      ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
>   compress (3x3):
>      compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
>      compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
>      compress[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e-30}
>   andersen_seed        = 815131
>   rlist                = 0.9
>   coulombtype          = PME
>   rcoulomb_switch      = 0
>   rcoulomb             = 0.9
>   vdwtype              = Cut-off
>   rvdw_switch          = 0
>   rvdw                 = 1.4
>   epsilon_r            = 1
>   epsilon_rf           = 1
>   tabext               = 1
>   gb_algorithm         = Still
>   nstgbradii           = 1
>   rgbradii             = 2
>   gb_saltconc          = 0
>   implicit_solvent     = No
>   DispCorr             = No
>   fudgeQQ              = 1
>   free_energy          = no
>   init_lambda          = 0
>   sc_alpha             = 0
>   sc_power             = 0
>   sc_sigma             = 0.3
>   delta_lambda         = 0
>   disre_weighting      = Conservative
>   disre_mixed          = FALSE
>   dr_fc                = 1000
>   dr_tau               = 0
>   nstdisreout          = 100
>   orires_fc            = 0
>   orires_tau           = 0
>   nstorireout          = 100
>   dihre-fc             = 1000
>   dihre-tau            = 0
>   nstdihreout          = 100
>   em_stepsize          = 0.01
>   em_tol               = 10
>   niter                = 20
>   fc_stepsize          = 0
>   nstcgsteep           = 1000
>   nbfgscorr            = 10
>   ConstAlg             = Lincs
>   shake_tol            = 1e-04
>   lincs_order          = 4
>   lincs_warnangle      = 30
>   lincs_iter           = 1
>   bd_fric              = 0
>   ld_seed              = 1993
>   cos_accel            = 0
>   deform (3x3):
>      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>   userint1             = 0
>   userint2             = 0
>   userint3             = 0
>   userint4             = 0
>   userreal1            = 0
>   userreal2            = 0
>   userreal3            = 0
>   userreal4            = 0
> grpopts:
>   nrdf:         11903.3     39783.7     285.983
>   ref_t:             310         310         310
>   tau_t:             0.1         0.1         0.1
> anneal:                  No          No          No
> ann_npoints:               0           0           0
>   acc:               0           0           0
>   nfreeze:           N           N           N
>   energygrp_flags[  0]: 0
>   efield-x:
>      n = 0
>   efield-xt:
>      n = 0
>   efield-y:
>      n = 0
>   efield-yt:
>      n = 0
>   efield-z:
>      n = 0
>   efield-zt:
>      n = 0
>   bQMMM                = FALSE
>   QMconstraints        = 0
>   QMMMscheme           = 0
>   scalefactor          = 1
> qm_opts:
>   ngQM                 = 0
> Max number of graph edges per atom is 4
> Table routines are used for coulomb: TRUE
> Table routines are used for vdw:     FALSE
> Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
> Cut-off's:   NS: 0.9   Coulomb: 0.9   LJ: 1.4
> System total charge: 0.000
> Generated table with 1200 data points for Ewald.
> Tabscale = 500 points/nm
> Generated table with 1200 data points for LJ6.
> Tabscale = 500 points/nm
> Generated table with 1200 data points for LJ12.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 500 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
> 
> Enabling SPC water optimization for 6631 molecules.
> 
> Will do PME sum in reciprocal space.
> 
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
> A smooth particle mesh Ewald method
> J. Chem. Phys. 103 (1995) pp. 8577-8592
> -------- -------- --- Thank You --- -------- --------
> 
> Parallelized PME sum used.
> PARALLEL FFT DATA:
>   local_nx:                   2  local_x_start:                   0
>   local_ny_after_transpose:   2  local_y_start_after_transpose    0
> Removing pbc first time
> Done rmpbc
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
>  0:  rest, initial mass: 207860
> There are: 788 Atoms
> 
> Constraining the starting coordinates (step -2)
> 
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
> Molecular dynamics with coupling to an external bath
> J. Chem. Phys. 81 (1984) pp. 3684-3690
> -------- -------- --- Thank You --- -------- --------
> 
> 
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
> LINCS: A Linear Constraint Solver for molecular simulations
> J. Comp. Chem. 18 (1997) pp. 1463-1472
> -------- -------- --- Thank You --- -------- --------
> 
> 
> Initializing LINear Constraint Solver
>  number of constraints is 776
>  average number of constraints coupled to one constraint is 2.5
> 
>   Rel. Constraint Deviation:  Max    between atoms     RMS
>       Before LINCS         0.008664     87     88   0.003001
>        After LINCS         0.000036     95     96   0.000005
> 
> 
> Constraining the coordinates at t0-dt (step -1)
>   Rel. Constraint Deviation:  Max    between atoms     RMS
>       Before LINCS         0.093829     12     13   0.009919
>        After LINCS         0.000131     11     14   0.000021
> 
> Started mdrun on node 0 Mon Jun 26 21:52:34 2006
> Initial temperature: 310.388 K
>           Step           Time         Lambda
>              0        0.00000        0.00000
> 
> Grid: 8 x 8 x 13 cells
> Configuring nonbonded kernels...
> Testing AMD 3DNow support... not present.
> Testing ia32 SSE support... present.
> 
> 
> 
> 
> 
> 
> ********FullMD7.job***************
> 
> *running /home/ababakha/gromacs-mpi/bin/mdrun on 32 LINUX ch_p4 processors
> Created /home/ababakha/SMDPeptideSimulation/CapParSMD/FullMD/PI12637
> NNODES=32, MYRANK=0, HOSTNAME=compute-0-1.local
> NNODES=32, MYRANK=1, HOSTNAME=compute-0-1.local
> NNODES=32, MYRANK=30, HOSTNAME=compute-0-29.local
> NNODES=32, MYRANK=24, HOSTNAME=compute-0-12.local
> NNODES=32, MYRANK=28, HOSTNAME=compute-0-30.local
> NNODES=32, MYRANK=3, HOSTNAME=compute-0-26.local
> NNODES=32, MYRANK=14, HOSTNAME=compute-0-22.local
> NNODES=32, MYRANK=6, HOSTNAME=compute-0-31.local
> NNODES=32, MYRANK=8, HOSTNAME=compute-0-20.local
> NNODES=32, MYRANK=7, HOSTNAME=compute-0-31.local
> NNODES=32, MYRANK=18, HOSTNAME=compute-0-27.local
> NNODES=32, MYRANK=2, HOSTNAME=compute-0-26.local
> NNODES=32, MYRANK=23, HOSTNAME=compute-0-4.local
> NNODES=32, MYRANK=31, HOSTNAME=compute-0-29.local
> NNODES=32, MYRANK=5, HOSTNAME=compute-0-21.local
> NNODES=32, MYRANK=27, HOSTNAME=compute-0-3.local
> NNODES=32, MYRANK=4, HOSTNAME=compute-0-21.local
> NNODES=32, MYRANK=20, HOSTNAME=compute-0-8.local
> NNODES=32, MYRANK=11, HOSTNAME=compute-0-7.local
> NNODES=32, MYRANK=9, HOSTNAME=compute-0-20.local
> NNODES=32, MYRANK=12, HOSTNAME=compute-0-19.local
> NNODES=32, MYRANK=13, HOSTNAME=compute-0-19.local
> NNODES=32, MYRANK=21, HOSTNAME=compute-0-8.local
> NNODES=32, MYRANK=22, HOSTNAME=compute-0-4.local
> NNODES=32, MYRANK=10, HOSTNAME=compute-0-7.local
> NNODES=32, MYRANK=17, HOSTNAME=compute-0-25.local
> NNODES=32, MYRANK=25, HOSTNAME=compute-0-12.local
> NNODES=32, MYRANK=15, HOSTNAME=compute-0-22.local
> NNODES=32, MYRANK=29, HOSTNAME=compute-0-30.local
> NNODES=32, MYRANK=19, HOSTNAME=compute-0-27.local
> NNODES=32, MYRANK=26, HOSTNAME=compute-0-3.local
> NNODES=32, MYRANK=16, HOSTNAME=compute-0-25.local
> NODEID=26 argc=13
> NODEID=25 argc=13
> NODEID=24 argc=13
> NODEID=23 argc=13
> NODEID=22 argc=13
> NODEID=21 argc=13
> NODEID=20 argc=13
> NODEID=19 argc=13
> NODEID=18 argc=13
> NODEID=13 argc=13
> NODEID=17 argc=13
> NODEID=15 argc=13
> NODEID=14 argc=13
> NODEID=16 argc=13
> NODEID=0 argc=13
> NODEID=12 argc=13
> NODEID=6 argc=13
> NODEID=11 argc=13
> NODEID=1 argc=13
> NODEID=10 argc=13
> NODEID=5 argc=13
> NODEID=30 argc=13
> NODEID=7 argc=13
> NODEID=27 argc=13
> NODEID=31 argc=13
> NODEID=2 argc=13
> NODEID=9 argc=13
> NODEID=28 argc=13
> NODEID=4 argc=13
> NODEID=29 argc=13
> NODEID=8 argc=13
> NODEID=3 argc=13
>                         :-)  G  R  O  M  A  C  S  (-:
> 
>                   Groningen Machine for Chemical Simulation
> 
>                            :-)  VERSION 3.3.1  (-:
> 
> 
>      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
>       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
>             Copyright (c) 2001-2006, The GROMACS development team,
>            check out http://www.gromacs.org for more information.
> 
>         This program is free software; you can redistribute it and/or
>          modify it under the terms of the GNU General Public License
>         as published by the Free Software Foundation; either version 2
>             of the License, or (at your option) any later version.
> 
>                 :-)  /home/ababakha/gromacs-mpi/bin/mdrun  (-:
> 
> Option     Filename  Type         Description
> ------------------------------------------------------------
>  -s    FullMD7.tpr  Input        Generic run input: tpr tpb tpa xml
>  -o    FullMD7.trr  Output       Full precision trajectory: trr trj
>  -x       traj.xtc  Output, Opt. Compressed trajectory (portable xdr
> format)
>  -c AfterFullMD7.gro  Output       Generic structure: gro g96 pdb xml
>  -e    FullMD7.edr  Output       Generic energy: edr ene
>  -g    FullMD7.log  Output       Log file
> -dgdl      dgdl.xvg  Output, Opt. xvgr/xmgr file
> -field    field.xvg  Output, Opt. xvgr/xmgr file
> -table    table.xvg  Input, Opt.  xvgr/xmgr file
> -tablep  tablep.xvg  Input, Opt.  xvgr/xmgr file
> -rerun    rerun.xtc  Input, Opt.  Generic trajectory: xtc trr trj gro
> g96 pdb
> -tpi        tpi.xvg  Output, Opt. xvgr/xmgr file
> -ei        sam.edi  Input, Opt.  ED sampling input
> -eo        sam.edo  Output, Opt. ED sampling output
>  -j       wham.gct  Input, Opt.  General coupling stuff
> -jo        bam.gct  Output, Opt. General coupling stuff
> -ffout      gct.xvg  Output, Opt. xvgr/xmgr file
> -devout   deviatie.xvg  Output, Opt. xvgr/xmgr file
> -runav  runaver.xvg  Output, Opt. xvgr/xmgr file
> -pi       pull.ppa  Input, Opt.  Pull parameters
> -po    pullout.ppa  Output, Opt. Pull parameters
> -pd       pull.pdo  Output, Opt. Pull data output
> -pn       pull.ndx  Input, Opt.  Index file
> -mtx         nm.mtx  Output, Opt. Hessian matrix
> -dn     dipole.ndx  Output, Opt. Index file
> 
>      Option   Type  Value  Description
> ------------------------------------------------------
>      -[no]h   bool     no  Print help info and quit
>      -[no]X   bool     no  Use dialog box GUI to edit command line options
>       -nice    int     19  Set the nicelevel
>     -deffnm string         Set the default filename for all file options
>   -[no]xvgr   bool    yes  Add specific codes (legends etc.) in the output
>                            xvg files for the xmgrace program
>         -np    int     32  Number of nodes, must be the same as used for
>                            grompp
>         -nt    int      1  Number of threads to start on each node
>      -[no]v   bool     no  Be loud and noisy
> -[no]compact   bool    yes  Write a compact log file
> -[no]sepdvdl   bool     no  Write separate V and dVdl terms for each
>                            interaction type and node to the log file(s)
>  -[no]multi   bool     no  Do multiple simulations in parallel (only with
>                            -np > 1)
>     -replex    int      0  Attempt replica exchange every # steps
>     -reseed    int     -1  Seed for replica exchange, -1 is generate a seed
>   -[no]glas   bool     no  Do glass simulation with special long range
>                            corrections
> -[no]ionize   bool     no  Do a simulation including the effect of an X-Ray
>                            bombardment on your system
> 
> Reading file FullMD7.tpr, VERSION 3.3.1 (single precision)
> starting mdrun 'My membrane with peptides in water'
> 1500000 steps,   3000.0 ps.
> 
> p30_10831:  p4_error: Timeout in establishing connection to remote
> process: 0
> rm_l_30_10832: (341.608281) net_send: could not write to fd=5, errno = 32
> rm_l_31_10896: (341.269706) net_send: could not write to fd=5, errno = 32
> p30_10831: (343.634411) net_send: could not write to fd=5, errno = 32
> p31_10895: (343.296105) net_send: could not write to fd=5, errno = 32
> p0_13353:  p4_error: net_recv read:  probable EOF on socket: 1
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> p0_13353: (389.926083) net_send: could not write to fd=4, errno = 32
> 
> 

-- 
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/research/dep/grubmueller/
http://www.gwdg.de/~ckutzne



