[gmx-users] gromacs-4.0.5 parallel run in 8 cpu: slow speed
jimkress_58
jimkress_58 at kressworks.org
Thu Jun 11 16:29:30 CEST 2009
Mark is correct. You should see node information at the top of the md log
file if you are truly running in parallel.
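As a rough quick check (the exact wording in the 4.0.x log header may differ
on your build, but "nnodes" and "Domain decomposition" are what I would look
for), something like this should tell you how many nodes mdrun actually saw:
grep -i -e nnodes -e "Domain decomposition" md.log
If that reports only one node and no domain decomposition grid, you are
probably running independent serial copies rather than one parallel job.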
Apparently the default host (or machines) file, which contains the list of
available nodes on your cluster, has not been (or is not being) populated
correctly.
You can build your own hostfile and then rerun the job with a command line
like:
mpirun -np 8 -hostfile hostfile ~/software/bin/mdrun_mpi -deffnm md
The content and structure of the hostfile depend on which MPI implementation
you are using. Hopefully you are not using MPICH 1, but instead OpenMPI,
MPICH2, or Intel MPI.
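For example, with OpenMPI a minimal hostfile is just one machine name per
line with a slot count (these hostnames are made up; use your own):
node01 slots=4
node02 slots=4
and then:
mpirun -np 8 -hostfile hostfile ~/software/bin/mdrun_mpi -deffnm md
MPICH2 uses a similar machinefile but with "host:ncpus" style entries, so
check the documentation for your MPI.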
Jim
-----Original Message-----
From: gmx-users-bounces at gromacs.org [mailto:gmx-users-bounces at gromacs.org]
On Behalf Of Thamu
Sent: Thursday, June 11, 2009 9:13 AM
To: gmx-users at gromacs.org
Subject: [gmx-users] gromacs-4.0.5 parallel run in 8 cpu: slow speed
Hi Mark,
The top of the md.log is below. The mdrun command was "mpirun -np 8
~/software/bin/mdrun_mpi -deffnm md".
:-) G R O M A C S (-:
GROup of MAchos and Cynical Suckers
:-) VERSION 4.0.5 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) /home/thamu/software/bin/mdrun_mpi (-:
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
nsteps = 10000000
init_step = 0
ns_type = Grid
nstlist = 10
ndelta = 2
nstcomm = 1
comm_mode = Linear
nstlog = 100
nstxout = 1000
nstvout = 0
nstfout = 0
nstenergy = 100
nstxtcout = 0
init_t = 0
delta_t = 0.002
xtcprec = 1000
nkx = 70
nky = 70
nkz = 70
pme_order = 4
ewald_rtol = 1e-05
ewald_geometry = 0
epsilon_surface = 0
optimize_fft = TRUE
ePBC = xyz
bPeriodicMols = FALSE
bContinuation = FALSE
bShakeSOR = FALSE
etc = V-rescale
epc = Parrinello-Rahman
epctype = Isotropic
tau_p = 0.5
ref_p (3x3):
ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
compress (3x3):
compress[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compress[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compress[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
refcoord_scaling = No
posres_com (3):
posres_com[0]= 0.00000e+00
posres_com[1]= 0.00000e+00
posres_com[2]= 0.00000e+00
posres_comB (3):
posres_comB[0]= 0.00000e+00
posres_comB[1]= 0.00000e+00
posres_comB[2]= 0.00000e+00
andersen_seed = 815131
rlist = 1
rtpi = 0.05
coulombtype = PME
rcoulomb_switch = 0
rcoulomb = 1
vdwtype = Cut-off
rvdw_switch = 0
rvdw = 1.4
epsilon_r = 1
epsilon_rf = 1
tabext = 1
implicit_solvent = No
gb_algorithm = Still
gb_epsilon_solvent = 80
nstgbradii = 1
rgbradii = 2
gb_saltconc = 0
gb_obc_alpha = 1
gb_obc_beta = 0.8
gb_obc_gamma = 4.85
sa_surface_tension = 2.092
DispCorr = No
free_energy = no
init_lambda = 0
sc_alpha = 0
sc_power = 0
sc_sigma = 0.3
delta_lambda = 0
nwall = 0
wall_type = 9-3
wall_atomtype[0] = -1
wall_atomtype[1] = -1
wall_density[0] = 0
wall_density[1] = 0
wall_ewald_zfac = 3
pull = no
disre = No
disre_weighting = Conservative
disre_mixed = FALSE
dr_fc = 1000
dr_tau = 0
nstdisreout = 100
orires_fc = 0
orires_tau = 0
nstorireout = 100
dihre-fc = 1000
em_stepsize = 0.01
em_tol = 10
niter = 20
fc_stepsize = 0
nstcgsteep = 1000
nbfgscorr = 10
ConstAlg = Lincs
shake_tol = 0.0001
lincs_order = 4
lincs_warnangle = 30
lincs_iter = 1
bd_fric = 0
ld_seed = 1993
cos_accel = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 6706.82 106800
ref_t: 300 300
tau_t: 0.1 0.1
anneal: No No
ann_npoints: 0 0
acc: 0 0 0
nfreeze: N N N
energygrp_flags[ 0]: 0 0 0
energygrp_flags[ 1]: 0 0 0
energygrp_flags[ 2]: 0 0 0
efield-x:
n = 0
efield-xt:
n = 0
efield-y:
n = 0
efield-yt:
n = 0
efield-z:
n = 0
efield-zt:
n = 0
bQMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
scalefactor = 1
qm_opts:
ngQM = 0
Table routines are used for coulomb: TRUE
Table routines are used for vdw: FALSE
Will do PME sum in reciprocal space.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's: NS: 1 Coulomb: 1 LJ: 1.4
System total charge: -0.000
Generated table with 1200 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Enabling TIP4p water optimization for 17798 molecules.
Configuring nonbonded kernels...
Testing x86_64 SSE support... present.
Removing pbc first time
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
The number of constraints is 3439
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
There are: 56781 Atoms
There are: 17798 VSites
Max number of connections per atom is 59
Total number of connections is 216528
Max number of graph edges per atom is 4
Total number of graph edges is 113666
Constraining the starting coordinates (step 0)
Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 3.77e-05
Initial temperature: 299.838 K
> Recently I successfully installed the GROMACS 4.0.5 MPI version.
> I can run on 8 CPUs, but the speed is very slow.
> The total number of atoms in the system is 78424.
> While running, all 8 CPUs show 95-100% usage.
>
> How can I speed up the calculation?
>
> Thanks
>
>
That's normal for a system with that atoms-per-CPU ratio.
What's your system and what mdp file are you using?
--
------------------------------------------------------
You haven't given us any diagnostic information. The problem could be that
you're not running an MPI GROMACS (show us your configure line, your mdrun
command line, and the top 50 lines of your .log file).
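A quick, rough way to check whether the binary is MPI-enabled at all
(assuming a dynamically linked build; the library name depends on your MPI
implementation):
ldd ~/software/bin/mdrun_mpi | grep -i mpi
An MPI build should list something like libmpi; if nothing shows up, mpirun
is probably just launching 8 serial copies.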
Mark