[gmx-users] How to tune number of CPUs for a run?
Carsten Kutzner
ckutzne at gwdg.de
Wed Nov 4 17:11:05 CET 2009
Hi Pablo,
The tool g_tune_pme helps to find the optimum settings for a given number of
processors. If you do not want to use the newest git version of GROMACS,
there is also a version for GROMACS 4.0.5 available here:
http://www.mpibpc.mpg.de/home/grubmueller/projects/MethodAdvancements/Gromacs/
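A minimal sketch of how a call could look (option names and the
environment-variable mechanism as I recall them from the 4.0.x tool, so
please check g_tune_pme -h; topol.tpr is just a placeholder for your run
input file):

    # tell g_tune_pme which MPI launcher and which mdrun binary to benchmark
    # (assumed setup -- see g_tune_pme -h for the exact mechanism)
    export MPIRUN=`which mpirun`
    export MDRUN=`which mdrun_openmpi`
    # try different PME/PP node splits on 16 processors and report the fastest
    g_tune_pme -np 16 -s topol.tpr

The tool repeatedly launches short mdrun benchmarks with different numbers of
dedicated PME nodes and writes a summary of the measured performance.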
If grompp reports a PME/PP load ratio lower than 0.25, that is actually
helpful for scaling on large numbers of cores. On the other hand, having much
more than a third of all processors doing PME will very likely hurt badly if
you want to scale to a large number of processors.
Typical PME setups, with cutoffs around 1 nm and a Fourier grid spacing of
about 0.135 nm, will nevertheless result in PME/PP ratios of 0.25-0.33.
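If you want to set the split explicitly instead of accepting mdrun's guess,
the -npme option of mdrun does that. Just as an illustration (the node counts
are examples only; with your 16 processors, 4 PME-only nodes would leave 12
particle-particle nodes, which may be easier to decompose than the 10 that
mdrun chose automatically in your log):

    # explicitly request 4 dedicated PME nodes out of 16 (example numbers)
    mpirun -np 16 mdrun_openmpi -npme 4 -s topol.tpr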
If you want to tune the number of CPUs for a run, you need to think about
whether you want the highest possible performance or a decent performance
without wasting CPU time due to bad scaling. For both goals it helps a lot
to measure the performance as a function of the number of processors.
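One way to do that is to run the same .tpr on a few processor counts and
compare the ns/day that mdrun prints at the end of each log file (the counts
and file names below are only illustrative):

    # benchmark the same system on several processor counts
    for n in 4 8 16 32; do
        mpirun -np $n mdrun_openmpi -s topol.tpr -deffnm bench_np$n
    done
    # then compare the ns/day reported at the end of each bench_np*.log

Dividing the ns/day on N processors by N times the ns/day on one processor
gives the parallel efficiency; once that drops well below, say, 70 %, the
additional CPUs are mostly wasted.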
Carsten
On Nov 4, 2009, at 3:52 PM, Pablo Englebienne wrote:
> Hi all,
>
> I'm having some trouble running simulations with increasing number
> of CPUs. What parameters should I modify to make sure that the
> simulation would run with a specific number of processors? Or,
> having access to a large number of processors, how to select the
> number of CPUs to request?
>
> Besides this, should the PP/PME reported by grompp always fall in
> the range 0.25-0.33? What if it is lower (e.g., 0.16)?
>
> I'm attaching an mdrun logfile of a failed run.
>
> Thanks for suggestions,
> Pablo
>
> --
> Pablo Englebienne, PhD
> Institute of Complex Molecular Systems (ICMS)
> Eindhoven University of Technology, TU/e
> PO Box 513, HG -1.26
> 5600 MB Eindhoven, The Netherlands
> Tel +31 40 247 5349
>
> Log file opened on Mon Nov 2 18:23:16 2009
> Host: node052 pid: 22760 nodeid: 0 nnodes: 16
> The Gromacs distribution was built Thu Oct 29 14:19:59 CET 2009 by
> penglebie at ST-HPC-Main (Linux 2.6.18-128.7.1.el5 x86_64)
>
>
> :-) G R O M A C S (-:
>
> Good gRace! Old Maple Actually Chews Slate
>
> :-) VERSION 4.0.5 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and
> others.
> Copyright (c) 1991-2000, University of Groningen, The
> Netherlands.
> Copyright (c) 2001-2008, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /home/penglebie/software/bin/mdrun_openmpi (double
> precision) (-:
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
> GROMACS 4: Algorithms for highly efficient, load-balanced, and
> scalable
> molecular simulation
> J. Chem. Theory Comput. 4 (2008) pp. 435-447
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
> H. J. C.
> Berendsen
> GROMACS: Fast, Flexible and Free
> J. Comp. Chem. 26 (2005) pp. 1701-1719
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory
> analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> parameters of the run:
> integrator = md
> nsteps = 50000
> init_step = 0
> ns_type = Grid
> nstlist = 5
> ndelta = 2
> nstcomm = 1
> comm_mode = Linear
> nstlog = 1000
> nstxout = 1000
> nstvout = 1000
> nstfout = 0
> nstenergy = 1000
> nstxtcout = 0
> init_t = 0
> delta_t = 0.002
> xtcprec = 1000
> nkx = 40
> nky = 40
> nkz = 40
> pme_order = 4
> ewald_rtol = 1e-05
> ewald_geometry = 0
> epsilon_surface = 0
> optimize_fft = FALSE
> ePBC = xyz
> bPeriodicMols = FALSE
> bContinuation = TRUE
> bShakeSOR = FALSE
> etc = V-rescale
> epc = Parrinello-Rahman
> epctype = Isotropic
> tau_p = 5
> ref_p (3x3):
> ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> compress (3x3):
> compress[ 0]={ 1.00000e-04, 0.00000e+00, 0.00000e+00}
> compress[ 1]={ 0.00000e+00, 1.00000e-04, 0.00000e+00}
> compress[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e-04}
> refcoord_scaling = No
> posres_com (3):
> posres_com[0]= 0.00000e+00
> posres_com[1]= 0.00000e+00
> posres_com[2]= 0.00000e+00
> posres_comB (3):
> posres_comB[0]= 0.00000e+00
> posres_comB[1]= 0.00000e+00
> posres_comB[2]= 0.00000e+00
> andersen_seed = 815131
> rlist = 1.4
> rtpi = 0.05
> coulombtype = PME
> rcoulomb_switch = 0
> rcoulomb = 1.4
> vdwtype = Cut-off
> rvdw_switch = 0
> rvdw = 1.4
> epsilon_r = 1
> epsilon_rf = 1
> tabext = 1
> implicit_solvent = No
> gb_algorithm = Still
> gb_epsilon_solvent = 80
> nstgbradii = 1
> rgbradii = 2
> gb_saltconc = 0
> gb_obc_alpha = 1
> gb_obc_beta = 0.8
> gb_obc_gamma = 4.85
> sa_surface_tension = 2.092
> DispCorr = EnerPres
> free_energy = no
> init_lambda = 0
> sc_alpha = 0
> sc_power = 0
> sc_sigma = 0.3
> delta_lambda = 0
> nwall = 0
> wall_type = 9-3
> wall_atomtype[0] = -1
> wall_atomtype[1] = -1
> wall_density[0] = 0
> wall_density[1] = 0
> wall_ewald_zfac = 3
> pull = no
> disre = No
> disre_weighting = Conservative
> disre_mixed = FALSE
> dr_fc = 1000
> dr_tau = 0
> nstdisreout = 100
> orires_fc = 0
> orires_tau = 0
> nstorireout = 100
> dihre-fc = 1000
> em_stepsize = 0.01
> em_tol = 10
> niter = 20
> fc_stepsize = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> ConstAlg = Lincs
> shake_tol = 0.0001
> lincs_order = 4
> lincs_warnangle = 30
> lincs_iter = 1
> bd_fric = 0
> ld_seed = 1993
> cos_accel = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> grpopts:
> nrdf: 231.858 4661.14
> ref_t: 300 300
> tau_t: 0.1 0.1
> anneal: No No
> ann_npoints: 0 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp_flags[ 0]: 0
> efield-x:
> n = 0
> efield-xt:
> n = 0
> efield-y:
> n = 0
> efield-yt:
> n = 0
> efield-z:
> n = 0
> efield-zt:
> n = 0
> bQMMM = FALSE
> QMconstraints = 0
> QMMMscheme = 0
> scalefactor = 1
> qm_opts:
> ngQM = 0
>
> Initializing Domain Decomposition on 16 nodes
> Dynamic load balancing: auto
> Will sort the charge groups at every domain (re)decomposition
> Initial maximum inter charge-group distances:
> two-body bonded interactions: 0.589 nm, LJ-14, atoms 65 87
> multi-body bonded interactions: 0.538 nm, G96Angle, atoms 62 65
> Minimum cell size due to bonded interactions: 0.591 nm
> Maximum distance for 5 constraints, at 120 deg. angles, all-trans:
> 0.765 nm
> Estimated maximum distance required for P-LINCS: 0.765 nm
> This distance will limit the DD cell size, you can override this
> with -rcon
> Guess for relative PME load: 0.31
> Will use 10 particle-particle and 6 PME only nodes
> This is a guess, check the performance at the end of the log file
> Using 6 separate PME nodes
> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> Optimizing the DD grid for 10 cells with a minimum initial size of
> 0.956 nm
> The maximum allowed number of cells is: X 3 Y 3 Z 3
>
> -------------------------------------------------------
> Program mdrun_openmpi, VERSION 4.0.5
> Source code file: domdec.c, line: 5873
>
> Fatal error:
> There is no domain decomposition for 10 nodes that is compatible
> with the given box and a minimum cell size of 0.95625 nm
> Change the number of nodes or mdrun option -rcon or -dds or your
> LINCS settings
> Look in the log file for details on the domain decomposition
> -------------------------------------------------------
>
> "If You Want Something Done You Have to Do It Yourself" (Highlander
> II)
>
--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne