[gmx-users] How to tune number of CPUs for a run?
Carsten Kutzner
ckutzne at gwdg.de
Wed Nov 4 17:11:05 CET 2009
Hi Pablo,
The tool g_tune_pme helps to find the optimum settings for a given number of
processors. If you do not want to use the newest git version of GROMACS,
there is also a version for GROMACS 4.0.5 available here:
http://www.mpibpc.mpg.de/home/grubmueller/projects/MethodAdvancements/Gromacs/
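A minimal sketch of how a call could look (option names and the
environment-variable mechanism as I recall them from the 4.0.x tool, so
please check g_tune_pme -h; topol.tpr is just a placeholder for your run
input file):

    # tell g_tune_pme which MPI launcher and which mdrun binary to benchmark
    # (assumed setup -- see g_tune_pme -h for the exact mechanism)
    export MPIRUN=`which mpirun`
    export MDRUN=`which mdrun_openmpi`
    # try different PME/PP node splits on 16 processors and report the fastest
    g_tune_pme -np 16 -s topol.tpr

The tool repeatedly launches short mdrun benchmarks with different numbers of
dedicated PME nodes and writes a summary of the measured performance.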
If grompp reports a PME/PP load ratio lower than 0.25, that is actually
helpful for scaling on large numbers of cores. On the other hand, having much
more than a third of all processors doing PME will very likely hurt badly if
you want to scale to a large number of processors.
Typical PME setups, with cutoffs around 1 nm and a Fourier grid spacing of
about 0.135 nm, will nevertheless result in PME/PP ratios of 0.25-0.33.
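If you want to set the split explicitly instead of accepting mdrun's guess,
the -npme option of mdrun does that. Just as an illustration (the node counts
are examples only; with your 16 processors, 4 PME-only nodes would leave 12
particle-particle nodes, which may be easier to decompose than the 10 that
mdrun chose automatically in your log):

    # explicitly request 4 dedicated PME nodes out of 16 (example numbers)
    mpirun -np 16 mdrun_openmpi -npme 4 -s topol.tpr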
If you want to tune the number of CPUs for a run, you need to think about
whether you want the highest possible performance or a decent performance
without wasting CPU time due to bad scaling. For both goals it helps a lot
to measure the performance as a function of the number of processors.
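One way to do that is to run the same .tpr on a few processor counts and
compare the ns/day that mdrun prints at the end of each log file (the counts
and file names below are only illustrative):

    # benchmark the same system on several processor counts
    for n in 4 8 16 32; do
        mpirun -np $n mdrun_openmpi -s topol.tpr -deffnm bench_np$n
    done
    # then compare the ns/day reported at the end of each bench_np*.log

Dividing the ns/day on N processors by N times the ns/day on one processor
gives the parallel efficiency; once that drops well below, say, 70 %, the
additional CPUs are mostly wasted.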
Carsten
On Nov 4, 2009, at 3:52 PM, Pablo Englebienne wrote:
> Hi all,
>
> I'm having some trouble running simulations with increasing number
> of CPUs. What parameters should I modify to make sure that the
> simulation would run with a specific number of processors? Or,
> having access to a large number of processors, how to select the
> number of CPUs to request?
>
> Besides this, should the PP/PME reported by grompp always fall in
> the range 0.25-0.33? What if it is lower (e.g., 0.16)?
>
> I'm attaching an mdrun logfile of a failed run.
>
> Thanks for suggestions,
> Pablo
>
> --
> Pablo Englebienne, PhD
> Institute of Complex Molecular Systems (ICMS)
> Eindhoven University of Technology, TU/e
> PO Box 513, HG -1.26
> 5600 MB Eindhoven, The Netherlands
> Tel +31 40 247 5349
>
> Log file opened on Mon Nov 2 18:23:16 2009
> Host: node052 pid: 22760 nodeid: 0 nnodes: 16
> The Gromacs distribution was built Thu Oct 29 14:19:59 CET 2009 by
> penglebie at ST-HPC-Main (Linux 2.6.18-128.7.1.el5 x86_64)
>
>
> :-) G R O M A C S (-:
>
> Good gRace! Old Maple Actually Chews Slate
>
> :-) VERSION 4.0.5 (-:
>
>
> Written by David van der Spoel, Erik Lindahl, Berk Hess, and
> others.
> Copyright (c) 1991-2000, University of Groningen, The
> Netherlands.
> Copyright (c) 2001-2008, The GROMACS development team,
> check out http://www.gromacs.org for more information.
>
> This program is free software; you can redistribute it and/or
> modify it under the terms of the GNU General Public License
> as published by the Free Software Foundation; either version 2
> of the License, or (at your option) any later version.
>
> :-) /home/penglebie/software/bin/mdrun_openmpi (double
> precision) (-:
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
> GROMACS 4: Algorithms for highly efficient, load-balanced, and
> scalable
> molecular simulation
> J. Chem. Theory Comput. 4 (2008) pp. 435-447
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
> H. J. C.
> Berendsen
> GROMACS: Fast, Flexible and Free
> J. Comp. Chem. 26 (2005) pp. 1701-1719
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory
> analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> parameters of the run:
> integrator = md
> nsteps = 50000
> init_step = 0
> ns_type = Grid
> nstlist = 5
> ndelta = 2
> nstcomm = 1
> comm_mode = Linear
> nstlog = 1000
> nstxout = 1000
> nstvout = 1000
> nstfout = 0
> nstenergy = 1000
> nstxtcout = 0
> init_t = 0
> delta_t = 0.002
> xtcprec = 1000
> nkx = 40
> nky = 40
> nkz = 40
> pme_order = 4
> ewald_rtol = 1e-05
> ewald_geometry = 0
> epsilon_surface = 0
> optimize_fft = FALSE
> ePBC = xyz
> bPeriodicMols = FALSE
> bContinuation = TRUE
> bShakeSOR = FALSE
> etc = V-rescale
> epc = Parrinello-Rahman
> epctype = Isotropic
> tau_p = 5
> ref_p (3x3):
> ref_p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> ref_p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> ref_p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> compress (3x3):
> compress[ 0]={ 1.00000e-04, 0.00000e+00, 0.00000e+00}
> compress[ 1]={ 0.00000e+00, 1.00000e-04, 0.00000e+00}
> compress[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e-04}
> refcoord_scaling = No
> posres_com (3):
> posres_com[0]= 0.00000e+00
> posres_com[1]= 0.00000e+00
> posres_com[2]= 0.00000e+00
> posres_comB (3):
> posres_comB[0]= 0.00000e+00
> posres_comB[1]= 0.00000e+00
> posres_comB[2]= 0.00000e+00
> andersen_seed = 815131
> rlist = 1.4
> rtpi = 0.05
> coulombtype = PME
> rcoulomb_switch = 0
> rcoulomb = 1.4
> vdwtype = Cut-off
> rvdw_switch = 0
> rvdw = 1.4
> epsilon_r = 1
> epsilon_rf = 1
> tabext = 1
> implicit_solvent = No
> gb_algorithm = Still
> gb_epsilon_solvent = 80
> nstgbradii = 1
> rgbradii = 2
> gb_saltconc = 0
> gb_obc_alpha = 1
> gb_obc_beta = 0.8
> gb_obc_gamma = 4.85
> sa_surface_tension = 2.092
> DispCorr = EnerPres
> free_energy = no
> init_lambda = 0
> sc_alpha = 0
> sc_power = 0
> sc_sigma = 0.3
> delta_lambda = 0
> nwall = 0
> wall_type = 9-3
> wall_atomtype[0] = -1
> wall_atomtype[1] = -1
> wall_density[0] = 0
> wall_density[1] = 0
> wall_ewald_zfac = 3
> pull = no
> disre = No
> disre_weighting = Conservative
> disre_mixed = FALSE
> dr_fc = 1000
> dr_tau = 0
> nstdisreout = 100
> orires_fc = 0
> orires_tau = 0
> nstorireout = 100
> dihre-fc = 1000
> em_stepsize = 0.01
> em_tol = 10
> niter = 20
> fc_stepsize = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> ConstAlg = Lincs
> shake_tol = 0.0001
> lincs_order = 4
> lincs_warnangle = 30
> lincs_iter = 1
> bd_fric = 0
> ld_seed = 1993
> cos_accel = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> grpopts:
> nrdf: 231.858 4661.14
> ref_t: 300 300
> tau_t: 0.1 0.1
> anneal: No No
> ann_npoints: 0 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp_flags[ 0]: 0
> efield-x:
> n = 0
> efield-xt:
> n = 0
> efield-y:
> n = 0
> efield-yt:
> n = 0
> efield-z:
> n = 0
> efield-zt:
> n = 0
> bQMMM = FALSE
> QMconstraints = 0
> QMMMscheme = 0
> scalefactor = 1
> qm_opts:
> ngQM = 0
>
> Initializing Domain Decomposition on 16 nodes
> Dynamic load balancing: auto
> Will sort the charge groups at every domain (re)decomposition
> Initial maximum inter charge-group distances:
> two-body bonded interactions: 0.589 nm, LJ-14, atoms 65 87
> multi-body bonded interactions: 0.538 nm, G96Angle, atoms 62 65
> Minimum cell size due to bonded interactions: 0.591 nm
> Maximum distance for 5 constraints, at 120 deg. angles, all-trans:
> 0.765 nm
> Estimated maximum distance required for P-LINCS: 0.765 nm
> This distance will limit the DD cell size, you can override this
> with -rcon
> Guess for relative PME load: 0.31
> Will use 10 particle-particle and 6 PME only nodes
> This is a guess, check the performance at the end of the log file
> Using 6 separate PME nodes
> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
> Optimizing the DD grid for 10 cells with a minimum initial size of
> 0.956 nm
> The maximum allowed number of cells is: X 3 Y 3 Z 3
>
> -------------------------------------------------------
> Program mdrun_openmpi, VERSION 4.0.5
> Source code file: domdec.c, line: 5873
>
> Fatal error:
> There is no domain decomposition for 10 nodes that is compatible
> with the given box and a minimum cell size of 0.95625 nm
> Change the number of nodes or mdrun option -rcon or -dds or your
> LINCS settings
> Look in the log file for details on the domain decomposition
> -------------------------------------------------------
>
> "If You Want Something Done You Have to Do It Yourself" (Highlander
> II)
>
--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/home/grubmueller/ihp/ckutzne