[gmx-users] Different "optimal pme grid ... coulomb cutoff" values from identical input files

Mark Abraham mark.j.abraham at gmail.com
Wed Feb 5 18:43:33 CET 2014


What's the network? If it's some kind of switched InfiniBand shared with
other users' jobs, then getting hit by their traffic does happen. You can see
that the individual timings of the things the load balancer tries differ a
lot between runs. So there must be an extrinsic factor (if the .tpr is
functionally the same). Organizing yourself a quiet corner of the network
is ideal, if you can do the required social engineering :-P
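
As a rough illustration of what those "timed with pme grid ..., coulomb
cutoff ..." lines mean (plain Python, not GROMACS code): the tuner coarsens
the PME grid and lengthens the real-space cutoff by roughly the same factor,
so the Ewald accuracy stays the same and only the split of work between the
particle-particle and PME parts changes. Which combination "wins" then
depends purely on the measured timings, so noisy timings give a different
optimum.

# naive estimate; the values in your log differ slightly because mdrun
# works in fixed scale increments and rounds to FFT-friendly grid sizes
base_grid, base_cutoff = 112, 1.000
for grid in (112, 108, 104, 100, 96):
    print("grid %3d -> cutoff ~%.3f nm" % (grid, base_cutoff * base_grid / grid))

If you want runs whose timings you can compare directly, you can also switch
the tuning off with mdrun -notunepme, at some possible cost in performance.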

Mark


On Wed, Feb 5, 2014 at 6:22 PM, yunshi11 . <yunshi09 at gmail.com> wrote:

> Hello all,
>
> I am doing a production MD run of a protein-ligand complex in explicit
> water with GROMACS 4.6.5.
>
> However, two runs from the same input gave different "optimal ... coulomb
> cutoff" values, as shown in the output log files below.
>
> 1st one:
>
> ...................................................................................................................................
> NOTE: Turning on dynamic load balancing
>
> step   60: timed with pme grid 112 112 112, coulomb cutoff 1.000: 235.9 M-cycles
> step  100: timed with pme grid 100 100 100, coulomb cutoff 1.116: 228.8 M-cycles
> step  100: the domain decomposition limits the PME load balancing to a coulomb cut-off of 1.162
> step  140: timed with pme grid 112 112 112, coulomb cutoff 1.000: 223.9 M-cycles
> step  180: timed with pme grid 108 108 108, coulomb cutoff 1.033: 219.2 M-cycles
> step  220: timed with pme grid 104 104 104, coulomb cutoff 1.073: 210.9 M-cycles
> step  260: timed with pme grid 100 100 100, coulomb cutoff 1.116: 229.0 M-cycles
> step  300: timed with pme grid 96 96 96, coulomb cutoff 1.162: 267.8 M-cycles
> step  340: timed with pme grid 112 112 112, coulomb cutoff 1.000: 241.4 M-cycles
> step  380: timed with pme grid 108 108 108, coulomb cutoff 1.033: 424.1 M-cycles
> step  420: timed with pme grid 104 104 104, coulomb cutoff 1.073: 215.1 M-cycles
> step  460: timed with pme grid 100 100 100, coulomb cutoff 1.116: 226.4 M-cycles
>               optimal pme grid 104 104 104, coulomb cutoff 1.073
> DD  step 24999  vol min/aver 0.834  load imb.: force  2.3%  pme mesh/force 0.687
>
> ...................................................................................................................................
>
>
> 2nd one:
> NOTE: Turning on dynamic load balancing
>
> step   60: timed with pme grid 112 112 112, coulomb cutoff 1.000: 187.1 M-cycles
> step  100: timed with pme grid 100 100 100, coulomb cutoff 1.116: 218.3 M-cycles
> step  140: timed with pme grid 112 112 112, coulomb cutoff 1.000: 172.4 M-cycles
> step  180: timed with pme grid 108 108 108, coulomb cutoff 1.033: 188.3 M-cycles
> step  220: timed with pme grid 104 104 104, coulomb cutoff 1.073: 203.1 M-cycles
> step  260: timed with pme grid 112 112 112, coulomb cutoff 1.000: 174.3 M-cycles
> step  300: timed with pme grid 108 108 108, coulomb cutoff 1.033: 184.4 M-cycles
> step  340: timed with pme grid 104 104 104, coulomb cutoff 1.073: 205.4 M-cycles
> step  380: timed with pme grid 112 112 112, coulomb cutoff 1.000: 172.1 M-cycles
> step  420: timed with pme grid 108 108 108, coulomb cutoff 1.033: 188.8 M-cycles
>               optimal pme grid 112 112 112, coulomb cutoff 1.000
> DD  step 24999  vol min/aver 0.789  load imb.: force  4.7%  pme mesh/force 0.766
>
> ...................................................................................................................................
>
>
>
>
> The 2nd MD run turned out to be about 5 times faster; I only submitted the
> 2nd one because the 1st was unexpectedly slow.
>
> I made sure the .tpr file and the .pbs file (an MPI run on a cluster of
> Xeon E5649 CPUs) were virtually identical between the two runs. Here is my
> .mdp file:
> ;
> title                    = Production Simulation
> cpp                      = /lib/cpp
>
> ; RUN CONTROL PARAMETERS
> integrator               = md
> tinit                    = 0       ; Starting time
> dt                       = 0.002   ; 2 femtosecond time step for integration
> nsteps                   = 500000000  ; 1000 ns = 0.002 ps * 500,000,000 steps
>
> ; OUTPUT CONTROL OPTIONS
> nstxout                  = 25000     ; .trr full-precision coordinates every 50 ps
> nstvout                  = 0         ; .trr velocities output
> nstfout                  = 0         ; Not writing forces
> nstlog                   = 25000     ; Writing to the log file every 50 ps
> nstenergy                = 25000     ; Writing out energy information every 50 ps
> energygrps               = dikpgdu Water_and_ions
>
> ; NEIGHBORSEARCHING PARAMETERS
> cutoff-scheme = Verlet
> nstlist                  = 20
> ns-type                  = Grid
> pbc                      = xyz       ; 3-D PBC
> rlist                    = 1.0
>
> ; OPTIONS FOR ELECTROSTATICS AND VDW
> rcoulomb                 = 1.0       ; short-range electrostatic cutoff (in nm)
> coulombtype              = PME       ; Particle Mesh Ewald for long-range electrostatics
> pme_order                = 4         ; interpolation order
> fourierspacing           = 0.12      ; grid spacing for FFT
> vdw-type                 = Cut-off
> rvdw                     = 1.0       ; short-range van der Waals cutoff (in nm)
> optimize_fft             = yes
>
> ; Temperature coupling
> Tcoupl                   = v-rescale
> tc-grps                  = dikpgdu  Water_and_ions
> tau_t                    = 0.1      0.1
> ref_t                    = 298      298
>
> ; Pressure coupling
> Pcoupl                   = Berendsen
> Pcoupltype               = Isotropic
> tau_p                    = 1.0
> compressibility          = 4.5e-5
> ref_p                    = 1.0
>
> ; Dispersion correction
> DispCorr    = EnerPres  ; account for cut-off vdW scheme
>
> ; GENERATE VELOCITIES FOR STARTUP RUN
> gen_vel     = no
>
> ; OPTIONS FOR BONDS
> continuation             = yes
> constraints              = hbonds
> constraint-algorithm     = Lincs
> lincs-order              = 4
> lincs-iter               = 1
> lincs-warnangle          = 30
>
>
>
> I am surprised that coulomb cutoffs of 1.073 vs 1.000 could cause a 5-fold
> performance difference. And why would they be different in the first place
> if identical input files were used?
>
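> A rough back-of-the-envelope check (my own estimate, assuming the number of
> short-range pairs simply grows with the cube of the cutoff, and ignoring the
> correspondingly coarser PME grid that goes with the longer cutoff):
>
> # not GROMACS code, just Python arithmetic for the two cutoffs above
> rc_short, rc_long = 1.000, 1.073
> print("relative short-range work: %.2f" % ((rc_long / rc_short) ** 3))  # ~1.24
>
> So the longer cutoff should only cost roughly 25% more short-range work,
> nowhere near a factor of 5.
>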
> I haven't found anything peculiar on the cluster I am using.
>
> Any suggestions for the issue?
>
> Thanks,
> Yun

