[gmx-users] Different "optimal pme grid ... coulomb cutoff" values from identical input files
Mark Abraham
mark.j.abraham at gmail.com
Thu Feb 6 21:54:54 CET 2014
On Feb 6, 2014 8:42 AM, "yunshi11 ." <yunshi09 at gmail.com> wrote:
>
> On Wed, Feb 5, 2014 at 9:43 AM, Mark Abraham <mark.j.abraham at gmail.com
>wrote:
>
> > What's the network? If it's some kind of switched InfiniBand shared with
> > other users' jobs, then getting hit by the traffic does happen. You can see
> >
>
> It indeed uses an InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s switched
> fabric, with a two-to-one blocking factor.
>
> And I tried running this again with the GPU version, which showed the same
> issue: every single run gets a different coulomb cutoff after automatic
> optimization.
It is getting a different PME tuning, which is not surprising given the
noise in the timings it measures. There's probably a right tuning for you,
but you'd have to run each testing phase long enough to average over the
noise! The differences in result don't matter for correctness, only for
efficiency. If you decide on a setup you think is fastest on balance,
describe it in the .mdp and use mdrun -notunepme. That way you won't get a
stupid result from the tuner. It doesn't help that any single measurement
could be slow because of noise, or because the setting really is bad, and it
is hard to tell the difference without repeats.
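As a minimal sketch of what that could look like (the numbers are only
illustrative, taken from the first log further down; check what grompp and
mdrun -h accept for your build, and treat "prod" as a placeholder for your
own file names): reproducing the tuner's pick of grid 104 with cutoff 1.073
means scaling the cut-off and the PME grid spacing by the same factor, so the
Ewald accuracy stays the same, and then switching the tuner off:

    ; in the .mdp, scale both values by ~1.073
    rcoulomb        = 1.073    ; was 1.0
    fourierspacing  = 0.129    ; was 0.12

    # at run time, keep mdrun from re-tuning
    mdrun -deffnm prod -notunepme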
> Since I am unlikely to have my own corner of a nation-wide supercomputer,
> are there any parameters that could keep this from happening?
The main "parameter" is whether there are other users ;-) The next best thing
is to ask the scheduler to give you nodes at the same lowest level of the
switch hierarchy. This reduces your surface area, by making you your own
neighbour more often. It will lead to longer queue times, of course, so weigh
up efficiency vs. throughput. Naturally, your scheduler won't support this
request, but if you don't ask for it, it never will! Likewise for a machine
that can be partitioned, given sufficient need.
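If the site happened to run Slurm, the request might look roughly like the
sketch below (hypothetical; your .pbs script suggests a PBS-family scheduler,
which would need whatever topology-aware placement option that installation
offers instead):

    # ask that all allocated nodes sit under a single leaf switch,
    # and be willing to wait up to 12 hours for such an allocation
    #SBATCH --nodes=4
    #SBATCH --switches=1@12:00:00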
> Turning off load balancing sounds crazy.
Yes. PME tuning and load balancing are different things! Neither is a
problem here, but both are affected by the runtime context.
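If it helps to see the two knobs side by side, they are controlled
independently on the mdrun command line (flag names as in the 4.6 series;
confirm with mdrun -h for your build):

    mdrun -deffnm prod -notunepme    # freeze the PME/cut-off split, keep load balancing
    mdrun -deffnm prod -dlb no       # turn off dynamic load balancing only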
Mark
>
>
>
> > that the individual timings of the things the load balancer tries differ
> > a lot between runs. So there must be an extrinsic factor (if the .tpr is
> > functionally the same). Organizing yourself a quiet corner of the network
> > is ideal, if you can do the required social engineering :-P
> >
> > Mark
> >
> >
> > On Wed, Feb 5, 2014 at 6:22 PM, yunshi11 . <yunshi09 at gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am doing a production MD run of a protein-ligand complex in explicit
> > > water with GROMACS 4.6.5.
> > >
> > > However, I got different coulomb cutoff values, as shown in the output
> > > log files.
> > >
> > > 1st one:
> > >
> > >
> > > ....................................................................
> > > NOTE: Turning on dynamic load balancing
> > >
> > > step 60: timed with pme grid 112 112 112, coulomb cutoff 1.000: 235.9 M-cycles
> > > step 100: timed with pme grid 100 100 100, coulomb cutoff 1.116: 228.8 M-cycles
> > > step 100: the domain decomposition limits the PME load balancing to a coulomb cut-off of 1.162
> > > step 140: timed with pme grid 112 112 112, coulomb cutoff 1.000: 223.9 M-cycles
> > > step 180: timed with pme grid 108 108 108, coulomb cutoff 1.033: 219.2 M-cycles
> > > step 220: timed with pme grid 104 104 104, coulomb cutoff 1.073: 210.9 M-cycles
> > > step 260: timed with pme grid 100 100 100, coulomb cutoff 1.116: 229.0 M-cycles
> > > step 300: timed with pme grid 96 96 96, coulomb cutoff 1.162: 267.8 M-cycles
> > > step 340: timed with pme grid 112 112 112, coulomb cutoff 1.000: 241.4 M-cycles
> > > step 380: timed with pme grid 108 108 108, coulomb cutoff 1.033: 424.1 M-cycles
> > > step 420: timed with pme grid 104 104 104, coulomb cutoff 1.073: 215.1 M-cycles
> > > step 460: timed with pme grid 100 100 100, coulomb cutoff 1.116: 226.4 M-cycles
> > > optimal pme grid 104 104 104, coulomb cutoff 1.073
> > > DD step 24999 vol min/aver 0.834 load imb.: force 2.3%  pme mesh/force 0.687
> > > ....................................................................
> > >
> > >
> > > 2nd one:
> > > NOTE: Turning on dynamic load balancing
> > >
> > > step 60: timed with pme grid 112 112 112, coulomb cutoff 1.000: 187.1 M-cycles
> > > step 100: timed with pme grid 100 100 100, coulomb cutoff 1.116: 218.3 M-cycles
> > > step 140: timed with pme grid 112 112 112, coulomb cutoff 1.000: 172.4 M-cycles
> > > step 180: timed with pme grid 108 108 108, coulomb cutoff 1.033: 188.3 M-cycles
> > > step 220: timed with pme grid 104 104 104, coulomb cutoff 1.073: 203.1 M-cycles
> > > step 260: timed with pme grid 112 112 112, coulomb cutoff 1.000: 174.3 M-cycles
> > > step 300: timed with pme grid 108 108 108, coulomb cutoff 1.033: 184.4 M-cycles
> > > step 340: timed with pme grid 104 104 104, coulomb cutoff 1.073: 205.4 M-cycles
> > > step 380: timed with pme grid 112 112 112, coulomb cutoff 1.000: 172.1 M-cycles
> > > step 420: timed with pme grid 108 108 108, coulomb cutoff 1.033: 188.8 M-cycles
> > > optimal pme grid 112 112 112, coulomb cutoff 1.000
> > > DD step 24999 vol min/aver 0.789 load imb.: force 4.7%  pme mesh/force 0.766
> > > ....................................................................
> > >
> > >
> > >
> > >
> > > The 2nd MD run turned out to be much faster (about 5 times), and the
> > > reason I submitted the 2nd is that the 1st was unexpectedly slow.
> > >
> > > I made sure the .tpr file and .pbs file (MPI on a cluster of Xeon E5649
> > > CPUs) are virtually identical, and here is my .mdp file:
> > > ;
> > > title                = Production Simulation
> > > cpp                  = /lib/cpp
> > >
> > > ; RUN CONTROL PARAMETERS
> > > integrator           = md
> > > tinit                = 0          ; starting time
> > > dt                   = 0.002      ; 2 fs time step for integration
> > > nsteps               = 500000000  ; 1000 ns = 0.002 ps * 500,000,000
> > >
> > > ; OUTPUT CONTROL OPTIONS
> > > nstxout              = 25000      ; .trr full-precision coordinates every 50 ps
> > > nstvout              = 0          ; no .trr velocity output
> > > nstfout              = 0          ; not writing forces
> > > nstlog               = 25000      ; writing to the log file every 50 ps
> > > nstenergy            = 25000      ; writing out energy information every 50 ps
> > > energygrps           = dikpgdu Water_and_ions
> > >
> > > ; NEIGHBORSEARCHING PARAMETERS
> > > cutoff-scheme        = Verlet
> > > nstlist              = 20
> > > ns-type              = Grid
> > > pbc                  = xyz        ; 3-D PBC
> > > rlist                = 1.0
> > >
> > > ; OPTIONS FOR ELECTROSTATICS AND VDW
> > > rcoulomb             = 1.0        ; short-range electrostatic cutoff (in nm)
> > > coulombtype          = PME        ; Particle Mesh Ewald for long-range electrostatics
> > > pme_order            = 4          ; interpolation order
> > > fourierspacing       = 0.12       ; grid spacing for FFT
> > > vdw-type             = Cut-off
> > > rvdw                 = 1.0        ; short-range van der Waals cutoff (in nm)
> > > optimize_fft         = yes
> > >
> > > ; Temperature coupling
> > > Tcoupl               = v-rescale
> > > tc-grps              = dikpgdu Water_and_ions
> > > tau_t                = 0.1 0.1
> > > ref_t                = 298 298
> > >
> > > ; Pressure coupling
> > > Pcoupl               = Berendsen
> > > Pcoupltype           = Isotropic
> > > tau_p                = 1.0
> > > compressibility      = 4.5e-5
> > > ref_p                = 1.0
> > >
> > > ; Dispersion correction
> > > DispCorr             = EnerPres   ; account for cut-off vdW scheme
> > >
> > > ; GENERATE VELOCITIES FOR STARTUP RUN
> > > gen_vel              = no
> > >
> > > ; OPTIONS FOR BONDS
> > > continuation         = yes
> > > constraints          = hbonds
> > > constraint-algorithm = Lincs
> > > lincs-order          = 4
> > > lincs-iter           = 1
> > > lincs-warnangle      = 30
> > >
> > >
> > >
> > > I am surprised that coulomb cutoffs of 1.073 vs. 1.000 could cause a
> > > 5-fold performance difference, and why would they be different in the
> > > first place if identical input files were used?
> > >
> > > I haven't found anything peculiar on the cluster I am using.
> > >
> > > Any suggestions for the issue?
> > >
> > > Thanks,
> > > Yun