[gmx-users] 2018: large performance variations

Szilárd Páll pall.szilard at gmail.com
Mon Mar 5 13:54:29 CET 2018


Hi,

Please keep the conversation on the gmx-users list.

On Sun, Mar 4, 2018 at 2:58 PM, Michael Brunsteiner <mbx0009 at yahoo.com>
wrote:

>
> also: in the meantime i tried "-notunepme -dlb yes", and in all cases i
> tried so far this gave performance comparable to the best performance with
> tunepme.
> in fact i do not quite understand why "dlb yes" (instead of tunepme) is not
> the default setting, or does dlb come with such a large overhead?
>

"DLB" and "PME tuning" a two entirely different things:
- DLB is the domain-decomposition dynamic load balancing (scaled domain
size to balance load among ranks)
- "PME tuning" is load balancing between the short- and long-range
electrostatics (i.e. shifting work from CPU to GPU or from PME ranks to PP
ranks).
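Both can be steered from the mdrun command line; a minimal illustration,
with the tpr file name as a placeholder:

  # DLB on, PME tuning off: cutoff and grid stay as set in the tpr
  gmx mdrun -s topol.tpr -dlb yes -notunepme

  # default behaviour: DLB switched on automatically when imbalance is
  # detected, PME tuning enabled
  gmx mdrun -s topol.tpr -dlb auto -tunepme

Note that DLB only has an effect when the run is decomposed into more than
one domain (i.e. more than one PP rank).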


> ps: different topic: i assume you have some experience with graphics cards.
> I just bought four new GTX 1060, and did memtestG80 ... turns out at least
> one, probably two, of the 4 cards have damaged memory ... i wonder am i
> just
> unlucky, or was this to be expected? also, in spite of memtestG80 showing
> errors Gromacs seems to run without hiccups using these cards ...
> does this mean i can ignore the errors reported by memtestG80?
>

Even if mdrun does not crash and seemingly produces correct results, I
would not trust cards that produce errors. GROMACS generally uses a fairly
small amount of GPU memory and puts only a moderate load on it. That is not
to say, however, that you won't see corruption when you e.g. run for longer
or the GPUs warm up more.

I'd strongly suggest doing a thorough burn-in test and avoiding GPUs that
are known to be unstable.
I'd also recommend the cuda-memtest tool instead of memtestG80, which is,
as far as I know, outdated/unmaintained.
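I don't have an exact cuda-memtest command line at hand, but a crude
complementary burn-in with GROMACS itself could look something like this
(the tpr name, device ids, thread counts and step count are placeholders
only):

  # run a short throwaway simulation pinned to each GPU in turn and watch
  # for crashes, NaNs or inconsistent energies
  for id in 0 1 2 3; do
      gmx mdrun -s bench.tpr -deffnm burnin_gpu$id -gpu_id $id \
                -ntmpi 1 -ntomp 6 -nsteps 100000
  done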

Cheers,
--
Szilárd




>
>
>
> =============================== Why be happy when you could be normal?
>
>
> ------------------------------
> *From:* Szilárd Páll <pall.szilard at gmail.com>
> *To:* Discussion list for GROMACS users <gmx-users at gromacs.org>; Michael
> Brunsteiner <mbx0009 at yahoo.com>
> *Sent:* Friday, March 2, 2018 7:29 PM
> *Subject:* Re: [gmx-users] 2018: large performance variations
>
> BTW, we have considered adding a warmup delay to the tuner, would you be
> willing to help testing (or even contributing such a feature)?
>
> --
> Szilárd
>
> On Fri, Mar 2, 2018 at 7:28 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
> Hi Michael,
>
> Can you post full logs, please? This is likely related to a known issue
> where CPU cores (and in some cases GPUs too) may take longer to clock up
> and reach stable performance than the time the auto-tuner takes to do a
> few cycles of measurements.
>
> Unfortunately we do not have a good solution for this, but what you can do
> to make runs more consistent is:
> - try "warming up" the CPU/GPU before production runs (e.g. stress -c or
> just a dummy 30 sec mdrun run)
> - repeat the benchmark a few times, see which cutoff / grid setting is
> best, set that in the mdp options and run with -notunepme
>
> Of course the latter may be too tedious if you have a variety of
> systems/inputs to run.
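>
> A minimal sketch of that warm-up approach (the tpr name, -deffnm names and
> step count below are only placeholders):
>
>   stress -c 6 -t 30                                   # spin the CPU cores for ~30 s
>   gmx mdrun -s mcz1.tpr -deffnm warmup -nsteps 30000  # throwaway run to warm up CPU + GPU
>   gmx mdrun -s mcz1.tpr -deffnm prod                  # actual benchmark/production run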
>
> Regarding tune_pme: that issue is related to resetting timings too early
> (for -resetstep see gmx mdrun -h -hidden); I'm not sure we have a fix, but
> either way tune_pme is better suited for tuning the number of separate PME
> ranks in parallel runs.
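>
> For illustration, with the reset step chosen well after tuning typically
> finishes (the step values here are arbitrary):
>
>   gmx mdrun -s mcz1.tpr -nsteps 50000 -resetstep 25000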
>
> Cheers,
>
> --
> Szilárd
>
> On Thu, Mar 1, 2018 at 7:11 PM, Michael Brunsteiner <mbx0009 at yahoo.com>
> wrote:
>
> Hi, I ran a few MD runs with identical input files (the SAME tpr file; mdp
> included below) on the same computer with gmx 2018 and observed rather
> large performance variations (~50%), as in:
>
> grep Performance */mcz1.log
> 7/mcz1.log:Performance:        98.510        0.244
> 7d/mcz1.log:Performance:      140.733        0.171
> 7e/mcz1.log:Performance:      115.586        0.208
> 7f/mcz1.log:Performance:      139.197        0.172
>
> turns out the load balancing effort that is done at the beginning gives
> quite different results:
> grep "optimal pme grid" */mcz1.log
> 7/mcz1.log:              optimal pme grid 32 32 28, coulomb cutoff 1.394
> 7d/mcz1.log:              optimal pme grid 36 36 32, coulomb cutoff 1.239
> 7e/mcz1.log:              optimal pme grid 25 24 24, coulomb cutoff 1.784
> 7f/mcz1.log:              optimal pme grid 40 36 32, coulomb cutoff 1.200
>
> next i tried tune_pme as in:
>
> gmx tune_pme -mdrun 'gmx mdrun' -nt 6 -ntmpi 1 -ntomp 6 -pin on
> -pinoffset 0 -s mcz1.tpr -pmefft cpu -pinstride 1 -r 10
>
> which didn't work ... in some log file it says:
>
> Fatal error:
> PME tuning was still active when attempting to reset mdrun counters at step
> 1500. Try resetting counters later in the run, e.g. with gmx mdrun
> -resetstep.
>
> i found no documentation regarding "-resetstep" ...
>
> i could of course optimize the PME grid manually, but since i plan to run
> a large number of jobs with different systems and sizes this would be a
> lot of work, and if possible i'd like to avoid that.
> is there any way to ask gmx to perform more tests at the beginning of the
> run when optimizing the PME grid? or is using "-notunepme -dlb yes" an
> option, and does the latter require a concurrent optimization of the
> domain decomposition, and if so how is this done?
> thanks for any help!
> michael
>
>
> mdp:
> integrator        = md
> dt                = 0.001
> nsteps            = 500000
> comm-grps         = System
> ;
> nstxout           = 0
> nstvout           = 0
> nstfout           = 0
> nstlog            = 1000
> nstenergy         = 1000
> ;
> nstlist                  = 40
> ns_type                  = grid
> pbc                      = xyz
> rlist                    = 1.2
> cutoff-scheme            = Verlet
> ;
> coulombtype              = PME
> rcoulomb                 = 1.2
> vdw_type                 = cut-off
> rvdw                     = 1.2
> ;
> constraints              = none
> ;
> tcoupl             = v-rescale
> tau-t              = 0.1
> ref-t              = 300
> tc-grps            = System
> ;
> pcoupl             = berendsen
> pcoupltype         = anisotropic
> tau-p              = 2.0
> compressibility    = 4.5e-5 4.5e-5 4.5e-5 0 0 0
> ref-p              = 1 1 1 0 0 0
> ;
> annealing          = single
> annealing-npoints  = 2
> annealing-time     = 0 500
> annealing-temp     = 500 480
>
>