[gmx-users] 2018 performance question

Tue Feb 27 22:05:55 CET 2018

Hi,
I did a number of test runs comparing GROMACS 2018 with 2016.5,
on workstations with either a GTX 780 or a GTX 1060 (CPU and other hardware identical: Intel Core i7-4930K CPU @ 3.40GHz, 6 cores, two threads each).
The system is an amorphous sample of a small organic molecule (a glass, about 5500 atoms, 250 molecules) at ambient pressure, plus a bit of simulated annealing, otherwise vanilla MD (mdp below).

Performance details are in the table below.
What I see, in a nutshell, is:
1 for this system and hardware, 2018 is about 50% faster than 2016 on the GTX 1060 (good), but only about 8% faster on the GTX 780
2 with 2018, the GTX 1060 is about 25% faster than the GTX 780 (rather disappointing considering the $$$); with 2016 the GTX 780 is actually about 10% faster
3 playing with different mdrun options (-nt, -pin, -pme, -pmefft, etc.) did not improve speed in any case compared to leaving all decisions to GROMACS ("smart gmx")
4 running 2 jobs simultaneously on the same node, splitting the cores between them, does not improve
  overall performance, although the CPU appears to be under-used compared to the GPU in all cases
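The percentages in points 1 and 2 follow from the default ("smart gmx", plain gmx mdrun -v -deffnm mcz) rows of the table below. A minimal bash/awk sketch of that arithmetic; the helper name pct_faster is made up for illustration:

```bash
#!/usr/bin/env bash
# Percentage by which a run at NEW ns/day is faster than one at OLD ns/day.
# A negative result means the "new" configuration is actually slower.
pct_faster() {
  local new=$1 old=$2
  awk -v n="$new" -v o="$old" 'BEGIN { printf "%.0f%%\n", (n / o - 1) * 100 }'
}

# ns/day values copied from the default rows of the table:
echo "2018 vs 2016, GTX 1060:    $(pct_faster 255.6 171.1)"  # 49%
echo "2018 vs 2016, GTX 780:     $(pct_faster 202.8 188.0)"  # 8%
echo "GTX 1060 vs GTX 780, 2018: $(pct_faster 255.6 202.8)"  # 26%
echo "GTX 1060 vs GTX 780, 2016: $(pct_faster 171.1 188.0)"  # -9%
```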

Point 4 in particular surprised me, and I wonder if anybody can suggest options I didn't try that might result in better performance. (Details from the log files for individual cases can be supplied if that helps.)

Cheers,
Michael
The table: first column: GROMACS version; second column: performance (ns/day). "x2" next to the version means that two jobs were run simultaneously on a single node (in these cases the performance value is the sum of the values from the two individual runs, for a fair comparison).
Third column: the mdrun command with all options used in each case.
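For the "x2" rows, the two ns/day values can be summed straight from the log files. A minimal sketch, assuming each md.log contains the usual GROMACS 2016/2018 summary line beginning with "Performance:" whose first number is ns/day (the second is hour/ns); the function name sum_ns_per_day is hypothetical:

```bash
#!/usr/bin/env bash
# Sum the ns/day figures of several mdrun runs, e.g. the two halves of an "x2" row.
# Assumed log layout (GROMACS 2016/2018):
#                  (ns/day)    (hour/ns)
#   Performance:      122.800        0.195
sum_ns_per_day() {
  awk '$1 == "Performance:" { total += $2 } END { printf "%.1f\n", total }' "$@"
}

# Usage for one "x2" row run with -deffnm mcz1 and -deffnm mcz2:
#   sum_ns_per_day mcz1.log mcz2.log
```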

GTX 1060            
========
2016   171.1 gmx mdrun -v -deffnm mcz
2016x2 141.8 gmx mdrun -nt 6 -v -ntmpi 1 -ntomp 6 -pin on -pinoffset 0 -deffnm mcz1
             gmx mdrun -nt 6 -v -ntmpi 1 -ntomp 6 -pin on -pinoffset 6 -deffnm mcz2
2016x2 119.5 gmx mdrun -v -nt 6 -pin on -pinoffset 0 -pinstride 1 -deffnm mcz1
             gmx mdrun -v -nt 6 -pin on -pinoffset 3 -pinstride 1 -deffnm mcz2
2018   255.6 gmx mdrun -v -deffnm mcz
2018x2 245.6 gmx mdrun -v -deffnm mcz1
             gmx mdrun -v -deffnm mcz2
2018x2 190.7 gmx mdrun -v -nt 6 -pin on -pinoffset 0 -pinstride 1 -deffnm mcz1
             gmx mdrun -v -nt 6 -pin on -pinoffset 3 -pinstride 1 -deffnm mcz2
2018   223.9 gmx mdrun -v -deffnm mcz -pmefft cpu
2018   136.8 gmx mdrun -v -deffnm mcz -nt 12 -ntmpi 2 -ntomp 6
2018   138.1 gmx mdrun -v -deffnm mcz -nt 12 -ntmpi 2 -ntomp 6 -pmefft cpu
2018x2 168.0 gmx mdrun -v -deffnm mcz1 -nt 6 -ntmpi 1 -ntomp 6 -pin on -pinoffset 0 -pinstride 1 -pmefft cpu 
             gmx mdrun -v -deffnm mcz2 -nt 6 -ntmpi 1 -ntomp 6 -pin on -pinoffset 3 -pinstride 1 -pmefft cpu
2018   222.0 gmx mdrun -nt 12 -v -deffnm mcz -pme cpu

GTX 780      
=======   
2016   188.0 gmx mdrun -v -deffnm mcz
2016x2 155.3 gmx mdrun -nt 6 -v -ntmpi 1 -ntomp 6 -pin on -pinoffset 0 -deffnm mcz1
             gmx mdrun -nt 6 -v -ntmpi 1 -ntomp 6 -pin on -pinoffset 6 -deffnm mcz2
2016x2 112.7 gmx mdrun -v -nt 6 -pin on -pinoffset 0 -pinstride 1 -deffnm mcz1
             gmx mdrun -v -nt 6 -pin on -pinoffset 3 -pinstride 1 -deffnm mcz2
2018   202.8 gmx mdrun -v -deffnm mcz
2018x2 198.3 gmx mdrun -v -deffnm mcz1
             gmx mdrun -v -deffnm mcz2
2018x2 DEAD  gmx mdrun -v -nt 6 -pin on -pinoffset 0 -pinstride 1 -deffnm mcz1
             gmx mdrun -v -nt 6 -pin on -pinoffset 3 -pinstride 1 -deffnm mcz2
             (the above jobs died without further notice: both gmx processes were still running and using CPU, as seen with top, but gave no more output)
2018   203.5 gmx mdrun -v -deffnm mcz -pmefft cpu
2018   136.9 gmx mdrun -v -deffnm mcz -nt 12 -ntmpi 2 -ntomp 6 
2018   137.1 gmx mdrun -v -deffnm mcz -nt 12 -ntmpi 2 -ntomp 6 -pmefft cpu
2018x2 152.8 gmx mdrun -v -deffnm mcz1 -nt 6 -ntmpi 1 -ntomp 6 -pin on -pinoffset 0 -pinstride 1 -pmefft cpu
             gmx mdrun -v -deffnm mcz2 -nt 6 -ntmpi 1 -ntomp 6 -pin on -pinoffset 3 -pinstride 1 -pmefft cpu 
2018   188.4 gmx mdrun -nt 12 -v -deffnm mcz -pme cpu
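Regarding the "x2" rows with -nt 6 -pinstride 1 and offsets 0 and 3: mdrun's help describes -pinoffset as the lowest logical core the first thread is pinned to and -pinstride as the distance in logical cores between threads. If thread i simply lands on logical core offset + i*stride (an assumption; this sketch does not model how mdrun orders hyperthreads), the first job would use logical cores 0-5 and the second 3-8, so they would share cores 3, 4 and 5 while cores 9-11 of the 12 stay idle. The helper names below are hypothetical:

```bash
#!/usr/bin/env bash
# Logical cores for NT threads pinned with OFFSET and STRIDE, assuming
# thread i goes to logical core OFFSET + i*STRIDE.
pinned_cores() {
  local offset=$1 stride=$2 nt=$3 i
  for ((i = 0; i < nt; i++)); do
    printf '%d\n' $((offset + i * stride))
  done
}

# Logical cores shared by two pinned runs; arguments are
# OFFSET1 STRIDE1 NT1 OFFSET2 STRIDE2 NT2. Empty output means no overlap.
shared_cores() {
  local -A used=()
  local c out=()
  for c in $(pinned_cores "$1" "$2" "$3"); do used[$c]=1; done
  for c in $(pinned_cores "$4" "$5" "$6"); do
    [[ -n ${used[$c]:-} ]] && out+=("$c")
  done
  echo "${out[*]}"
}

echo "offsets 0 and 3: shared [$(shared_cores 0 1 6 3 1 6)]"  # shared [3 4 5]
echo "offsets 0 and 6: shared [$(shared_cores 0 1 6 6 1 6)]"  # shared []
```

With -pinoffset 6 for the second job, the same check reports no shared logical cores.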

mdp-file
integrator        = md
dt                = 0.001
nsteps            = 1000000
comm-grps         = System
;
nstxout           = 0
nstvout           = 0
nstfout           = 0
nstlog            = 1000
nstenergy         = 1000
;
nstlist                  = 40
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.2
cutoff-scheme            = Verlet
;
coulombtype              = PME
rcoulomb                 = 1.2
vdw_type                 = cut-off 
rvdw                     = 1.2
;
constraints              = none
;
tcoupl             = v-rescale
tau-t              = 0.1
ref-t              = 300
tc-grps            = System
;
pcoupl             = berendsen
pcoupltype         = anisotropic
tau-p              = 2.0
compressibility    = 4.5e-5 4.5e-5 4.5e-5 0 0 0
ref-p              = 1 1 1 0 0 0
;
annealing          = single
annealing-npoints  = 2
annealing-time     = 0 10
annealing-temp     = 511 491
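The annealing block above sets a linear ramp from 511 K at 0 ps to 491 K at 10 ps; with annealing = single the temperature then stays at the last value (491 K) for the rest of the run, which lasts nsteps * dt = 1000000 * 0.001 ps = 1 ns. A small awk sketch of that schedule (the function name anneal_temp is hypothetical, and the two annealing points are hard-coded from this mdp):

```bash
#!/usr/bin/env bash
# Reference temperature (K) at time t (ps) for annealing = single with
# annealing-time = 0 10 and annealing-temp = 511 491: linear interpolation
# between the two points, then held at the last temperature after 10 ps.
anneal_temp() {
  awk -v t="$1" 'BEGIN {
    t0 = 0; T0 = 511; t1 = 10; T1 = 491
    if (t >= t1) T = T1
    else T = T0 + (T1 - T0) * (t - t0) / (t1 - t0)
    printf "%.1f\n", T
  }'
}

# Total simulated time from nsteps and dt.
awk 'BEGIN { printf "run length: %g ps\n", 1000000 * 0.001 }'  # run length: 1000 ps
anneal_temp 5    # 501.0
anneal_temp 500  # 491.0
```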

From: Szilárd Páll <pall.szilard at gmail.com>
To: Michael Brunsteiner <mbx0009 at yahoo.com>
Sent: Thursday, February 22, 2018 4:15 PM
Subject: Re: [gmx-users] 2018 performance question

Hi,

What I meant is _not_ that you should scale a single GPU-accelerated
GROMACS run across multiple GPUs -- the scaling efficiency is not great
and depends a lot on the system size.

What I meant instead is that the GROMACS 2018 release now requires
fewer cores/GPU to get near peak performance (in single-GPU mode) and
therefore, whereas you may not see a lot of improvement if you just
keep using the 6 cores/GPU in your machine, you generally should be
able to run 2-3 simulations side-by-side, each on a separate GPU, but
each using only 2-3 cores.

Cheers,
--
Szilárd
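Szilárd's suggestion could be tried along these lines. This sketch only prints the command lines; the run names mcz1, mcz2, ... are taken from the table, while the GPU ids 0 and 1 and the round-robin GPU assignment are assumptions about the machine. -gpu_id, -ntmpi, -ntomp, -pin, -pinoffset and -pinstride are existing mdrun options; whether 6 logical cores per job correspond to 3 physical cores depends on how mdrun numbers hyperthreads, so check what md.log reports about thread pinning. The function name print_side_by_side is hypothetical:

```bash
#!/usr/bin/env bash
# Print (not run) mdrun command lines for NJOBS simultaneous runs, each with
# THREADS OpenMP threads on a disjoint block of logical cores and one GPU,
# assigned round-robin over NGPUS GPUs with ids 0..NGPUS-1.
print_side_by_side() {
  local njobs=$1 threads=$2 ngpus=$3 j
  for ((j = 0; j < njobs; j++)); do
    printf 'gmx mdrun -v -deffnm mcz%d -ntmpi 1 -ntomp %d -pin on -pinoffset %d -pinstride 1 -gpu_id %d\n' \
      $((j + 1)) "$threads" $((j * threads)) $((j % ngpus))
  done
}

# Two runs, 6 logical cores each, one per GPU (e.g. GTX 780 and GTX 1060 in one box):
print_side_by_side 2 6 2
```

Each printed line would be started in its own shell (or backgrounded with &) so that the runs execute side by side.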

On Thu, Feb 22, 2018 at 9:59 AM, Michael Brunsteiner <mbx0009 at yahoo.com> wrote:
> From: Szilárd Páll <pall.szilard at gmail.com>
> To: Michael Brunsteiner <mbx0009 at yahoo.com>
> Sent: Wednesday, February 21, 2018 9:16 PM
> Subject: Re: [gmx-users] 2018 performance question
>
>> Hi Michael,
>>
>> Why not use both GPUs? You should be able to get up to 80% of the performance
>> on just 3 of the 6 cores.
>
> It's a bit more complicated. I have 4 machines with identical hardware,
> but one of the 4 graphics cards started giving me segfaults lately;
> some of its RAM must be broken ...
> so I need to replace that in any case. But if you suggest that gmx
> can use two different graphics cards (780 and 1060) simultaneously, I'll
> certainly give that a try on the other 3.
> Thanks for your help!
> Regards,
> Michael