[gmx-users] Timing variability

Mark Abraham Mark.Abraham at anu.edu.au
Wed Jun 22 18:44:54 CEST 2011


On 23/06/2011 2:32 AM, chris.neale at utoronto.ca wrote:
> Dear Users:
>
> Has anybody else looked at simulation speed (ns/day) over the segments 
> of long runs? I always benchmark and optimize my systems carefully, 
> but it was only recently that I realized how much variability I am 
> obtaining over long runs. Perhaps this is specific to my cluster, 
> which is one reason for my post.

You can take large, seemingly random performance hits if someone else is 
sharing your network (yes, even InfiniBand) and hitting it hard enough. 
You can diagnose this, to some extent, after the fact by looking at the 
breakdown of GROMACS timings at the end of each log file and seeing where 
the runs spend their time. diff -y -W 160 on two such breakdowns can be 
useful here. Look at the MPMD flowchart in the manual (Figure 3.15) to 
get an idea of which parts of the calculation correspond to which lines.
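
For example (file names here are illustrative - with -noappend your 
segments will be named something like md1.partNNNN.log), you can cut the 
cycle accounting table out of a fast and a slow segment and compare them 
side by side:

  awk '/R E A L/,/Performance/' md1.part0005.log > fast.txt
  awk '/R E A L/,/Performance/' md1.part0012.log > slow.txt
  diff -y -W 160 fast.txt slow.txt

If the slow segments spend a much larger share of their wall time in the 
communication-heavy rows (PME mesh, the various Comm. lines), that points 
at the network rather than at your settings.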

Speak with your admins and learn what you can. You might be able to 
mitigate the effects by using fewer processors, so that your 
computation/communication ratio goes up: your throughput is lower, but 
your efficiency is higher. Or beg for some dedicated time on the network 
to see what quiet-conditions performance looks like.
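
If you want to see how that trade-off looks on your machine, a rough scan 
over core counts is usually enough (a sketch only - the core counts are 
placeholders and bench_${np}.tpr is a hypothetical benchmark tpr prepared 
for each size):

  for np in 32 48 64 ; do
    mpirun -np $np mdrun_mpi -deffnm bench_${np} -dlb yes -maxh 0.2
  done
  grep Perf bench_*.log | awk '{print $1, $4}'

Repeating the scan at different times of day also tells you something: a 
large spread at a fixed core count is itself evidence of contention 
rather than of anything in your input.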

> Here is the performance (in ns/day) that I obtain with a single run. 
> This is representative of what I see with 30 other runs (see list 
> below). First, the distribution is bimodal, with peaks at around 80 
> and 90 ns/day. Second, the values go as low as 70 ns/day (and I have 
> seen as low as 50 ns/day when I look through all 30 run directories 
> that differ only by the position of the umbrella restraint).
>
> I am using gromacs 4.5.3 and I run it like this:
> mpirun mdrun_mpi -deffnm md1 -dlb yes -npme 16 -cpt 30 -maxh 47 -cpi 
> md1.cpt -cpo md1.cpt -px coord.xvg -pf force.xvg -noappend
>
> I have obtained similar behaviour also with another system.
>
> I have attempted to correlate timing with the node on which the job 
> runs, the output of the "^Grid:" in the .log file, the DD division of 
> PME and real-space and the value of pme mesh/force, all to no avail. I 
> have found one correlation whereby the very slow runs also indicate a 
> high relative load for PME.

That is suggestive of network noise: PME requires global intra-simulation 
communication, so it is the first place that network contention tends to 
show up.
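
You can check how consistent that correlation is across your segments 
with something like the following (the exact wording of the load-balance 
line may differ a little between versions):

  grep "Average PME mesh/force load" md1.part*.log
  grep Perf md1.part*.log | awk '{print $1, $4}'

If the segments with poor ns/day are also the ones where the mesh/force 
load is well away from 1, that usually means the PME nodes were stuck 
waiting on communication rather than doing useful work.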

> I suspect that it is some value that is being determined at run start 
> time that is affecting my performance, but I am not sure what this 
> could be. Perhaps the fourier-spacing is leading to different initial 
> grids?

I don't think "optimize_fft = yes" does much any more, but you can verify 
what each run actually used by inspecting the parameter dump at the start 
of the log file.
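
For instance, to confirm that every segment started from the same PME 
grid and the same optimize_fft setting (I don't remember the exact key 
names in the 4.5 parameter dump, so the patterns below are a guess - 
adjust them to whatever your log actually prints):

  grep -m 1 -E "nkx|fourier_nx" md1.part*.log
  grep -m 1 optimize_fft md1.part*.log

If those are identical across all segments, you can rule out the grid and 
look elsewhere.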

Mark

> Thank you (timing information and my .mdp file follows),
> Chris
>
> ########### Output the ns/day obtained by the run segments
> $ grep Perf *log|awk '{print $4}'
>
> 78.286
> 82.345
> 81.573
> 83.418
> 92.423
> 90.863
> 85.833
> 91.131
> 91.820
> 71.246
> 76.844
> 91.805
> 92.037
> 85.702
> 92.251
> 89.706
> 88.590
> 89.381
> 90.446
> 81.142
> 76.365
> 76.968
> 76.037
> 79.286
> 79.895
> 79.047
> 78.273
> 79.406
> 78.018
> 78.645
> 78.172
> 80.255
> 81.032
> 81.047
> 77.414
> 78.414
> 80.167
> 79.278
> 80.892
> 82.796
> 81.300
> 77.392
> 71.350
> 73.134
> 76.519
> 75.879
> 80.684
> 81.076
> 87.821
> 90.064
> 88.409
> 80.803
> 88.435
>
> ########### My .mdp file
>
> constraints = all-bonds
> lincs-iter =  1
> lincs-order =  6
> constraint_algorithm =  lincs
> integrator = sd
> dt = 0.004
> tinit = 1000000
> init_step            =  0
> nsteps = 1000000000
> nstcomm = 1
> nstxout = 1000000000
> nstvout = 1000000000
> nstfout = 1000000000
> nstxtcout = 25000
> nstenergy = 25000
> nstlist = 5
> nstlog=0 ; reduce log file size
> ns_type = grid
> rlist = 1
> rcoulomb = 1
> rvdw = 1
> coulombtype = PME
> ewald-rtol = 1e-5
> optimize_fft = yes
> fourierspacing = 0.12
> fourier_nx = 0
> fourier_ny = 0
> fourier_nz = 0
> pme_order = 4
> tc_grps             =  System
> tau_t               =  1.0
> ld_seed             =  -1
> ref_t = 300
> gen_temp = 300
> gen_vel = yes
> unconstrained_start = no
> gen_seed = -1
> Pcoupl = berendsen
> pcoupltype = semiisotropic
> tau_p = 4 4
> compressibility = 4.5e-5 4.5e-5
> ref_p = 1.0 1.0
>
> ; COM PULLING
> pull                     = umbrella
> pull_geometry            = position
> pull_dim                 = N N Y
> pull_start               = no
> pull_nstxout             = 250
> pull_nstfout             = 250
> pull_ngroups             = 1
> pull_group0              = POPC
> pull_pbcatom0            = 338
> pull_group1              = KSC
> pull_pbcatom1            = 0
> pull_init1               = 0 0 0.0
> pull_rate1               = 0
> pull_k1                  = 3000.0
> pull_vec1                = 0 0 0
> ;;;EOF
>
>



