[gmx-users] why Blue Gene/Q is so slow?
Mark Abraham
Mark.Abraham at anu.edu.au
Tue Jul 17 10:40:05 CEST 2012
On 17/07/2012 5:00 PM, DeChang Li wrote:
> Dear all,
>
> I am running a 9000-atom system with GBSA (Gromacs 4.5.5) on a
> Blue Gene/Q cluster. I get 1.002 ns/day with 8 cores.
> However, on my own workstation with 8 cores the same system reaches
> nearly 10 ns/day (Intel(R) Xeon(R) CPU E5620 @ 2.40GHz). Can anyone
> tell me what's wrong with my simulation? Any suggestions would be
> appreciated.
Your workstation is running highly optimized SSE loops.
BlueGene/Q is not using its multiple FPUs because that code hasn't been
written (for explicit or implicit solvation), and BlueGene's processors
are probably slower too.
Mark
>
> Following is my md.mdp file:
>
> constraints = hbonds
> constraint_algorithm = LINCS
> lincs_order = 4
> comm_mode = Angular
> comm_grps = system
> integrator = sd
> ;annealing = single single
> ;annealing_npoints = 2 2
> ;annealing_time = 0 500 0 500
> ;annealing_temp = 200 300 200 300
> dt = 0.002 ; ps !
> nsteps = 5000000 ; total 5000 ps.
> nstcomm = 10
> nstcalcenergy = 10
> nstxout = 10000 ; collect data every 1 ps
> nstenergy = 10000
> nstvout = 10000
> nstlog = 1000
> ;nstxtcout = 50000
> ;xtc_grps = system
> nstfout = 0
> nstlist = 10
> ns_type = grid
> pbc = no
> rlist = 1.2
> coulombtype = cut-off
> rcoulomb = 1.2
> rvdw = 1.2
> fourierspacing = 0.12
> fourier_nx = 0
> fourier_ny = 0
> fourier_nz = 0
> pme_order = 4
> ewald_rtol = 1e-5
> optimize_fft = yes
> ;energygrps = alpha1 alpha2 alpha3 beta1 beta2 beta3 gamma
> ;DispCorr = EnerPres
> ; Berendsen temperature coupling is on in two groups
> Tcoupl =
> tau_t = 0.5
> tc-grps = system
> ref_t = 300
> ; Pressure coupling is on
> Pcoupl = no ;berendsen
> tau_p = 1.0
> compressibility = 4.5e-5
> ref_p = 1.0
> ; Generate velocities is on at 300 K.
> gen_vel = yes
> gen_temp = 300
> gen_seed = -1
>
> implicit_solvent = GBSA
> gb_algorithm = OBC
> rgbradii = 1.2
> sa_surface_tension = 2.25936
>
>
>
> Here is the performance info:
>
> M E G A - F L O P S A C C O U N T I N G
>
> RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
> T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
> NF=No Forces
>
> Computing:                          M-Number       M-Flops  % Flops
> -----------------------------------------------------------------------------
> Generalized Born Coulomb           61.482892      2951.179      0.4
> GB Coulomb + LJ                  2565.481100    156494.347     19.4
> Outer nonbonded loop              152.268546      1522.685      0.2
> 1,4 nonbonded interactions        116.143224     10452.890      1.3
> Born radii (HCT/OBC)             2868.222234    524884.669     64.9
> Born force chain rule            2868.222234     43023.334      5.3
> NS-Pairs                          516.814696     10853.109      1.3
> Reset In Box                        4.464788        13.394      0.0
> CG-CoM                              4.482576        13.448      0.0
> Bonds                              22.174434      1308.292      0.2
> Angles                             80.586114     13538.467      1.7
> Propers                           160.742142     36809.951      4.6
> Virial                              4.636254        83.453      0.0
> Update                             44.478894      1378.846      0.2
> Stop-CM                             4.455894        44.559      0.0
> Calc-Ekin                          44.487788      1201.170      0.1
> Lincs                              44.951630      2697.098      0.3
> Lincs-Mat                         261.822552      1047.290      0.1
> Constraint-V                       44.951630       359.613      0.0
> Constraint-Vir                      2.251163        54.028      0.0
> -----------------------------------------------------------------------------
> Total                                           808731.820    100.0
> -----------------------------------------------------------------------------
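As a rough check on where the time goes, the flop accounting above can be summed for the Generalized Born and nonbonded kernels. The sketch below is only an illustration: the grouping of those four rows as "the kernels that would benefit from hand-tuned SIMD code" is an assumption made here, not something stated in the log, and the variable names are hypothetical.

```python
# M-Flops per kernel, copied from the accounting table above.
# Assumption: these four rows are the GB/nonbonded inner loops that
# optimized kernels would accelerate; the log itself does not say so.
gb_kernel_mflops = {
    "Generalized Born Coulomb": 2951.179,
    "GB Coulomb + LJ": 156494.347,
    "Born radii (HCT/OBC)": 524884.669,
    "Born force chain rule": 43023.334,
}
total_mflops = 808731.820  # "Total" row of the same table

# Share of all counted flops spent in those kernels, in percent.
gb_kernel_total = sum(gb_kernel_mflops.values())
gb_kernel_share = 100.0 * gb_kernel_total / total_mflops

print(f"GB/nonbonded kernels: {gb_kernel_share:.1f} % of all flops")
```

Under that assumption the share comes out at roughly 90 %, which is in line with the "Force" row taking 88.3 % of the wall time in the cycle accounting further down. With so much of the run inside these loops, unoptimized kernels would slow the whole simulation by nearly the same factor.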
>
>
> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>
> av. #atoms communicated per step for force: 2 x 660.5
> av. #atoms communicated per step for LINCS: 2 x 34.3
>
> Average load imbalance: 1.7 %
> Part of the total run time spent waiting due to load imbalance: 1.4 %
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> Computing:         Nodes     Number     G-Cycles    Seconds       %
> -----------------------------------------------------------------------
> Domain decomp.         8        502       59.421       37.1     0.5
> DD comm. load          8          8        0.004        0.0     0.0
> Comm. coord.           8       5001       16.575       10.4     0.2
> Neighbor search        8        502      136.093       85.1     1.2
> Force                  8       5001     9744.582     6090.7    88.3
> Wait + Comm. F         8       5001       90.905       56.8     0.8
> Write traj.            8          2        0.954        0.6     0.0
> Update                 8       5001       72.936       45.6     0.7
> Constraints            8      10002      171.445      107.2     1.6
> Comm. energies         8        502       10.427        6.5     0.1
> Rest                   8                 732.742      458.0     6.6
> -----------------------------------------------------------------------
> Total                  8               11036.086     6897.9   100.0
> -----------------------------------------------------------------------
>
> Parallel run - timing based on wallclock.
>
>                NODE (s)   Real (s)      (%)
>        Time:    862.243    862.243    100.0
>                        14:22
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:      3.047    937.940      1.002     23.946
> Finished mdrun on node 0 Tue Jul 17 16:06:48 2012
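The performance figures in the log can be reproduced from numbers already quoted above, which helps when comparing runs by hand. This is a minimal sketch with hypothetical variable names. It assumes the step count is the 5001 shown in the "Force" row and the time step is the `dt = 0.002` ps from the mdp file, and it is not taken from the GROMACS source.

```python
# Inputs copied from the quoted log and mdp file.
steps = 5001               # "Number" column of the "Force" row (assumed step count)
dt_ps = 0.002              # time step in ps, from the mdp file
wall_s = 862.243           # "Real (s)" wall-clock time
total_mflops = 808731.820  # "Total" row of the flop accounting

# Simulated time in ns, then throughput per 24 h of wall-clock time.
simulated_ns = steps * dt_ps / 1000.0
ns_per_day = simulated_ns * 86400.0 / wall_s
hours_per_ns = 24.0 / ns_per_day

# Sustained floating-point rate over the whole run.
mflops_rate = total_mflops / wall_s

print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns, {mflops_rate:.3f} MFlops")
```

The same arithmetic applied to the workstation run would give a like-for-like figure to set against the "nearly 10 ns/day" quoted in the question.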