[gmx-users] why Blue Gene/Q is so slow?

Mark Abraham Mark.Abraham at anu.edu.au
Tue Jul 17 10:40:05 CEST 2012


On 17/07/2012 5:00 PM, DeChang Li wrote:
> Dear all,
>
>       I am running a 9000-atom system with GBSA (Gromacs 4.5.5) on a
> Blue Gene/Q cluster. I get 1.002 ns/day with 8 cores.
> However, on my own workstation with 8 cores the same system reaches
> nearly 10 ns/day (Intel(R) Xeon(R) CPU E5620  @ 2.40GHz). Can anyone
> tell me what's wrong with my simulation? Any suggestions will be
> appreciated.

Your workstation is running highly optimized SSE loops. 
On BlueGene/Q, GROMACS is not using the multiple FPUs because that code 
hasn't been written (for explicit or implicit solvation), and BlueGene's 
processors are probably slower too.

Mark
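
To put a number on how much of the run depends on those loops, here is a minimal sketch that sums the M-Flops rows of the quoted log. It assumes the four GB/nonbonded rows listed are the kernels that would have hand-optimized versions on x86; the function name kernel_share is only illustrative.

```python
# M-Flops values copied from the "M E G A - F L O P S" table in the quoted log.
# Assumption: these four rows are the GB/nonbonded kernels that benefit
# from hand-written SIMD loops.
gb_kernel_mflops = {
    "Generalized Born Coulomb": 2951.179,
    "GB Coulomb + LJ": 156494.347,
    "Born radii (HCT/OBC)": 524884.669,
    "Born force chain rule": 43023.334,
}
total_mflops = 808731.820  # "Total" row of the same table


def kernel_share(rows, total):
    """Return the percentage of all flops spent in the given rows."""
    return 100.0 * sum(rows.values()) / total


share = kernel_share(gb_kernel_mflops, total_mflops)
print(f"GB/nonbonded kernels: {share:.1f}% of all flops")
```

Roughly nine tenths of the arithmetic sits in these kernels, so running them as plain loops on one platform and as tuned SIMD loops on the other would dominate the comparison.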
>
> Following is my md.mdp file:
>
> constraints            = hbonds
> constraint_algorithm   = LINCS
> lincs_order            = 4
> comm_mode              = Angular
> comm_grps              = system
> integrator             = sd
> ;annealing           = single single
> ;annealing_npoints   = 2 2
> ;annealing_time      = 0 500 0 500
> ;annealing_temp      = 200 300 200 300
> dt                     = 0.002 ; ps !
> nsteps                 = 5000000 ; total 10000 ps.
> nstcomm                = 10
> nstcalcenergy           = 10
> nstxout                = 10000 ; collect data every 20 ps
> nstenergy              = 10000
> nstvout                = 10000
> nstlog                 = 1000
> ;nstxtcout              = 50000
> ;xtc_grps               = system
> nstfout                = 0
> nstlist                = 10
> ns_type                = grid
> pbc                    = no
> rlist                  = 1.2
> coulombtype            = cut-off
> rcoulomb               = 1.2
> rvdw                   = 1.2
> fourierspacing         = 0.12
> fourier_nx             = 0
> fourier_ny             = 0
> fourier_nz             = 0
> pme_order              = 4
> ewald_rtol             = 1e-5
> optimize_fft           = yes
> ;energygrps             = alpha1 alpha2 alpha3 beta1 beta2 beta3 gamma
> ;DispCorr               = EnerPres
> ; Temperature is controlled by the sd integrator (one group); Tcoupl is unset
> Tcoupl                 =
> tau_t                  = 0.5
> tc-grps                = system
> ref_t                  = 300
> ; Pressure coupling is off
> Pcoupl                 = no ;berendsen
> tau_p                  = 1.0
> compressibility        = 4.5e-5
> ref_p                  = 1.0
> ; Generate velocities is on at 300 K.
> gen_vel                = yes
> gen_temp               = 300
> gen_seed               = -1
>
> implicit_solvent       = GBSA
> gb_algorithm           = OBC
> rgbradii               = 1.2
> sa_surface_tension     = 2.25936
>
>
>
> Here is the performance info:
>
>          M E G A - F L O P S   A C C O U N T I N G
>
>     RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
>     T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
>     NF=No Forces
>
>   Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
>   Generalized Born Coulomb                61.482892        2951.179     0.4
>   GB Coulomb + LJ                       2565.481100      156494.347    19.4
>   Outer nonbonded loop                   152.268546        1522.685     0.2
>   1,4 nonbonded interactions             116.143224       10452.890     1.3
>   Born radii (HCT/OBC)                  2868.222234      524884.669    64.9
>   Born force chain rule                 2868.222234       43023.334     5.3
>   NS-Pairs                               516.814696       10853.109     1.3
>   Reset In Box                             4.464788          13.394     0.0
>   CG-CoM                                   4.482576          13.448     0.0
>   Bonds                                   22.174434        1308.292     0.2
>   Angles                                  80.586114       13538.467     1.7
>   Propers                                160.742142       36809.951     4.6
>   Virial                                   4.636254          83.453     0.0
>   Update                                  44.478894        1378.846     0.2
>   Stop-CM                                  4.455894          44.559     0.0
>   Calc-Ekin                               44.487788        1201.170     0.1
>   Lincs                                   44.951630        2697.098     0.3
>   Lincs-Mat                              261.822552        1047.290     0.1
>   Constraint-V                            44.951630         359.613     0.0
>   Constraint-Vir                           2.251163          54.028     0.0
> -----------------------------------------------------------------------------
>   Total                                                  808731.820   100.0
> -----------------------------------------------------------------------------
>
>
>      D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>
>   av. #atoms communicated per step for force:  2 x 660.5
>   av. #atoms communicated per step for LINCS:  2 x 34.3
>
>   Average load imbalance: 1.7 %
>   Part of the total run time spent waiting due to load imbalance: 1.4 %
>
>
>       R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>   Computing:         Nodes     Number     G-Cycles    Seconds     %
> -----------------------------------------------------------------------
>   Domain decomp.         8        502       59.421       37.1     0.5
>   DD comm. load          8          8        0.004        0.0     0.0
>   Comm. coord.           8       5001       16.575       10.4     0.2
>   Neighbor search        8        502      136.093       85.1     1.2
>   Force                  8       5001     9744.582     6090.7    88.3
>   Wait + Comm. F         8       5001       90.905       56.8     0.8
>   Write traj.            8          2        0.954        0.6     0.0
>   Update                 8       5001       72.936       45.6     0.7
>   Constraints            8      10002      171.445      107.2     1.6
>   Comm. energies         8        502       10.427        6.5     0.1
>   Rest                   8                 732.742      458.0     6.6
> -----------------------------------------------------------------------
>   Total                  8               11036.086     6897.9   100.0
> -----------------------------------------------------------------------
>
>          Parallel run - timing based on wallclock.
>
>                 NODE (s)   Real (s)      (%)
>         Time:    862.243    862.243    100.0
>                         14:22
>                 (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:      3.047    937.940      1.002     23.946
> Finished mdrun on node 0 Tue Jul 17 16:06:48 2012
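
The performance line in the log above can be checked by hand. The sketch below reproduces it under one assumption: that the 5001 force evaluations in the cycle accounting are the step count mdrun uses (this run is much shorter than the nsteps value in the posted md.mdp). The function name ns_per_day is only illustrative.

```python
dt_ps = 0.002     # time step from the posted md.mdp
steps = 5001      # assumption: the "Force" count in the cycle accounting
wall_s = 862.243  # "Real (s)" from the timing summary


def ns_per_day(steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wallclock time."""
    simulated_ns = steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * 86400.0 / wall_s


perf = ns_per_day(steps, dt_ps, wall_s)
hours_per_ns = 24.0 / perf
print(f"{perf:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

Both values agree with the reported 1.002 ns/day and 23.946 hour/ns, so the log is internally consistent and the gap to the workstation's nearly 10 ns/day is about a factor of ten.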




