[gmx-users] Re: Re: why Blue Gene/Q is so slow? (Mark Abraham)
Mark Abraham
Mark.Abraham at anu.edu.au
Tue Jul 17 16:58:20 CEST 2012
On 17/07/2012 7:06 PM, DeChang Li wrote:
>>
>> From: Mark Abraham <Mark.Abraham at anu.edu.au>
>> Subject: Re: [gmx-users] why Blue Gene/Q is so slow?
>>
>> On 17/07/2012 5:00 PM, DeChang Li wrote:
>>> Dear all,
>>>
>>> I am running a 9000-atom system with GBSA (Gromacs 4.5.5) on a
>>> Blue Gene/Q cluster and get a speed of 1.002 ns/day with 8 cores.
>>> However, on my own workstation with 8 cores (Intel(R) Xeon(R) CPU
>>> E5620 @ 2.40GHz) the same system reaches nearly 10 ns/day. Can anyone
>>> tell me what's wrong with my simulation? Any suggestions would be
>>> appreciated.
>> Your workstation is running highly optimized SSE non-bonded loops.
>> BlueGene/Q is not using its vector FPUs because that code hasn't been
>> written (for either explicit or implicit solvation), and BlueGene/Q's
>> individual cores are probably slower, too.
>>
>> Mark
> Does that mean the code itself runs at only about 10% of the speed on
> BlueGene/Q compared with an Intel workstation?
You'd see a comparable decrease if you turned off the SSE optimization
on your workstation, though perhaps not quite as severe. There's art and
skill in making code run fast, and it's very rare that you don't need to
target a specific architecture to achieve it.
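If you want to gauge how much of the workstation's speed comes from those
kernels, a rough test (a sketch only - it assumes your 4.5.x build honours
the GMX_NOOPTIMIZEDKERNELS environment variable, and the output name is
just illustrative) is to force the generic C kernels and rerun a short
segment of the same system:

    # fall back to the plain C non-bonded kernels instead of the SSE ones
    export GMX_NOOPTIMIZEDKERNELS=1
    # rerun a short piece of the run and compare ns/day in the log
    mdrun -deffnm md_nosse -v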
> Is there any way to improve
> the speed on BG/Q?
Write the optimized code ;-) Also, use more of the machine - you can
probably get down to 500 atoms/core or below, though there will be a
limit beyond which adding cores is impossible or no longer effective.
You can also try simulating without cut-offs (see section 7.3 of the
manual and past mailing-list discussions), which uses the special
all-vs-all inner loops, but your system might be too large for that to
be useful; a sketch of the relevant settings follows below.
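A minimal sketch of those settings, assuming a GROMACS 4.5.x implicit-solvent
run (the all-vs-all kernels are selected when all the cut-offs are set to
zero - check section 7.3 of the manual; the file names, rank count and
launcher below are illustrative, and launchers are site-specific on BG/Q):

    ; cut-off-free GB: no neighbour list, all cut-offs zero
    nstlist   = 0
    rlist     = 0
    rcoulomb  = 0
    rvdw      = 0
    rgbradii  = 0
    pbc       = no

    # use more of the machine, e.g. ~500 atoms/core for this 9000-atom system
    grompp -f md_allvsall.mdp -c conf.gro -p topol.top -o md_allvsall.tpr
    mpirun -np 16 mdrun_mpi -deffnm md_allvsall -v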
Mark
>
>
> Dechang
>
>
>
>
>>> Following is my md.mdp file:
>>>
>>> constraints = hbonds
>>> constraint_algorithm = LINCS
>>> lincs_order = 4
>>> comm_mode = Angular
>>> comm_grps = system
>>> integrator = sd
>>> ;annealing = single single
>>> ;annealing_npoints = 2 2
>>> ;annealing_time = 0 500 0 500
>>> ;annealing_temp = 200 300 200 300
>>> dt = 0.002 ; ps !
>>> nsteps = 5000000 ; total 5000 ps.
>>> nstcomm = 10
>>> nstcalcenergy = 10
>>> nstxout = 10000 ; collect data every 1 ps
>>> nstenergy = 10000
>>> nstvout = 10000
>>> nstlog = 1000
>>> ;nstxtcout = 50000
>>> ;xtc_grps = system
>>> nstfout = 0
>>> nstlist = 10
>>> ns_type = grid
>>> pbc = no
>>> rlist = 1.2
>>> coulombtype = cut-off
>>> rcoulomb = 1.2
>>> rvdw = 1.2
>>> fourierspacing = 0.12
>>> fourier_nx = 0
>>> fourier_ny = 0
>>> fourier_nz = 0
>>> pme_order = 4
>>> ewald_rtol = 1e-5
>>> optimize_fft = yes
>>> ;energygrps = alpha1 alpha2 alpha3 beta1 beta2 beta3 gamma
>>> ;DispCorr = EnerPres
>>> ; Berendsen temperature coupling is on in two groups
>>> Tcoupl =
>>> tau_t = 0.5
>>> tc-grps = system
>>> ref_t = 300
>>> ; Pressure coupling is on
>>> Pcoupl = no ;berendsen
>>> tau_p = 1.0
>>> compressibility = 4.5e-5
>>> ref_p = 1.0
>>> ; Generate velocities is on at 300 K.
>>> gen_vel = yes
>>> gen_temp = 300
>>> gen_seed = -1
>>>
>>> implicit_solvent = GBSA
>>> gb_algorithm = OBC
>>> rgbradii = 1.2
>>> sa_surface_tension = 2.25936
>>>
>>>
>>>
>>> Here is the performance info:
>>>
>>> M E G A - F L O P S A C C O U N T I N G
>>>
>>> RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
>>> T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
>>> NF=No Forces
>>>
>>> Computing: M-Number M-Flops % Flops
>>> -----------------------------------------------------------------------------
>>> Generalized Born Coulomb 61.482892 2951.179 0.4
>>> GB Coulomb + LJ 2565.481100 156494.347 19.4
>>> Outer nonbonded loop 152.268546 1522.685 0.2
>>> 1,4 nonbonded interactions 116.143224 10452.890 1.3
>>> Born radii (HCT/OBC) 2868.222234 524884.669 64.9
>>> Born force chain rule 2868.222234 43023.334 5.3
>>> NS-Pairs 516.814696 10853.109 1.3
>>> Reset In Box 4.464788 13.394 0.0
>>> CG-CoM 4.482576 13.448 0.0
>>> Bonds 22.174434 1308.292 0.2
>>> Angles 80.586114 13538.467 1.7
>>> Propers 160.742142 36809.951 4.6
>>> Virial 4.636254 83.453 0.0
>>> Update 44.478894 1378.846 0.2
>>> Stop-CM 4.455894 44.559 0.0
>>> Calc-Ekin 44.487788 1201.170 0.1
>>> Lincs 44.951630 2697.098 0.3
>>> Lincs-Mat 261.822552 1047.290 0.1
>>> Constraint-V 44.951630 359.613 0.0
>>> Constraint-Vir 2.251163 54.028 0.0
>>> -----------------------------------------------------------------------------
>>> Total 808731.820 100.0
>>> -----------------------------------------------------------------------------
>>>
>>>
>>> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>>>
>>> av. #atoms communicated per step for force: 2 x 660.5
>>> av. #atoms communicated per step for LINCS: 2 x 34.3
>>>
>>> Average load imbalance: 1.7 %
>>> Part of the total run time spent waiting due to load imbalance: 1.4 %
>>>
>>>
>>> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>>>
>>> Computing: Nodes Number G-Cycles Seconds %
>>> -----------------------------------------------------------------------
>>> Domain decomp. 8 502 59.421 37.1 0.5
>>> DD comm. load 8 8 0.004 0.0 0.0
>>> Comm. coord. 8 5001 16.575 10.4 0.2
>>> Neighbor search 8 502 136.093 85.1 1.2
>>> Force 8 5001 9744.582 6090.7 88.3
>>> Wait + Comm. F 8 5001 90.905 56.8 0.8
>>> Write traj. 8 2 0.954 0.6 0.0
>>> Update 8 5001 72.936 45.6 0.7
>>> Constraints 8 10002 171.445 107.2 1.6
>>> Comm. energies 8 502 10.427 6.5 0.1
>>> Rest 8 732.742 458.0 6.6
>>> -----------------------------------------------------------------------
>>> Total 8 11036.086 6897.9 100.0
>>> -----------------------------------------------------------------------
>>>
>>> Parallel run - timing based on wallclock.
>>>
>>> NODE (s) Real (s) (%)
>>> Time: 862.243 862.243 100.0
>>> 14:22
>>> (Mnbf/s) (MFlops) (ns/day) (hour/ns)
>>> Performance: 3.047 937.940 1.002 23.946
>>> Finished mdrun on node 0 Tue Jul 17 16:06:48 2012
>>