[gmx-users] parallel scaling
David van der Spoel
spoel at xray.bmc.uu.se
Thu Aug 7 17:54:01 CEST 2003
On Thu, 2003-08-07 at 17:41, Yuguang Mu wrote:
> I tried GROMACS on 4 CPUs (2 Pentium Linux nodes with a Myrinet link);
> unfortunately, it runs at nearly the same speed as on 2 CPUs (1 node).
> I checked the output; here is what the log file says:
>
>
>               (Mnbf/s)  (GFlops)  (ps/NODE hour)  (NODE hour/ns)
> Performance:    26.848     1.938          30.782          32.486
> Detailed load balancing info in percentage of average
>
> Type                 NODE:    0    1    2    3  Scaling
> --------------------------------------------------------
> LJ:                         396    0    0    3      25%
> LJ(S):                        0    0  386   13      25%
> LJ + Coulomb:               400    0    0    0      25%
> LJ + Coulomb(T):            386    0    0   13      25%
> LJ + Coulomb(T)(S):          94   97  110   97      90%
> Innerloop-Iatom:             88   82   94  134      74%
> Spread Q Bspline:            99   99  100   99      99%
> Gather F Bspline:            99   99  100   99      99%
> 3D-FFT:                     100  100  100  100     100%
> Solve PME:                  100  100  100  100     100%
> NS-Pairs:                    98   95  108   97      92%
> Reset In Box:                99   99  100   99      99%
> Shift-X:                    100  100  100   99      99%
> CG-CoM:                      95  101  101  101      98%
> Sum Forces:                 100  100  100   99      99%
> Bonds:                      400    0    0    0      25%
> Angles:                     400    0    0    0      25%
> Propers:                    400    0    0    0      25%
> RB-Dihedrals:               400    0    0    0      25%
> Dist. Restr.:               400    0    0    0      25%
> Virial:                      99   99  100   99      99%
> Update:                      99   99  100   99      99%
> Stop-CM:                     99   99  100   99      99%
> P-Coupling:                  99   99  100   99      99%
> Calc-Ekin:                   99   99  100   99      99%
> Lincs:                      400    0    0    0      25%
> Lincs-Mat:                  400    0    0    0      25%
> Shake-V:                     99   99  100   99      99%
> Shake-Vir:                   99   99  100   99      99%
> Settle:                      91  102  102  102      97%
> Dummy2:                     400    0    0    0      25%
>
> Total Force:                103   94  106   94      93%
>
> Total Shake:                 95  101  101  101      98%
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Compared with 2 CPUs (1 node):
>
>
>               (Mnbf/s)  (GFlops)  (ps/NODE hour)  (NODE hour/ns)
> Performance:    23.269     1.638          26.726          37.417
>
> Detailed load balancing info in percentage of average
> Type                 NODE:    0    1  Scaling
> ----------------------------------------------
> LJ:                         198    1      50%
> LJ(S):                        0  200      50%
> LJ + Coulomb:               200    0      50%
> LJ + Coulomb(T):            193    6      51%
> LJ + Coulomb(T)(S):          95  104      95%
> Innerloop-Iatom:             85  114      87%
> Spread Q Bspline:           100   99      99%
> Gather F Bspline:           100   99      99%
> 3D-FFT:                     100  100     100%
> Solve PME:                  100  100     100%
> NS-Pairs:                    97  102      97%
> Reset In Box:               100   99      99%
> Shift-X:                    100   99      99%
> CG-CoM:                      98  101      98%
> Sum Forces:                 100   99      99%
> Bonds:                      200    0      50%
> Angles:                     200    0      50%
> Propers:                    200    0      50%
> RB-Dihedrals:               200    0      50%
> Dist. Restr.:               200    0      50%
> Virial:                     100   99      99%
> Update:                     100   99      99%
> Stop-CM:                    100   99      99%
> P-Coupling:                 100   99      99%
> Calc-Ekin:                  100   99      99%
> Lincs:                      200    0      50%
> Lincs-Mat:                  200    0      50%
> Shake-V:                    100   99      99%
> Shake-Vir:                  100   99      99%
> Settle:                      97  102      97%
> Dummy2:                     200    0      50%
>
> Total Force:                 99  100      99%
>
> Total Shake:                 98  101      98%
>
>
> Total Scaling: 99% of max performance
>
> $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>
> Here I found that the LJ calculations, which consume a lot of CPU
> time, are not parallelized at all. Maybe that is why the scaling
> hardly improves from 2 CPUs to 4 CPUs.
> Do you agree with me?
> How can I improve this?
> I use GROMACS 3.1.4.
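A note on reading these tables: the Scaling column is the average node
load divided by the maximum, with the loads normalized so the per-node
average is 100. Two worked rows from the 4-CPU run:

  Bonds:               400    0    0    0   ->  100 / 400   =  25%
  LJ + Coulomb(T)(S):   94   97  110   97   ->  99.5 / 110  ~  90%

So a 25% row means all of that work runs on a single node.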
PME scaling is poor, but that also depends on your system size.
Most of the work is in the LJ(S) solvent loops though, so don't worry
about the other terms.
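If you do want to spread the per-molecule terms (Bonds, Angles, LINCS,
and the solute LJ loops) over the nodes, GROMACS 3.x can shuffle and
sort molecules at preprocessing time. A rough sketch, assuming default
file names and that your MPI-enabled mdrun is launched with mpirun:

  # redistribute molecules over 4 nodes when generating the run input
  grompp -np 4 -shuffle -sort -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
  # run on 4 processes; 3.x mdrun takes the node count as well
  mpirun -np 4 mdrun -np 4 -s topol.tpr

Note that -shuffle renumbers the atoms in the output files. If PME
itself is the limit, shifting work from the mesh to the direct-space
loops (a larger fourierspacing together with a correspondingly larger
rcoulomb in the .mdp) usually helps parallel runs, since the 3D-FFT
communication scales worst.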
--
Groeten, David.
________________________________________________________________________
Dr. David van der Spoel, Dept. of Cell & Mol. Biology
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205 fax: 46 18 511 755
spoel at xray.bmc.uu.se spoel at gromacs.org http://xray.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++