[gmx-users] parallel scaling
Yuguang Mu
ygmu at theochem.uni-frankfurt.de
Thu Aug 7 17:42:01 CEST 2003
I tried GROMACS on 4 CPUs (2 Pentium Linux nodes connected by a Myrinet link).
Unfortunately, it runs at nearly the same speed as on 2 CPUs (1 node).
I checked the output; here is the relevant part of the log file:
(Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
Performance: 26.848 1.938 30.782 32.486
Detailed load balancing info in percentage of average
Type NODE: 0 1 2 3 Scaling
---------------------------------------
LJ:396 0 0 3 25%
LJ(S): 0 0 386 13 25%
LJ + Coulomb:400 0 0 0 25%
LJ + Coulomb(T):386 0 0 13 25%
LJ + Coulomb(T)(S): 94 97 110 97 90%
Innerloop-Iatom: 88 82 94 134 74%
Spread Q Bspline: 99 99 100 99 99%
Gather F Bspline: 99 99 100 99 99%
3D-FFT:100 100 100 100 100%
Solve PME:100 100 100 100 100%
NS-Pairs: 98 95 108 97 92%
Reset In Box: 99 99 100 99 99%
Shift-X:100 100 100 99 99%
CG-CoM: 95 101 101 101 98%
Sum Forces:100 100 100 99 99%
Bonds:400 0 0 0 25%
Angles:400 0 0 0 25%
Propers:400 0 0 0 25%
RB-Dihedrals:400 0 0 0 25%
Dist. Restr.:400 0 0 0 25%
Virial: 99 99 100 99 99%
Update: 99 99 100 99 99%
Stop-CM: 99 99 100 99 99%
P-Coupling: 99 99 100 99 99%
Calc-Ekin: 99 99 100 99 99%
Lincs:400 0 0 0 25%
Lincs-Mat:400 0 0 0 25%
Shake-V: 99 99 100 99 99%
Shake-Vir: 99 99 100 99 99%
Settle: 91 102 102 102 97%
Dummy2:400 0 0 0 25%
Total Force:103 94 106 94 93%
Total Shake: 95 101 101 101 98%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For comparison, the same output from the run on 2 CPUs (1 node):
(Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
Performance: 23.269 1.638 26.726 37.417
Detailed load balancing info in percentage of average
Type NODE: 0 1 Scaling
-------------------------------
LJ:198 1 50%
LJ(S): 0 200 50%
LJ + Coulomb:200 0 50%
LJ + Coulomb(T):193 6 51%
LJ + Coulomb(T)(S): 95 104 95%
Innerloop-Iatom: 85 114 87%
Spread Q Bspline:100 99 99%
Gather F Bspline:100 99 99%
3D-FFT:100 100 100%
Solve PME:100 100 100%
NS-Pairs: 97 102 97%
Reset In Box:100 99 99%
Shift-X:100 99 99%
CG-CoM: 98 101 98%
Sum Forces:100 99 99%
Bonds:200 0 50%
Angles:200 0 50%
Propers:200 0 50%
RB-Dihedrals:200 0 50%
Dist. Restr.:200 0 50%
Virial:100 99 99%
Update:100 99 99%
Stop-CM:100 99 99%
P-Coupling:100 99 99%
Calc-Ekin:100 99 99%
Lincs:200 0 50%
Lincs-Mat:200 0 50%
Shake-V:100 99 99%
Shake-Vir:100 99 99%
Settle: 97 102 97%
Dummy2:200 0 50%
Total Force: 99 100 99%
Total Shake: 98 101 98%
Total Scaling: 99% of max performance
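
For reading the two tables above: the "Scaling" column appears to be the average load divided by the largest per-node load, truncated to a whole percent. This is inferred from the printed numbers only, not taken from the GROMACS source, and the function name below is made up for illustration. A minimal Python sketch that reproduces several rows:

```python
def scaling_percent(loads):
    """Average load over maximum load, as a truncated integer percentage.

    Assumption: this mirrors how the "Scaling" column is derived. It is
    inferred from the log numbers, not from the GROMACS source code.
    """
    # Integer arithmetic avoids floating-point rounding near whole percents.
    return (100 * sum(loads)) // (len(loads) * max(loads))


# Rows copied from the 4-CPU table
print("LJ              ", scaling_percent([396, 0, 0, 3]))
print("Innerloop-Iatom ", scaling_percent([88, 82, 94, 134]))
print("Total Force     ", scaling_percent([103, 94, 106, 94]))

# Rows copied from the 2-CPU table
print("LJ + Coulomb(T) ", scaling_percent([193, 6]))
print("Innerloop-Iatom ", scaling_percent([85, 114]))
```

Under this reading, a row such as "Bonds: 400 0 0 0 25%" means that node 0 does all of that work, so 25% is the lowest value a row can have on 4 nodes.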
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
From this I conclude that the LJ parts of the calculation, which consume a
lot of CPU time, are not parallelized at all. Maybe this is the reason why
the performance does not increase much from 2 CPUs to 4 CPUs.
Do you agree with me?
How can I improve this?
I use GROMACS 3.1.4.
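
To put a number on "nearly the same speed", here is a small Python sketch using the two "Performance:" lines from the logs. It assumes that the ps/NODE hour figure is the throughput of the whole run per wall-clock hour; the variable names are only for illustration.

```python
# Throughput in ps per hour, copied from the "Performance:" lines.
# Assumption: these values describe the whole run, not a single CPU.
ps_per_hour_2cpu = 26.726
ps_per_hour_4cpu = 30.782

# Speedup obtained by doubling the CPU count (ideal value: 2.0)
speedup = ps_per_hour_4cpu / ps_per_hour_2cpu

# Fraction of the ideal doubling that was actually achieved
efficiency = speedup / (4 / 2)

# Consistency check: the hour/ns column is 1000 / (ps per hour)
hours_per_ns_2cpu = 1000.0 / ps_per_hour_2cpu
hours_per_ns_4cpu = 1000.0 / ps_per_hour_4cpu

print(f"speedup 2 -> 4 CPUs: {speedup:.2f}")
print(f"efficiency relative to ideal: {efficiency:.0%}")
print(f"hours per ns: {hours_per_ns_2cpu:.3f} (2 CPUs), {hours_per_ns_4cpu:.3f} (4 CPUs)")
```

So going from 2 to 4 CPUs gives a speedup of about 1.15 instead of 2, which is roughly 58% of the ideal.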
Yuguang