[gmx-users] parallel scaling

Yuguang Mu ygmu at theochem.uni-frankfurt.de
Thu Aug 7 17:42:01 CEST 2003


I tried GROMACS on 4 CPUs (2 Pentium Linux nodes with a Myrinet link);
unfortunately, it runs at nearly the same speed as on 2 CPUs (1 node).
I checked the output; here is the relevant part of the log file:


               (Mnbf/s)   (GFlops) (ps/NODE hour) (NODE hour/ns)
Performance:     26.848      1.938     30.782     32.486
Detailed load balancing info in percentage of average

Type        NODE:  0   1   2   3 Scaling
---------------------------------------
             LJ:396   0   0   3     25%
          LJ(S):  0   0 386  13     25%
   LJ + Coulomb:400   0   0   0     25%
LJ + Coulomb(T):386   0   0  13     25%
LJ + Coulomb(T)(S): 94  97 110  97     90%
Innerloop-Iatom: 88  82  94 134     74%
Spread Q Bspline: 99  99 100  99     99%
Gather F Bspline: 99  99 100  99     99%
         3D-FFT:100 100 100 100    100%
      Solve PME:100 100 100 100    100%
       NS-Pairs: 98  95 108  97     92%
   Reset In Box: 99  99 100  99     99%
        Shift-X:100 100 100  99     99%
         CG-CoM: 95 101 101 101     98%
     Sum Forces:100 100 100  99     99%
          Bonds:400   0   0   0     25%
         Angles:400   0   0   0     25%
        Propers:400   0   0   0     25%
   RB-Dihedrals:400   0   0   0     25%
   Dist. Restr.:400   0   0   0     25%
         Virial: 99  99 100  99     99%
         Update: 99  99 100  99     99%
        Stop-CM: 99  99 100  99     99%
     P-Coupling: 99  99 100  99     99%
      Calc-Ekin: 99  99 100  99     99%
          Lincs:400   0   0   0     25%
      Lincs-Mat:400   0   0   0     25%
        Shake-V: 99  99 100  99     99%
      Shake-Vir: 99  99 100  99     99%
         Settle: 91 102 102 102     97%
         Dummy2:400   0   0   0     25%

    Total Force:103  94 106  94     93%


    Total Shake: 95 101 101 101     98%
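As a sanity check on how to read this table (my reading, not taken from the GROMACS source): the per-row "Scaling" figure appears to be the average load divided by the maximum load across nodes, so a kernel that runs entirely on one node of four shows 25%. A small sketch reproducing two rows from the table above:

```python
def scaling(loads):
    """Scaling % = average per-node load / maximum per-node load.

    The slowest (most loaded) node sets the pace, so a perfectly
    balanced kernel gives 100% and a single-node kernel on 4 CPUs
    gives 25%.
    """
    return 100.0 * (sum(loads) / len(loads)) / max(loads)

# Values taken from the 4-CPU table above:
print(round(scaling([396, 0, 0, 3])))      # LJ row -> 25%
print(round(scaling([94, 97, 110, 97])))   # LJ + Coulomb(T)(S) row -> 90%
```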

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Compared with 2 CPUs (1 node):


               (Mnbf/s)   (GFlops) (ps/NODE hour) (NODE hour/ns)
Performance:     23.269      1.638     26.726     37.417

Detailed load balancing info in percentage of average
Type        NODE:  0   1 Scaling
-------------------------------
             LJ:198   1     50%
          LJ(S):  0 200     50%
   LJ + Coulomb:200   0     50%
LJ + Coulomb(T):193   6     51%
LJ + Coulomb(T)(S): 95 104     95%
Innerloop-Iatom: 85 114     87%
Spread Q Bspline:100  99     99%
Gather F Bspline:100  99     99%
         3D-FFT:100 100    100%
      Solve PME:100 100    100%
       NS-Pairs: 97 102     97%
   Reset In Box:100  99     99%
        Shift-X:100  99     99%
         CG-CoM: 98 101     98%
     Sum Forces:100  99     99%
          Bonds:200   0     50%
         Angles:200   0     50%
        Propers:200   0     50%
   RB-Dihedrals:200   0     50%
   Dist. Restr.:200   0     50%
         Virial:100  99     99%
         Update:100  99     99%
        Stop-CM:100  99     99%
     P-Coupling:100  99     99%
      Calc-Ekin:100  99     99%
          Lincs:200   0     50%
      Lincs-Mat:200   0     50%
        Shake-V:100  99     99%
      Shake-Vir:100  99     99%
         Settle: 97 102     97%
         Dummy2:200   0     50%

    Total Force: 99 100     99%


    Total Shake: 98 101     98%


Total Scaling: 99% of max performance

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Here I see that the LJ calculations, which consume a lot of CPU time,
are not parallelized at all (all of that work lands on node 0). Maybe
this is the reason why the scaling does not improve much going from
2 CPUs to 4 CPUs.
Do you agree?
How can this be improved?
I use GROMACS 3.1.4.
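For what it's worth, a serial fraction like this caps the achievable speedup by Amdahl's law: with fraction s of the force work stuck on one node, the best speedup on N CPUs is 1 / (s + (1 - s)/N). The values of s below are illustrative, not measured from this run:

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Upper bound on speedup when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Illustrative serial fractions (assumed, not taken from the log):
for s in (0.1, 0.3, 0.5):
    print(f"s={s:.1f}: 2 cpus -> {amdahl_speedup(s, 2):.2f}x, "
          f"4 cpus -> {amdahl_speedup(s, 4):.2f}x")
```

With half the force work serial, 4 CPUs can give at most 1.6x over one, which is consistent with seeing little gain over 2 CPUs.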

Yuguang




