[gmx-users] parallel scaling
Bert de Groot
bgroot at gwdg.de
Thu Aug 7 17:59:01 CEST 2003
David van der Spoel wrote:
>
> On Thu, 2003-08-07 at 17:41, Yuguang Mu wrote:
> > I tried gromacs on 4 cpus (2 Pentium linux nodes with a Myrinet link);
> > unfortunately, it runs at nearly the same speed as on 2 cpus (1 node).
> > I checked the output; here is the relevant part of the log file:
> >
> >
> > (Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
> > Performance: 26.848 1.938 30.782 32.486
> > Detailed load balancing info in percentage of average
> >
> > Type          NODE:     0     1     2     3   Scaling
> > -------------------------------------------------------
> > LJ:                   396     0     0     3       25%
> > LJ(S):                  0     0   386    13       25%
> > LJ + Coulomb:         400     0     0     0       25%
> > LJ + Coulomb(T):      386     0     0    13       25%
> > LJ + Coulomb(T)(S):    94    97   110    97       90%
> > Innerloop-Iatom:       88    82    94   134       74%
> > Spread Q Bspline:      99    99   100    99       99%
> > Gather F Bspline:      99    99   100    99       99%
> > 3D-FFT:               100   100   100   100      100%
> > Solve PME:            100   100   100   100      100%
> > NS-Pairs:              98    95   108    97       92%
> > Reset In Box:          99    99   100    99       99%
> > Shift-X:              100   100   100    99       99%
> > CG-CoM:                95   101   101   101       98%
> > Sum Forces:           100   100   100    99       99%
> > Bonds:                400     0     0     0       25%
> > Angles:               400     0     0     0       25%
> > Propers:              400     0     0     0       25%
> > RB-Dihedrals:         400     0     0     0       25%
> > Dist. Restr.:         400     0     0     0       25%
> > Virial:                99    99   100    99       99%
> > Update:                99    99   100    99       99%
> > Stop-CM:               99    99   100    99       99%
> > P-Coupling:            99    99   100    99       99%
> > Calc-Ekin:             99    99   100    99       99%
> > Lincs:                400     0     0     0       25%
> > Lincs-Mat:            400     0     0     0       25%
> > Shake-V:               99    99   100    99       99%
> > Shake-Vir:             99    99   100    99       99%
> > Settle:                91   102   102   102       97%
> > Dummy2:               400     0     0     0       25%
> >
> > Total Force:          103    94   106    94       93%
> >
> >
> > Total Shake:           95   101   101   101       98%
> >
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > Compared with 2 cpus (1 node):
> >
> >
> > (Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
> > Performance: 23.269 1.638 26.726 37.417
> >
> > Detailed load balancing info in percentage of average
> > Type          NODE:     0     1   Scaling
> > -------------------------------------------
> > LJ:                   198     1       50%
> > LJ(S):                  0   200       50%
> > LJ + Coulomb:         200     0       50%
> > LJ + Coulomb(T):      193     6       51%
> > LJ + Coulomb(T)(S):    95   104       95%
> > Innerloop-Iatom:       85   114       87%
> > Spread Q Bspline:     100    99       99%
> > Gather F Bspline:     100    99       99%
> > 3D-FFT:               100   100      100%
> > Solve PME:            100   100      100%
> > NS-Pairs:              97   102       97%
> > Reset In Box:         100    99       99%
> > Shift-X:              100    99       99%
> > CG-CoM:                98   101       98%
> > Sum Forces:           100    99       99%
> > Bonds:                200     0       50%
> > Angles:               200     0       50%
> > Propers:              200     0       50%
> > RB-Dihedrals:         200     0       50%
> > Dist. Restr.:         200     0       50%
> > Virial:               100    99       99%
> > Update:               100    99       99%
> > Stop-CM:              100    99       99%
> > P-Coupling:           100    99       99%
> > Calc-Ekin:            100    99       99%
> > Lincs:                200     0       50%
> > Lincs-Mat:            200     0       50%
> > Shake-V:              100    99       99%
> > Shake-Vir:            100    99       99%
> > Settle:                97   102       97%
> > Dummy2:               200     0       50%
> >
> > Total Force:           99   100       99%
> >
> >
> > Total Shake:           98   101       98%
> >
> >
> > Total Scaling: 99% of max performance
> >
> > $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
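Putting the two Performance lines side by side gives a rough sense of the
actual gain from doubling the cpu count. The back-of-the-envelope below is a
minimal python sketch; it assumes the ps/NODE-hour figure is effectively
wall-clock throughput (NODE time roughly equal to real time on the logging
process), which may not hold exactly for this log format:

  # speedup estimate from the two Performance lines above
  throughput_2cpu = 26.726   # ps per hour on 2 cpus (from the 2-cpu log)
  throughput_4cpu = 30.782   # ps per hour on 4 cpus (from the 4-cpu log)

  speedup = throughput_4cpu / throughput_2cpu      # ~1.15x
  efficiency = speedup / (4 / 2)                   # ~0.58, i.e. ~58%
  print(f"speedup 2 -> 4 cpus: {speedup:.2f}x, "
        f"relative parallel efficiency: {efficiency:.0%}")

Under that reading, doubling the cpus shortens the run by only about 15%,
which matches the "nearly the same speed" observation above.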
> >
> > Here I found that the calculation of the LJ parts, which consumes a lot of
> > cpu time, is not parallelized at all. Maybe this is the reason why the
> > scaling factor does not improve much from 2 cpus to 4 cpus.
> > Do you agree with me?
> > How can this be improved?
> > I use gromacs 3.1.4.
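The 25% scaling entries (LJ, the bonded terms, LINCS) indeed mean that work
is done on one node out of four. A minimal Amdahl's-law sketch in python
shows how quickly a non-parallelized fraction caps the speedup; the serial
fractions below are assumed illustrations, not values measured from the log:

  def amdahl_speedup(f_serial, n_procs):
      """Best-case speedup when a fraction f_serial of the work stays serial."""
      return 1.0 / (f_serial + (1.0 - f_serial) / n_procs)

  for f in (0.1, 0.3, 0.5):
      print(f"serial fraction {f:.0%}: "
            f"2 cpus -> {amdahl_speedup(f, 2):.2f}x, "
            f"4 cpus -> {amdahl_speedup(f, 4):.2f}x")

With, say, half of the force work serial, the speedup is capped at 2x no
matter how many cpus are used, and the step from 2 to 4 cpus gains only
about 20%.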
>
> PME scaling is poor, but it depends on your system size too.
>
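The size dependence comes from the fact that the per-cpu compute shrinks as
cpus are added, while the all-to-all exchange needed for the parallel PME
FFT does not go down the same way. A toy per-step cost model in python makes
the point; every constant below is an assumed illustration, not a
measurement:

  # toy per-step cost model (constants are made up, only the trend matters):
  #   t(N) = compute/N + (N - 1)*latency + fft_exchange
  # the compute term shrinks with N, the exchange terms do not, so a small
  # system runs out of scaling sooner than a large one.
  def step_time(n_atoms, n_procs, c_compute=2e-6, t_latency=5e-4, c_fft=1e-7):
      compute = c_compute * n_atoms / n_procs
      comm = (n_procs - 1) * t_latency + c_fft * n_atoms
      return compute + comm

  for n_atoms in (5_000, 50_000):
      print(f"{n_atoms} atoms: speedup 2 -> 4 cpus ~ "
            f"{step_time(n_atoms, 2) / step_time(n_atoms, 4):.2f}x")

With these made-up constants the small system gains only ~1.3x from 4 cpus
while the larger one gets ~1.8x; the absolute numbers are meaningless, only
the trend with system size matters.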
On the other hand, with Myrinet the scaling should be considerably better.
Are you sure that you're actually making full use of the hardware?
(i.e. is the scaling at least better (for that system) than for fast ethernet?)
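One quick way to check whether the MPI traffic actually goes over the
Myrinet rather than the ethernet is to measure point-to-point bandwidth
between two ranks. Below is a minimal ping-pong sketch in python using
mpi4py, which is assumed to be available and is not part of gromacs:

  from mpi4py import MPI
  import numpy as np
  import time

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  buf = np.zeros(1_000_000, dtype=np.float64)   # ~8 MB message
  n_round_trips = 20

  comm.Barrier()
  t0 = time.time()
  for _ in range(n_round_trips):
      if rank == 0:
          comm.Send(buf, dest=1)
          comm.Recv(buf, source=1)
      elif rank == 1:
          comm.Recv(buf, source=0)
          comm.Send(buf, dest=0)
  elapsed = time.time() - t0

  if rank == 0:
      mbytes = 2 * n_round_trips * buf.nbytes / 1e6
      print(f"point-to-point bandwidth ~ {mbytes / elapsed:.0f} MB/s")

Fast ethernet tops out around 10-12 MB/s, while Myrinet should show well
over 100 MB/s; an ethernet-like number means the MPI library is not using
the Myrinet at all.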
Bert
____________________________________________________________________________
Dr. Bert de Groot
Max Planck Institute for Biophysical Chemistry
Theoretical molecular biophysics group
Am Fassberg 11
37077 Goettingen, Germany
tel: +49-551-2011306, fax: +49-551-2011089
email: bgroot at gwdg.de
http://www.mpibpc.gwdg.de/abteilungen/071/bgroot
____________________________________________________________________________