[gmx-users] parallel scaling
Bert de Groot
bgroot at gwdg.de
Thu Aug 7 17:59:01 CEST 2003
David van der Spoel wrote:
>
> On Thu, 2003-08-07 at 17:41, Yuguang Mu wrote:
> > I tried gromacs on 4 cpus (2 Pentium linux nodes with a Myrinet link);
> > unfortunately, it runs at nearly the same speed as on 2 cpus (1 node).
> > I checked the output; here is the relevant part of the log file:
> >
> >
> > (Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
> > Performance: 26.848 1.938 30.782 32.486
> > Detailed load balancing info in percentage of average
> >
> > Type          NODE:     0     1     2     3   Scaling
> > -------------------------------------------------------
> > LJ:                   396     0     0     3       25%
> > LJ(S):                  0     0   386    13       25%
> > LJ + Coulomb:         400     0     0     0       25%
> > LJ + Coulomb(T):      386     0     0    13       25%
> > LJ + Coulomb(T)(S):    94    97   110    97       90%
> > Innerloop-Iatom:       88    82    94   134       74%
> > Spread Q Bspline:      99    99   100    99       99%
> > Gather F Bspline:      99    99   100    99       99%
> > 3D-FFT:               100   100   100   100      100%
> > Solve PME:            100   100   100   100      100%
> > NS-Pairs:              98    95   108    97       92%
> > Reset In Box:          99    99   100    99       99%
> > Shift-X:              100   100   100    99       99%
> > CG-CoM:                95   101   101   101       98%
> > Sum Forces:           100   100   100    99       99%
> > Bonds:                400     0     0     0       25%
> > Angles:               400     0     0     0       25%
> > Propers:              400     0     0     0       25%
> > RB-Dihedrals:         400     0     0     0       25%
> > Dist. Restr.:         400     0     0     0       25%
> > Virial:                99    99   100    99       99%
> > Update:                99    99   100    99       99%
> > Stop-CM:               99    99   100    99       99%
> > P-Coupling:            99    99   100    99       99%
> > Calc-Ekin:             99    99   100    99       99%
> > Lincs:                400     0     0     0       25%
> > Lincs-Mat:            400     0     0     0       25%
> > Shake-V:               99    99   100    99       99%
> > Shake-Vir:             99    99   100    99       99%
> > Settle:                91   102   102   102       97%
> > Dummy2:               400     0     0     0       25%
> >
> > Total Force:          103    94   106    94       93%
> >
> >
> > Total Shake:           95   101   101   101       98%
> >
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > Compared with 2 cpus (1 node):
> >
> >
> > (Mnbf/s) (GFlops) (ps/NODE hour) (NODE hour/ns)
> > Performance: 23.269 1.638 26.726 37.417
> >
> > Detailed load balancing info in percentage of average
> > Type          NODE:     0     1   Scaling
> > -------------------------------------------
> > LJ:                   198     1       50%
> > LJ(S):                  0   200       50%
> > LJ + Coulomb:         200     0       50%
> > LJ + Coulomb(T):      193     6       51%
> > LJ + Coulomb(T)(S):    95   104       95%
> > Innerloop-Iatom:       85   114       87%
> > Spread Q Bspline:     100    99       99%
> > Gather F Bspline:     100    99       99%
> > 3D-FFT:               100   100      100%
> > Solve PME:            100   100      100%
> > NS-Pairs:              97   102       97%
> > Reset In Box:         100    99       99%
> > Shift-X:              100    99       99%
> > CG-CoM:                98   101       98%
> > Sum Forces:           100    99       99%
> > Bonds:                200     0       50%
> > Angles:               200     0       50%
> > Propers:              200     0       50%
> > RB-Dihedrals:         200     0       50%
> > Dist. Restr.:         200     0       50%
> > Virial:               100    99       99%
> > Update:               100    99       99%
> > Stop-CM:              100    99       99%
> > P-Coupling:           100    99       99%
> > Calc-Ekin:            100    99       99%
> > Lincs:                200     0       50%
> > Lincs-Mat:            200     0       50%
> > Shake-V:              100    99       99%
> > Shake-Vir:            100    99       99%
> > Settle:                97   102       97%
> > Dummy2:               200     0       50%
> >
> > Total Force:           99   100       99%
> >
> >
> > Total Shake:           98   101       98%
> >
> >
> > Total Scaling: 99% of max performance
> >
> > $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
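Putting the two Performance lines side by side gives a rough sense of the
actual gain from doubling the cpu count. The back-of-the-envelope below is a
minimal python sketch; it assumes the ps/NODE-hour figure is effectively
wall-clock throughput (NODE time roughly equal to real time on the logging
process), which may not hold exactly for this log format:

  # speedup estimate from the two Performance lines above
  throughput_2cpu = 26.726   # ps per hour on 2 cpus (from the 2-cpu log)
  throughput_4cpu = 30.782   # ps per hour on 4 cpus (from the 4-cpu log)

  speedup = throughput_4cpu / throughput_2cpu      # ~1.15x
  efficiency = speedup / (4 / 2)                   # ~0.58, i.e. ~58%
  print(f"speedup 2 -> 4 cpus: {speedup:.2f}x, "
        f"relative parallel efficiency: {efficiency:.0%}")

Under that reading, doubling the cpus shortens the run by only about 15%,
which matches the "nearly the same speed" observation above.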
> >
> > Here I found that the calculation of the LJ parts, which consumes a lot of
> > cpu time, is not parallelized at all. Maybe this is the reason why the
> > scaling factor does not improve much from 2 cpus to 4 cpus.
> > Do you agree with me?
> > How can this be improved?
> > I use gromacs 3.1.4.
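The 25% scaling entries (LJ, the bonded terms, LINCS) indeed mean that work
is done on one node out of four. A minimal Amdahl's-law sketch in python
shows how quickly a non-parallelized fraction caps the speedup; the serial
fractions below are assumed illustrations, not values measured from the log:

  def amdahl_speedup(f_serial, n_procs):
      """Best-case speedup when a fraction f_serial of the work stays serial."""
      return 1.0 / (f_serial + (1.0 - f_serial) / n_procs)

  for f in (0.1, 0.3, 0.5):
      print(f"serial fraction {f:.0%}: "
            f"2 cpus -> {amdahl_speedup(f, 2):.2f}x, "
            f"4 cpus -> {amdahl_speedup(f, 4):.2f}x")

With, say, half of the force work serial, the speedup is capped at 2x no
matter how many cpus are used, and the step from 2 to 4 cpus gains only
about 20%.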
>
> PME scaling is poor, but it depends on your system size too.
>
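The size dependence comes from the fact that the per-cpu compute shrinks as
cpus are added, while the all-to-all exchange needed for the parallel PME
FFT does not go down the same way. A toy per-step cost model in python makes
the point; every constant below is an assumed illustration, not a
measurement:

  # toy per-step cost model (constants are made up, only the trend matters):
  #   t(N) = compute/N + (N - 1)*latency + fft_exchange
  # the compute term shrinks with N, the exchange terms do not, so a small
  # system runs out of scaling sooner than a large one.
  def step_time(n_atoms, n_procs, c_compute=2e-6, t_latency=5e-4, c_fft=1e-7):
      compute = c_compute * n_atoms / n_procs
      comm = (n_procs - 1) * t_latency + c_fft * n_atoms
      return compute + comm

  for n_atoms in (5_000, 50_000):
      print(f"{n_atoms} atoms: speedup 2 -> 4 cpus ~ "
            f"{step_time(n_atoms, 2) / step_time(n_atoms, 4):.2f}x")

With these made-up constants the small system gains only ~1.3x from 4 cpus
while the larger one gets ~1.8x; the absolute numbers are meaningless, only
the trend with system size matters.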
On the other hand, with Myrinet the scaling should be considerably better.
Are you sure that you're actually making full use of the hardware?
(i.e. is the scaling at least better (for that system) than for fast ethernet?)
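One quick way to check whether the MPI traffic actually goes over the
Myrinet rather than the ethernet is to measure point-to-point bandwidth
between two ranks. Below is a minimal ping-pong sketch in python using
mpi4py, which is assumed to be available and is not part of gromacs:

  from mpi4py import MPI
  import numpy as np
  import time

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  buf = np.zeros(1_000_000, dtype=np.float64)   # ~8 MB message
  n_round_trips = 20

  comm.Barrier()
  t0 = time.time()
  for _ in range(n_round_trips):
      if rank == 0:
          comm.Send(buf, dest=1)
          comm.Recv(buf, source=1)
      elif rank == 1:
          comm.Recv(buf, source=0)
          comm.Send(buf, dest=0)
  elapsed = time.time() - t0

  if rank == 0:
      mbytes = 2 * n_round_trips * buf.nbytes / 1e6
      print(f"point-to-point bandwidth ~ {mbytes / elapsed:.0f} MB/s")

Fast ethernet tops out around 10-12 MB/s, while Myrinet should show well
over 100 MB/s; an ethernet-like number means the MPI library is not using
the Myrinet at all.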
Bert
____________________________________________________________________________
Dr. Bert de Groot
Max Planck Institute for Biophysical Chemistry
Theoretical molecular biophysics group
Am Fassberg 11
37077 Goettingen, Germany
tel: +49-551-2011306, fax: +49-551-2011089
email: bgroot at gwdg.de
http://www.mpibpc.gwdg.de/abteilungen/071/bgroot
____________________________________________________________________________