[gmx-users] Problem: parallel inefficient scaling with Mac Intel Core Duo

Yiannis ioannis.nicolis at free.fr
Thu May 25 22:35:51 CEST 2006


Hello,
Has anybody tried running GROMACS in parallel on the Intel Core Duo
Macintoshes?
I installed GROMACS and LAM on several iMacs with Intel Core Duo
2x1.83 GHz processors, running the latest OS X with all updates.
Everything works, but the scaling is very inefficient.
Here are the results of the dppc benchmark:

1 process:    1 CPU on 1 Mac             216 ps/day
2 processes:  2 CPUs on 1 Mac            437 ps/day
2 processes:  1 CPU on each of 2 Macs    275 ps/day
4 processes:  2 CPUs on each of 2 Macs   499 ps/day
6 processes:  2 CPUs on each of 3 Macs   471 ps/day

Clearly, I get a large gain when I use both processors in the same
computer, but only a very small gain when I spread the run across
different computers. The network is 100 Mb/s Ethernet (I measured a
transfer rate of 88 Mb/s with scp), so I don't think raw network
throughput is the problem.

All runs with:

grompp -shuffle -sort -np #
mpirun -np # mdrun_mpi

I compiled fftw3 with
./configure --enable-float --with-gcc-arch=prescott

and LAM 7.1.2 with
./configure --with-trillium --with-fc=gfortran --with-rpi=usysv \
    --with-tcp-short=524288 --with-rsh=ssh --with-shm-short=524288

and GROMACS is 3.3.1, compiled with just
./configure --enable-mpi --program-suffix=_mpi

Is there something I can do to improve scaling?
Does anyone have scaling results on similar hardware that I could
compare with my benchmarks?
Should I try an MPI implementation other than LAM, such as Open MPI?
Does anyone have experience with Xgrid? Does GROMACS parallelisation
work with Xgrid?

What is the meaning of messages in the md#.log files such as:

"Load imbalance reduced performance to 600% of max" (that's from the
6-process run)

And what is the meaning of the following table (in md0.log)?
It reports "Total Scaling: 99% of max performance", but in reality the
speed is 471 ps/day on 6 processors compared to 216 ps/day on one,
which gives 471/216/6 = 36%. (In other words, on 1 processor the run
takes 4006 s and on 6 it takes 1833 s.)
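
For reference, here is the same arithmetic applied to all five runs as
a small Python sketch. The ps/day figures are the ones from the
benchmark list above; the function name parallel_efficiency is only an
illustrative choice, and "efficiency" here just means speedup over the
1-process run divided by the number of processes.

```python
# Throughput (ps/day) from the dppc benchmark runs listed above:
# (number of processes, layout, ps/day).
runs = [
    (1, "1 CPU on 1 Mac", 216),
    (2, "2 CPU on 1 Mac", 437),
    (2, "1 CPU on each of 2 Mac", 275),
    (4, "2 CPU on each of 2 Mac", 499),
    (6, "2 CPU on each of 3 Mac", 471),
]


def parallel_efficiency(ps_per_day, nproc, baseline):
    """Speedup relative to the 1-process run, divided by process count."""
    return ps_per_day / baseline / nproc


baseline = runs[0][2]  # the 1-process result, 216 ps/day
for nproc, layout, rate in runs:
    eff = parallel_efficiency(rate, nproc, baseline)
    print(f"{nproc} proc  {layout:<24} {rate:4d} ps/day  "
          f"speedup {rate / baseline:4.2f}  efficiency {eff:4.0%}")
```

This only restates the measured throughput; it says nothing about where
the time is lost.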

This is from the 6-process run:

Detailed load balancing info in percentage of average
Type        NODE:  0   1   2   3   4   5 Scaling
-----------------------------------------------
              LJ:100  99 100  99  97 101     98%
         Coulomb:103  87  95 101 103 107     93%
    Coulomb [W3]:141 126  97  89  71  73     70%
Coulomb [W3-W3]: 92 105  99 100 105  96     94%
    Coulomb + LJ:105  91  96 100 101 105     94%
Coulomb + LJ [W3]:168 134 100  83  57  55     59%
Coulomb + LJ [W3-W3]: 94 103  99  99 103  98     96%
Outer nonbonded loop:111  94  92  93  91 116     85%
1,4 nonbonded interactions:100 100 100 100  99  99     99%
        NS-Pairs:100 100  99  99  99 100     99%
    Reset In Box:100 100 100 100  99  99     99%
         Shift-X:100 100 100 100  99  99     99%
          CG-CoM:100 100 100 100  99  99     99%
      Sum Forces:100 100  99  99  99 100     99%
          Angles:100 100 100 100  99  99     99%
         Propers:100 100 100 100  99  99     99%
       Impropers:100 100 100 100  99  99     99%
    RB-Dihedrals:100 100 100 100  99  99     99%
          Virial:100 100 100 100  99  99     99%
          Update:100 100 100 100  99  99     99%
         Stop-CM:100 100 100 100  99  99     99%
       Calc-Ekin:100 100 100 100  99  99     99%
           Lincs:100 100 100 100  99  99     99%
       Lincs-Mat:100 100 100 100  99  99     99%
    Constraint-V:100 100 100 100  99  99     99%
Constraint-Vir:100 100 100 100  99  99     99%
          Settle: 99  99  99  99 100 100     99%

     Total Force:100 100  98  99 100  99     99%


     Total Shake: 99  99  99  99 100 100     99%


Total Scaling: 99% of max performance

Thanks for any help,

Ioannis



