[gmx-users] Parallel Use of gromacs - parallel run is 20 times SLOWER than single node run

Jim Kress acstoc at kressworks.com
Wed Jul 18 00:33:07 CEST 2007


I ran a parallel (MPI-compiled) version of gromacs using the following
command line:

$ mpirun -np 5 mdrun_mpi -s topol.tpr -np 5 -v

At the end of the file md0.log I found:

	M E G A - F L O P S   A C C O U N T I N G

	Parallel run - timing based on wallclock.
   RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
   T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
   NF=No Forces

 Computing:                        M-Number         M-Flops  % of Flops
-----------------------------------------------------------------------
 Coulomb + LJ [W4-W4]            876.631638   234060.647346    88.0
 Outer nonbonded loop            692.459088     6924.590880     2.6
 NS-Pairs                        457.344228     9604.228788     3.6
 Reset In Box                     13.782888      124.045992     0.0
 Shift-X                         137.773776      826.642656     0.3
 CG-CoM                            3.445722       99.925938     0.0
 Sum Forces                      206.660664      206.660664     0.1
 Virial                           70.237023     1264.266414     0.5
 Update                           68.886888     2135.493528     0.8
 Stop-CM                          68.880000      688.800000     0.3
 P-Coupling                       68.886888      413.321328     0.2
 Calc-Ekin                        68.893776     1860.131952     0.7
 Constraint-V                     68.886888      413.321328     0.2
 Constraint-Vir                   51.675498     1240.211952     0.5
 Settle                           17.225166     5563.728618     2.1
 Virtual Site 3                   17.221722      637.203714     0.2
-----------------------------------------------------------------------
 Total                                        266063.221098   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:   3344.000   3344.000    100.0
                       55:44
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:      0.262     79.564      0.517     46.444

Detailed load balancing info in percentage of average
Type        NODE:  0   1   2   3   4 Scaling
-------------------------------------------
Coulomb + LJ [W4-W4]:118  94 101 104  80     84%
Outer nonbonded loop: 97  98  98 103 102     96%
       NS-Pairs:116  94 101 104  82     85%
   Reset In Box: 99 100  99 100  99     99%
        Shift-X: 99 100  99 100  99     99%
         CG-CoM: 99 100  99 100  99     99%
     Sum Forces: 99 100  99  99  99     99%
         Virial: 99 100  99 100  99     99%
         Update: 99 100  99 100  99     99%
        Stop-CM: 99 100  99 100  99     99%
     P-Coupling: 99 100  99 100  99     99%
      Calc-Ekin: 99 100  99 100  99     99%
   Constraint-V: 99 100  99 100  99     99%
 Constraint-Vir: 99 100  99 100  99     99%
         Settle: 99 100  99 100  99     99%
 Virtual Site 3: 99 100  99 100  99     99%

    Total Force:118  94 101 104  81     84%


    Total Shake: 99 100  99 100  99     99%


Total Scaling: 85% of max performance

Finished mdrun on node 0 Sat Jul 14 23:32:32 2007


Now, I tried the same calculation on one node and found the following at the
end of the file md.log:

	M E G A - F L O P S   A C C O U N T I N G

   RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
   T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
   NF=No Forces

 Computing:                        M-Number         M-Flops  % of Flops
-----------------------------------------------------------------------
 Coulomb + LJ [W4-W4]            875.182588   233673.750996    88.0
 Outer nonbonded loop            688.853376     6888.533760     2.6
 NS-Pairs                        456.997574     9596.949054     3.6
 Reset In Box                     13.782888      124.045992     0.0
 Shift-X                         137.773776      826.642656     0.3
 CG-CoM                            3.445722       99.925938     0.0
 Virial                           69.156915     1244.824470     0.5
 Update                           68.886888     2135.493528     0.8
 Stop-CM                          68.880000      688.800000     0.3
 P-Coupling                       68.886888      413.321328     0.2
 Calc-Ekin                        68.893776     1860.131952     0.7
 Constraint-V                     68.886888      413.321328     0.2
 Constraint-Vir                   51.675498     1240.211952     0.5
 Settle                           17.225166     5563.728618     2.1
 Virtual Site 3                   17.221722      637.203714     0.2
-----------------------------------------------------------------------
 Total                                        265406.885286   100.0
-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time:    165.870    167.000     99.3
                       2:45
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:      5.276      1.600     10.418      2.304
Finished mdrun on node 0 Thu Jul 12 15:17:49 2007
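
For what it's worth, the two timing summaries can be pulled out of the log
files side by side with something like the following (assuming GNU grep and
the log file names used here):

# print the units line plus the Performance line from each run
$ grep -B 1 "^Performance:" md0.log md.log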


I didn't expect to find pure linear scaling with gromacs.  However, I also
didn't expect a massive INCREASE in wall-clock time (roughly a factor of 20)
across my 5-node, gigabit Ethernet cluster.
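
To put rough numbers on it (back-of-the-envelope only, e.g. with bc, using
the figures from the two summaries above):

# wall-clock time: parallel vs. single node, about 20x slower
$ echo "scale=1; 3344.0 / 165.87" | bc
# flop rate: 1.600 GFlops (= 1600 MFlops) single node vs. 79.564 MFlops parallel
$ echo "scale=1; 1600 / 79.564" | bc
# total M-Flops of work: essentially identical in both runs
$ echo "scale=1; 266063.2 / 265406.9" | bc

So the total amount of work is basically the same; it is just being done
about 20 times more slowly on the 5-node run.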

Anybody understand why this happened?

Thanks.

Jim Kress