[gmx-users] FEP and loss of performance

Justin A. Lemkul jalemkul at vt.edu
Mon Apr 4 17:01:39 CEST 2011



Luca Bellucci wrote:
> Hi Chris,
> thanks for the suggestions,
> in the previous mail there is a mistake: couple-moltype should be SOL (the
> solvent), not "Protein_Chain_P".
> Now the load-balance problem seems reasonable, because the water box is
> large (~9.0 nm).

Now your outcome makes a lot more sense.  You're decoupling all of the solvent?
I don't see how that is going to be physically stable or terribly meaningful,
but it explains your performance loss.  You're annihilating a significant number
of interactions (probably the vast majority of all the nonbonded interactions in
the system), which I would expect to cause continuous load balancing issues.

-Justin

> However the problem persists and the performance loss is very high, so I have
> redone the calculations with these commands:
> 
> grompp -f md.mdp -c ../Run-02/confout.gro -t ../Run-02/state.cpt \
>        -p ../topo.top -n ../index.ndx -o md.tpr -maxwarn 1
> 
> mdrun -s md.tpr -o md
> 
> this is part of the md.mdp file: 
> 
> ; Run parameters
> ; define          = -DPOSRES
> integrator	= md		; 
> nsteps		= 1000 	; 
> dt		= 0.002		; 
> [..]
> free_energy    = yes     ; /no
> init_lambda    = 0.9    
> delta_lambda   = 0.0
> couple-moltype = SOL    ; solvent water
> couple-lambda0 = vdw-q
> couple-lambda1 = none
> couple-intramol= yes
> 
> Result for free energy calculation  
>  Computing:         Nodes   Number     G-Cycles    Seconds      %
> -----------------------------------------------------------------------
>  Domain decomp.         8      126       22.050        8.3     0.1
>  DD comm. load          8       15        0.009        0.0     0.0
>  DD comm. bounds        8       12        0.031        0.0     0.0
>  Comm. coord.           8     1001       17.319        6.5     0.0
>  Neighbor search        8      127      436.569      163.7     1.1
>  Force                  8     1001    34241.576    12840.9    87.8
>  Wait + Comm. F         8     1001       19.486        7.3     0.0
>  PME mesh               8     1001     4190.758     1571.6    10.7
>  Write traj.            8        7        1.827        0.7     0.0
>  Update                 8     1001       12.557        4.7     0.0
>  Constraints            8     1001       26.496        9.9     0.1
>  Comm. energies         8     1002       10.710        4.0     0.0
>  Rest                   8                 25.142        9.4     0.1
> -----------------------------------------------------------------------
>  Total                  8              39004.531    14627.1   100.0
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>  PME redist. X/F        8     3003     3479.771     1304.9     8.9
>  PME spread/gather      8     4004      277.574      104.1     0.7
>  PME 3D-FFT             8     4004      378.090      141.8     1.0
>  PME solve              8     2002       55.033       20.6     0.1
> -----------------------------------------------------------------------
> 	Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:   1828.385   1828.385    100.0
>                        30:28
>                              (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:      3.115      3.223      0.095    253.689
> 
> I switched off only the free_energy keyword, redid the calculation, and
> obtained:
>  Computing:         Nodes   Number     G-Cycles    Seconds      %
> -----------------------------------------------------------------------
>  Domain decomp.         8       77       10.975        4.1     0.6
>  DD comm. load          8        1        0.001        0.0     0.0
>  Comm. coord.           8     1001       14.480        5.4     0.8
>  Neighbor search        8       78      136.479       51.2     7.3
>  Force                  8     1001     1141.115      427.9    61.3
>  Wait + Comm. F         8     1001       17.845        6.7     1.0
>  PME mesh               8     1001      484.581      181.7    26.0
>  Write traj.            8        5        1.221        0.5     0.1
>  Update                 8     1001        9.976        3.7     0.5
>  Constraints            8     1001       20.275        7.6     1.1
>  Comm. energies         8      992        5.933        2.2     0.3
>  Rest                   8                 19.670        7.4     1.1
> -----------------------------------------------------------------------
>  Total                  8               1862.552      698.5   100.0
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>  PME redist. X/F        8     2002       92.204       34.6     5.0
>  PME spread/gather      8     2002      192.337       72.1    10.3
>  PME 3D-FFT             8     2002      177.373       66.5     9.5
>  PME solve              8     1001       22.512        8.4     1.2
> -----------------------------------------------------------------------
> 	Parallel run - timing based on wallclock.
> 
>                NODE (s)   Real (s)      (%)
>        Time:     87.309     87.309    100.0
>                        1:27
>                          (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    439.731     23.995      1.981     12.114
> Finished mdrun on node 0 Mon Apr  4 16:52:04 2011
> 
> Luca	
> 
> 
> 
> 
>> If we accept your text at face value, then the simulation slowed down
>> by roughly a factor of 15 (1500%), certainly not the 16% suggested by the
>> load balancing.
>>
>> Please let us know what version of gromacs you are using, cut and paste the
>> commands that you used to run gromacs (so we can verify that you ran
>> on the same number of processors), and cut and paste a diff of the .mdp
>> files (so that we can verify that you ran for the same number of steps).
>>
>> You might be correct about the slowdown, but let's rule out some other
>> more obvious problems first.
>>
>> Chris.
>>
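
(As a quick way to make that comparison, something like the following works; the
directory names here are only placeholders:

  diff Run-FEP/md.mdp Run-noFEP/md.mdp
  grep -i "nodes" Run-FEP/md.log Run-noFEP/md.log

Any difference in nsteps, cut-offs, or the number of nodes shows up immediately.)
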
>> -- original message --
>>
>>
>> Dear all,
>> when I run a single free energy simulation I notice that there is a loss of
>> performance with respect to normal MD
>>
>> free_energy    = yes
>> init_lambda    = 0.9
>> delta_lambda   = 0.0
>> couple-moltype = Protein_Chain_P
>> couple-lambda0 = vdw-q
>> couple-lambda1 = none
>> couple-intramol= yes
>>
>>     Average load imbalance: 16.3 %
>>     Part of the total run time spent waiting due to load imbalance: 12.2 %
>>     Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
>>     Time:   1852.712   1852.712    100.0
>>
>> free_energy    = no
>>     Average load imbalance: 2.7 %
>>     Part of the total run time spent waiting due to load imbalance: 1.7 %
>>     Time:    127.394    127.394    100.0
>>
>> It seems that the loss of performance is due in part to the load imbalance
>> in the domain decomposition; however, I tried to change these keywords
>> without benefit.
>> Any comment is welcome.
>>
>> Thanks
> 
> 
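
For what it's worth, -rdd, -rcon and -dds are mdrun command-line options rather
than .mdp keywords, so tweaking them looks something like this (the values shown
are only placeholders):

  mdrun -s md.tpr -o md -dlb yes -dds 0.8

But no amount of domain decomposition tuning will hide the cost of pushing most
of the nonbonded interactions through the slower free energy kernels, which is
what decoupling all of SOL does; restricting couple-moltype to a single solute
is the real fix.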

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================


