[gmx-users] Re: Is there a way to omit particles with q=0 from Coulomb-/PME-calculations?

Thomas Schlesier schlesi at uni-mainz.de
Tue Jan 17 11:47:24 CET 2012


Thanks Carsten. Now I see the problem.

>
> Hi Thomas,
>
> On Jan 17, 2012, at 10:29 AM, Thomas Schlesier wrote:
>
>> But would there be a way to optimize it further?
>> In my real simulation I would have a charged solute and uncharged solvent (both with nearly the same number of particles). If I could omit the uncharged solvent from the long-range Coulomb calculation (PME), it would save a lot of time.
>> Or is there a reason that some of the PME stuff is also calculated for uncharged particles?
>
> For PME you need the Fourier-transformed charge grid, and from the resulting potential
> grid you interpolate the forces on the charged atoms. Each charge is spread over
> typically 4x4x4 (= PME order) grid points, and only charged atoms take part in this
> spreading. So the spreading part (and also the force interpolation part) becomes
> faster with fewer charges. However, the rest of PME (the Fourier transforms and the
> calculations in reciprocal space) is unaffected by the number of charges; for this
> only the size of the whole PME grid matters. You could try to lower the number of
> PME grid points (enlarge fourierspacing) and at the same time increase the PME order
> (to 6, for example) to keep a comparable force accuracy. You could also try to shift
> more load to real space, which will also lower the number of PME grid points
> (g_tune_pme can do that for you). But I am not sure that you can get large
> performance benefits from that.
>
> Best,
>     Carsten
>
>
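As a concrete illustration of the two knobs Carsten mentions, this is roughly what I will try in the .mdp file. The values below are only placeholders (the default fourierspacing is 0.12 nm); whether the force accuracy stays acceptable would of course have to be checked for the real system:

    coulombtype     = PME
    fourierspacing  = 0.16    ; coarser grid, fewer points in the 3D-FFT
    pme_order       = 6       ; higher interpolation order to compensate
    rcoulomb        = 1.2     ; a larger real-space cut-off shifts load off the mesh

And for the automated route (the exact options may differ between versions, see g_tune_pme -h):

    g_tune_pme -np 4 -s topol.tpr -launch
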
>> (OK, I know that this is a rather special system, insofar as in most MD simulations the number of uncharged particles is negligible.)
>> Would it perhaps be better to move the question to the developer list?
>>
>> Greetings
>> Thomas
>>
>>
>>> On 17/01/2012 7:32 PM, Thomas Schlesier wrote:
>>>> On 17/01/2012 4:55 AM, Thomas Schlesier wrote:
>>>>>> Dear all,
>>>>>> Is there a way to omit particles with zero charge from the calculation
>>>>>> of Coulomb interactions or PME?
>>>>>> In my calculations I want to coarse-grain my solvent, but the solute
>>>>>> should still be represented by atoms. As a result, the solvent
>>>>>> molecules have zero charge. I noticed that for a simulation with only
>>>>>> the CG solvent, significant time was spent on the PME part of the
>>>>>> simulation.
>>>>>> If I simulated the complete system (atomistic solute + coarse-grained
>>>>>> solvent), I would only save time from the reduced number of particles
>>>>>> (compared to an atomistic solvent). But if I could omit the
>>>>>> zero-charge solvent from the Coulomb/PME part, it would save much
>>>>>> additional time.
>>>>>>
>>>>>> Is there an easy way to do this omission, or would one have to hack
>>>>>> the code? If the latter, how hard would it be and where would I have
>>>>>> to look?
>>>>>> (My first idea would be to create an index-file group with all
>>>>>> non-zero-charged particles and then run the loops needed for
>>>>>> Coulomb/PME only over this subset of particles.)
>>>>>> I have experience only with Fortran, not with C++.
>>>>>>
>>>>>> The only other solution that comes to my mind would be to use plain
>>>>>> cut-offs for the Coulomb part. This would save the time required for
>>>>>> PME, but would in turn cost time for calculating zeros (Coulomb
>>>>>> interactions for the CG solvent). More importantly, it would
>>>>>> introduce artifacts from the plain cut-off :(
>>>>
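As for the index-group part of the idea above: building such a group is easy to do outside the code. A minimal sketch, assuming a hypothetical file charges.dat with one charge per atom in topology order (e.g. pulled from the processed topology that grompp -pp writes out):

    awk 'BEGIN { print "[ charged ]" }
         $1 + 0 != 0 { printf "%d%s", NR, (++n % 15 ? " " : "\n") }
         END { print "" }' charges.dat > charged.ndx

(As Mark points out below, mdrun already leaves uncharged atoms out of the real-space Coulomb loops, so a group like this would mainly be useful for analysis rather than for speeding up the inner loops.)
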
>>>>> Particles with zero charge are not included in the neighbour lists used
>>>>> for calculating Coulomb interactions. The statistics in the
>>>>> "M E G A - F L O P S   A C C O U N T I N G" section of the .log file
>>>>> will show significant use of loops that do not have a "Coul" component.
>>>>> So these particles already have no effect on that half of the PME
>>>>> calculation. I don't know whether the grid part is similarly optimized,
>>>>> but you can test this yourself by comparing the timing of runs with and
>>>>> without charged solvent.
>>>>>
>>>>> Mark
>>>>
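A quick way to make the check Mark suggests, assuming the two runs write their logs to md_pme.log and md_cut.log (the file names here are just placeholders):

    grep -A 25 "F L O P S" md_pme.log md_cut.log    # which nonbonded kernels were used
    grep -A 16 "C Y C L E" md_pme.log md_cut.log    # where the wall time went
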
>>>> OK, I will test this.
>>>> But here is the data I obtained for two simulations, one with plain
>>>> cut-offs and the other with PME. As one can see, the simulation with
>>>> plain cut-offs is much faster (4382 s vs. 576 s wall clock, roughly a
>>>> factor of 7.6).
>>>
>>> Yes. I think I have seen this before for PME when (some of) the grid
>>> cells lack (many) charged particles.
>>>
>>> You will see that the nonbonded loops are always "VdW(T)" for tabulated
>>> VdW - you have no charges at all in this system and GROMACS has already
>>> optimized its choice of nonbonded loops accordingly. You would see
>>> "Coul(T) + VdW(T)" if your solvent had charge.
>>>
>>> It's not a meaningful test of the performance of PME vs cut-off, either,
>>> because there are no charges.
>>>
>>> Mark
>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------------
>>>>
>>>> With PME:
>>>>
>>>>          M E G A - F L O P S   A C C O U N T I N G
>>>>
>>>>     RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
>>>>     T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
>>>>     NF=No Forces
>>>>
>>>>   Computing:                         M-Number         M-Flops  % Flops
>>>> -----------------------------------------------------------------------
>>>>   VdW(T)                          1132.029152       61129.574     0.1
>>>>   Outer nonbonded loop            1020.997718       10209.977     0.0
>>>>   Calc Weights                   16725.001338      602100.048     0.6
>>>>   Spread Q Bspline              356800.028544      713600.057     0.7
>>>>   Gather F Bspline              356800.028544     4281600.343     4.4
>>>>   3D-FFT                       9936400.794912    79491206.359    81.6
>>>>   Solve PME                     180000.014400    11520000.922    11.8
>>>>   NS-Pairs                        2210.718786       46425.095     0.0
>>>>   Reset In Box                    1115.000000        3345.000     0.0
>>>>   CG-CoM                          1115.000446        3345.001     0.0
>>>>   Virial                          7825.000626      140850.011     0.1
>>>>   Ext.ens. Update                 5575.000446      301050.024     0.3
>>>>   Stop-CM                         5575.000446       55750.004     0.1
>>>>   Calc-Ekin                       5575.000892      150525.024     0.2
>>>> -----------------------------------------------------------------------
>>>>   Total                                          97381137.440   100.0
>>>> -----------------------------------------------------------------------
>>>>      D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>>>
>>>>   av. #atoms communicated per step for force:  2 x 94.1
>>>>
>>>>   Average load imbalance: 10.7 %
>>>>   Part of the total run time spent waiting due to load imbalance: 0.1 %
>>>>
>>>>
>>>>       R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>>
>>>>   Computing:         Nodes     Number     G-Cycles    Seconds     %
>>>> -----------------------------------------------------------------------
>>>>   Domain decomp.         4    2500000      903.835      308.1     1.8
>>>>   Comm. coord.           4   12500001      321.930      109.7     0.6
>>>>   Neighbor search        4    2500001     1955.330      666.5     3.8
>>>>   Force                  4   12500001      696.668      237.5     1.4
>>>>   Wait + Comm. F         4   12500001      384.107      130.9     0.7
>>>>   PME mesh               4   12500001    43854.818    14948.2    85.3
>>>>   Write traj.            4       5001        1.489        0.5     0.0
>>>>   Update                 4   12500001     1137.630      387.8     2.2
>>>>   Comm. energies         4   12500001     1074.541      366.3     2.1
>>>>   Rest                   4                1093.194      372.6     2.1
>>>> -----------------------------------------------------------------------
>>>>   Total                  4               51423.541    17528.0   100.0
>>>> -----------------------------------------------------------------------
>>>>
>>>>          Parallel run - timing based on wallclock.
>>>>
>>>>                 NODE (s)   Real (s)      (%)
>>>>         Time:   4382.000   4382.000    100.0
>>>>                         1h13:02
>>>>                 (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>>> Performance:      0.258     22.223    492.926      0.049
>>>>
>>>> -----------------------------------------------------------------------------------
>>>>
>>>> -----------------------------------------------------------------------------------
>>>>
>>>>
>>>> With plain cut-offs
>>>>
>>>>          M E G A - F L O P S   A C C O U N T I N G
>>>>
>>>>     RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
>>>>     T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
>>>>     NF=No Forces
>>>>
>>>>   Computing:                         M-Number         M-Flops  % Flops
>>>> -----------------------------------------------------------------------
>>>>   VdW(T)                          1137.009596       61398.518     7.9
>>>>   Outer nonbonded loop            1020.973338       10209.733     1.3
>>>>   NS-Pairs                        2213.689975       46487.489     6.0
>>>>   Reset In Box                    1115.000000        3345.000     0.4
>>>>   CG-CoM                          1115.000446        3345.001     0.4
>>>>   Virial                          7825.000626      140850.011    18.2
>>>>   Ext.ens. Update                 5575.000446      301050.024    38.9
>>>>   Stop-CM                         5575.000446       55750.004     7.2
>>>>   Calc-Ekin                       5575.000892      150525.024    19.5
>>>> -----------------------------------------------------------------------
>>>>   Total                                            772960.806   100.0
>>>> -----------------------------------------------------------------------
>>>>      D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>>>
>>>>   av. #atoms communicated per step for force:  2 x 93.9
>>>>
>>>>   Average load imbalance: 16.0 %
>>>>   Part of the total run time spent waiting due to load imbalance: 0.9 %
>>>>
>>>>
>>>>       R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>>
>>>>   Computing:         Nodes     Number     G-Cycles    Seconds     %
>>>> -----------------------------------------------------------------------
>>>>   Domain decomp.         4    2500000      856.561      291.8    12.7
>>>>   Comm. coord.           4   12500001      267.036       91.0     3.9
>>>>   Neighbor search        4    2500001     2077.236      707.6    30.7
>>>>   Force                  4   12500001      377.606      128.6     5.6
>>>>   Wait + Comm. F         4   12500001      347.270      118.3     5.1
>>>>   Write traj.            4       5001        1.166        0.4     0.0
>>>>   Update                 4   12500001     1109.008      377.8    16.4
>>>>   Comm. energies         4   12500001      841.530      286.7    12.4
>>>>   Rest                   4                 886.195      301.9    13.1
>>>> -----------------------------------------------------------------------
>>>>   Total                  4                6763.608     2304.0   100.0
>>>> -----------------------------------------------------------------------
>>>>
>>>> NOTE: 12 % of the run time was spent communicating energies,
>>>>        you might want to use the -nosum option of mdrun
>>>>
>>>>
>>>>          Parallel run - timing based on wallclock.
>>>>
>>>>                 NODE (s)   Real (s)      (%)
>>>>         Time:    576.000    576.000    100.0
>>>>                         9:36
>>>>                 (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>>> Performance:      1.974      1.342   3750.001      0.006
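
(Taken together, the two accountings fit Carsten's explanation: in the PME run the PME mesh alone costs 43855 of 51424 G-cycles, about 85 %, and the 3D-FFT accounts for 81.6 % of the flops, i.e. exactly the reciprocal-space part that depends only on the grid size and not on the number of charges.)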