[gmx-users] Is there a way to omit particles with, q=0, from Coulomb-/PME-calculations?

Tue Jan 17 10:59:03 CET 2012

Hi Thomas,

On Jan 17, 2012, at 10:29 AM, Thomas Schlesier wrote:

> But would there be a way to optimize it further?
> In my real simulation I would have a charged solute and an uncharged solvent (both with nearly the same number of particles). If I could omit the uncharged solvent from the long-range Coulomb calculation (PME), it would save much time.
> Or is there a reason that some of the PME work is also done for uncharged particles?

PME needs the Fourier-transformed charge grid and returns the potential grid, from
which the forces on the charged atoms are interpolated. Each charge is spread onto
typically 4x4x4 grid points (for a PME order of 4), and only charged atoms take part
in this spreading. So the spreading part (and also the force interpolation part)
becomes faster with fewer charges. However, the rest of PME (the Fourier transforms
and the calculations in reciprocal space) is unaffected by the number of charges. For
these only the size of the whole PME grid matters. You could try to lower the number of
PME grid points (enlarge fourierspacing) and at the same time increase the PME order
(to 6, for example) to keep a comparable force accuracy. You could also try to shift
more load to real space, which will also lower the number of PME grid points (g_tune_pme
can do that for you). But I am not sure that you can get large performance benefits
from that.
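
To get a feeling for the trade-off, here is a rough Python sketch (not GROMACS code).
It assumes a hypothetical cubic box of 5 nm, that the grid size per dimension is simply
the box length divided by fourierspacing and rounded up (GROMACS additionally rounds up
to FFT-friendly sizes), and that FFT cost scales roughly as N log N. The function names
are made up for this illustration.

```python
import math

def grid_points_per_dim(box_length_nm, fourier_spacing_nm):
    """Smallest grid count whose spacing does not exceed the requested maximum."""
    return math.ceil(box_length_nm / fourier_spacing_nm)

def total_grid_points(box_nm, fourier_spacing_nm):
    """Total number of PME grid points for a rectangular box (lengths in nm)."""
    n = 1
    for length in box_nm:
        n *= grid_points_per_dim(length, fourier_spacing_nm)
    return n

def spread_points_per_charge(pme_order):
    """Each charge is spread onto order x order x order grid points."""
    return pme_order ** 3

def fft_cost_estimate(n_points):
    """Relative 3D-FFT cost, assuming the usual N log N scaling."""
    return n_points * math.log2(n_points)

box = (5.0, 5.0, 5.0)  # hypothetical box, not taken from the runs in this thread

# Compare a fine grid with order 4 against a coarser grid with order 6.
for spacing, order in ((0.12, 4), (0.16, 6)):
    n = total_grid_points(box, spacing)
    print(f"fourierspacing={spacing} nm, order={order}: "
          f"{n} grid points, {spread_points_per_charge(order)} points per charge, "
          f"relative FFT cost {fft_cost_estimate(n):.3g}")
```

The coarser grid cuts the FFT work, which does not depend on the number of charges,
while the higher order raises the per-charge spreading and gathering cost. With few
charges the second effect should matter less, but only a timing run can tell.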

Best,
   Carsten

> (OK, I know that this is a rather special system, in that in most MD simulations the number of uncharged particles is negligible.)
> Would it perhaps be better to move the question to the developer list?
> 
> Greetings
> Thomas
> 
> 
>> On 17/01/2012 7:32 PM, Thomas Schlesier wrote:
>>> On 17/01/2012 4:55 AM, Thomas Schlesier wrote:
>>>>> Dear all,
>>>>> Is there a way to omit particles with zero charge from calculations
>>>>> for Coulomb-interactions or PME?
>>>>> In my calculations I want to coarse-grain my solvent, but the solute
>>>>> should still be represented by atoms. As a result, the
>>>>> solvent molecules have zero charge. I noticed that for a simulation
>>>>> with only the CG solvent, significant time was spent in the PME part
>>>>> of the simulation.
>>>>> If I simulated the complete system (atomistic solute +
>>>>> coarse-grained solvent), I would save time only from the reduced number
>>>>> of particles (compared to atomistic solvent). But if I could omit the
>>>>> zero-charge solvent from the Coulomb/PME part, it would save much
>>>>> additional time.
>>>>> 
>>>>> Is there an easy way to do this, or would one have to hack the
>>>>> code? If the latter, how hard would it be and where would I have
>>>>> to look?
>>>>> (My first idea would be to create an index-file group with all
>>>>> non-zero-charged particles and then run the loops needed for
>>>>> Coulomb/PME only over this subset of particles.)
>>>>> I only have experience with Fortran, not with C++.
>>>>> 
>>>>> The only other solution that comes to mind would be to use plain
>>>>> cut-offs for the Coulomb part. This would save the time required for doing
>>>>> PME but would in turn cost time for calculating zeros
>>>>> (Coulomb interactions for the CG solvent). More importantly, it would
>>>>> introduce artifacts from the plain cut-off :(
>>> 
>>>> Particles with zero charge are not included in neighbour lists used
>>>> for calculating Coulomb interactions. The statistics in the
>>>> "M E G A - F L O P S   A C C O U N T I N G" section of the .log file will show
>>>> that there is significant use of loops that do not have a "Coul"
>>>> component. So these already have no effect on the real-space half of the PME
>>>> calculation. I don't know whether the grid part is similarly
>>>> optimized, but you can test this yourself by comparing timing of runs
>>>> with and without charged solvent.
>>>> 
>>>> Mark
>>> 
>>> OK, I will test this.
>>> But here is the data i obtained for two simulations, one with plain
>>> cut-off and the other with PME. As one sees the simulation with plain
>>> cut-offs is much faster (by a factor of 6).
>> 
>> Yes. I think I have seen this before for PME when (some grid cells) are
>> lacking (many) charged particles.
>> 
>> You will see that the nonbonded loops are always "VdW(T)" for tabulated
>> VdW - you have no charges at all in this system and GROMACS has already
>> optimized its choice of nonbonded loops accordingly. You would see
>> "Coul(T) + VdW(T)" if your solvent had charge.
>> 
>> It's not a meaningful test of the performance of PME vs cut-off, either,
>> because there are no charges.
>> 
>> Mark
>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------------
>>> 
>>> With PME:
>>> 
>>>         M E G A - F L O P S   A C C O U N T I N G
>>> 
>>>    RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
>>>    T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
>>>    NF=No Forces
>>> 
>>>  Computing:                         M-Number         M-Flops  % Flops
>>> -----------------------------------------------------------------------
>>>  VdW(T)                          1132.029152       61129.574     0.1
>>>  Outer nonbonded loop            1020.997718       10209.977     0.0
>>>  Calc Weights                   16725.001338      602100.048     0.6
>>>  Spread Q Bspline              356800.028544      713600.057     0.7
>>>  Gather F Bspline              356800.028544     4281600.343     4.4
>>>  3D-FFT                       9936400.794912    79491206.359    81.6
>>>  Solve PME                     180000.014400    11520000.922    11.8
>>>  NS-Pairs                        2210.718786       46425.095     0.0
>>>  Reset In Box                    1115.000000        3345.000     0.0
>>>  CG-CoM                          1115.000446        3345.001     0.0
>>>  Virial                          7825.000626      140850.011     0.1
>>>  Ext.ens. Update                 5575.000446      301050.024     0.3
>>>  Stop-CM                         5575.000446       55750.004     0.1
>>>  Calc-Ekin                       5575.000892      150525.024     0.2
>>> -----------------------------------------------------------------------
>>>  Total                                          97381137.440   100.0
>>> -----------------------------------------------------------------------
>>>     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>> 
>>>  av. #atoms communicated per step for force:  2 x 94.1
>>> 
>>>  Average load imbalance: 10.7 %
>>>  Part of the total run time spent waiting due to load imbalance: 0.1 %
>>> 
>>> 
>>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>> 
>>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>>> -----------------------------------------------------------------------
>>>  Domain decomp.         4    2500000      903.835      308.1     1.8
>>>  Comm. coord.           4   12500001      321.930      109.7     0.6
>>>  Neighbor search        4    2500001     1955.330      666.5     3.8
>>>  Force                  4   12500001      696.668      237.5     1.4
>>>  Wait + Comm. F         4   12500001      384.107      130.9     0.7
>>>  PME mesh               4   12500001    43854.818    14948.2    85.3
>>>  Write traj.            4       5001        1.489        0.5     0.0
>>>  Update                 4   12500001     1137.630      387.8     2.2
>>>  Comm. energies         4   12500001     1074.541      366.3     2.1
>>>  Rest                   4                1093.194      372.6     2.1
>>> -----------------------------------------------------------------------
>>>  Total                  4               51423.541    17528.0   100.0
>>> -----------------------------------------------------------------------
>>> 
>>>         Parallel run - timing based on wallclock.
>>> 
>>>                NODE (s)   Real (s)      (%)
>>>        Time:   4382.000   4382.000    100.0
>>>                        1h13:02
>>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>> Performance:      0.258     22.223    492.926      0.049
>>> 
>>> -----------------------------------------------------------------------------------
>>> 
>>> -----------------------------------------------------------------------------------
>>> 
>>> 
>>> With plain cut-offs
>>> 
>>>         M E G A - F L O P S   A C C O U N T I N G
>>> 
>>>    RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
>>>    T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
>>>    NF=No Forces
>>> 
>>>  Computing:                         M-Number         M-Flops  % Flops
>>> -----------------------------------------------------------------------
>>>  VdW(T)                          1137.009596       61398.518     7.9
>>>  Outer nonbonded loop            1020.973338       10209.733     1.3
>>>  NS-Pairs                        2213.689975       46487.489     6.0
>>>  Reset In Box                    1115.000000        3345.000     0.4
>>>  CG-CoM                          1115.000446        3345.001     0.4
>>>  Virial                          7825.000626      140850.011    18.2
>>>  Ext.ens. Update                 5575.000446      301050.024    38.9
>>>  Stop-CM                         5575.000446       55750.004     7.2
>>>  Calc-Ekin                       5575.000892      150525.024    19.5
>>> -----------------------------------------------------------------------
>>>  Total                                            772960.806   100.0
>>> -----------------------------------------------------------------------
>>>     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>> 
>>>  av. #atoms communicated per step for force:  2 x 93.9
>>> 
>>>  Average load imbalance: 16.0 %
>>>  Part of the total run time spent waiting due to load imbalance: 0.9 %
>>> 
>>> 
>>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>> 
>>>  Computing:         Nodes     Number     G-Cycles    Seconds     %
>>> -----------------------------------------------------------------------
>>>  Domain decomp.         4    2500000      856.561      291.8    12.7
>>>  Comm. coord.           4   12500001      267.036       91.0     3.9
>>>  Neighbor search        4    2500001     2077.236      707.6    30.7
>>>  Force                  4   12500001      377.606      128.6     5.6
>>>  Wait + Comm. F         4   12500001      347.270      118.3     5.1
>>>  Write traj.            4       5001        1.166        0.4     0.0
>>>  Update                 4   12500001     1109.008      377.8    16.4
>>>  Comm. energies         4   12500001      841.530      286.7    12.4
>>>  Rest                   4                 886.195      301.9    13.1
>>> -----------------------------------------------------------------------
>>>  Total                  4                6763.608     2304.0   100.0
>>> -----------------------------------------------------------------------
>>> 
>>> NOTE: 12 % of the run time was spent communicating energies,
>>>       you might want to use the -nosum option of mdrun
>>> 
>>> 
>>>         Parallel run - timing based on wallclock.
>>> 
>>>                NODE (s)   Real (s)      (%)
>>>        Time:    576.000    576.000    100.0
>>>                        9:36
>>>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>> Performance:      1.974      1.342   3750.001      0.006
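
The numbers in the two logs above can be put side by side with a few lines of Python.
This is only a sketch: the values are copied by hand from the logs, and treating the
five rows that appear only in the PME log as "PME work" is an assumption made here.

```python
# Wall-clock times in seconds, from the "Real (s)" column of each log.
pme_wall_s = 4382.0
cutoff_wall_s = 576.0

# M-Flops of the rows that appear only in the PME log (assumed to be PME work).
pme_only_mflops = {
    "Calc Weights": 602100.048,
    "Spread Q Bspline": 713600.057,
    "Gather F Bspline": 4281600.343,
    "3D-FFT": 79491206.359,
    "Solve PME": 11520000.922,
}
pme_total_mflops = 97381137.440  # "Total" row of the PME log

# How much faster the plain cut-off run finished.
speedup = pme_wall_s / cutoff_wall_s

# Fraction of all counted flops that went into PME-only rows.
pme_fraction = sum(pme_only_mflops.values()) / pme_total_mflops

# Fraction spent in the grid-only parts, which do not depend on the charge count.
grid_fraction = (pme_only_mflops["3D-FFT"]
                 + pme_only_mflops["Solve PME"]) / pme_total_mflops

print(f"cut-off run is {speedup:.2f}x faster in wall-clock time")
print(f"PME-only rows: {100 * pme_fraction:.1f} % of counted flops")
print(f"3D-FFT + Solve PME: {100 * grid_fraction:.1f} % of counted flops")
```

For these logs the wall-clock ratio comes out somewhat above the rough factor of 6
mentioned in the message, and nearly all counted flops in the PME run sit in the FFT
and reciprocal-space solve, even though the system contains no charges at all.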