[gmx-users] Re: mdrun -nosum still complains that 15 % of the runtime was spent communicating energies

chris.neale at utoronto.ca
Tue Jul 21 14:47:09 CEST 2009


Thanks Mark, I'll respond inline.

> Chris Neale wrote:
>> I have now tested with and without -nosum and it appears that the option
>> is working (see 51 vs. 501 Number of communications) but that the total
>> amount of time communicating energies didn't go down by very much. Seems
>> strange to me. Anybody have any ideas if this is normal?
>
> Seems strange, but perhaps a 45-second test is not sufficiently long to
> demonstrate suitable scaling.

Agreed, although I find it a good jumping-off point, especially now
that we need to optimize -npme. I use quick tests to narrow down the
range of nodes/-npme values that is likely to scale best, and then
fine-tune with longer scaling tests (roughly as in the sketch below).
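
For what it's worth, one of my quick scans looks roughly like the
sketch below (untested as written here; the -npme values, the -maxh
limit and the mdrun path are placeholders for my own setup, and it
assumes a PBS-style $PBS_NODEFILE as in my command further down):

  #!/bin/bash
  # short runs over a few -npme values; compare the Performance: lines afterwards
  NP=$(wc -l $PBS_NODEFILE | gawk '{print $1}')
  MDRUN=/scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun
  for npme in 64 96 128; do
    # -s overrides the -deffnm default for the input .tpr, so each run
    # reads test.tpr but writes its output as test_npme${npme}.*
    mpirun -np $NP -machinefile $PBS_NODEFILE $MDRUN -s test.tpr \
      -deffnm test_npme${npme} -nosum -npme $npme -maxh 0.02
    grep "Performance:" test_npme${npme}.log
  done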

> There's no discussion in the 4.0.5 release notes of a relevant change
> to -nosum, but there has been a change:
> http://oldwww.gromacs.org/content/view/181/132/.

Thanks, I did see this but don't think that it is related to this  
issue, which I have now confirmed in both 4.0.4 and 4.0.5.

>
>> At the very least, I suggest adding an if statement to mdrun so that it
>> doesn't output the -nosum usage note if the user did in fact use -nosum
>> in that run.
>>
>>
>> Without using -nosum:
>>
>>    R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>> Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>> ...
>> Write traj.          256          2      233.218       93.7     0.5
>> Update               256        501      777.511      312.5     1.7
>> Constraints          256       1002     1203.894      483.9     2.7
>> Comm. energies       256        501     7397.995     2973.9    16.5
>> Rest                 256                 128.058       51.5     0.3
>> -----------------------------------------------------------------------
>> Total                384               44897.468    18048.0   100.0
>> -----------------------------------------------------------------------
>>
>> NOTE: 16 % of the run time was spent communicating energies,
>>      you might want to use the -nosum option of mdrun
>>
>>
>>        Parallel run - timing based on wallclock.
>>
>>               NODE (s)   Real (s)      (%)
>>       Time:     47.000     47.000    100.0
>>               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>> Performance:  13485.788    712.634      1.842     13.029
>> Finished mdrun on node 0 Mon Jul 20 12:53:41 2009
>>
>> #########
>>
>> And using -nosum:
>>
>>    R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>> Computing:         Nodes     Number     G-Cycles    Seconds     %
>> -----------------------------------------------------------------------
>> ...
>> Write traj.          256          2      213.521       83.3     0.5
>> Update               256        501      776.606      303.0     1.8
>> Constraints          256       1002     1200.285      468.2     2.7
>> Comm. energies       256         51     6926.667     2702.1    15.6
>> Rest                 256                 127.503       49.7     0.3
>> -----------------------------------------------------------------------
>> Total                384               44296.670    17280.0   100.0
>> -----------------------------------------------------------------------
>>
>> NOTE: 16 % of the run time was spent communicating energies,
>>      you might want to use the -nosum option of mdrun
>>
>>
>>        Parallel run - timing based on wallclock.
>>
>>               NODE (s)   Real (s)      (%)
>>       Time:     45.000     45.000    100.0
>>               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>> Performance:  14084.547    744.277      1.924     12.475
>>
>> #########
>>
>> Thanks,
>> Chris.
>>
>> Chris Neale wrote:
>>> Hello,
>>>
>>> I have been running simulations on a larger number of processors
>>> recently and am confused about the message regarding -nosum that
>>> occurs at the end of the .log file. In this case, I have included the
>>> -nosum option to mdrun and I still get this warning (gromacs 4.0.4).
>>>
>>> My command was:
>>> mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile
>>> $PBS_NODEFILE /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun
>>> -deffnm test -nosum -npme 128
>
> Perhaps assigning the result of this to a variable and printing it
> before executing it would help confirm that -nosum really was there.

I am not sure what you mean... the whole line as a variable? Something
like the sketch below, perhaps? In any case, I'm pretty sure that
-nosum is there.
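
If you mean capturing the whole command line in a variable and echoing
it before executing it, then something like this sketch (same command
as in my submission script, just wrapped; untested in this exact form):

  cmd="mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile $PBS_NODEFILE /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun -deffnm test -nosum -npme 128"
  echo $cmd    # the job's stdout then records exactly what was run, -nosum included
  $cmd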

>
> Your mdrun output from your first email was...
>
>>> #########
>>>
>>> To confirm that I am asking mdrun for -nosum, to stderr I get:
>>> ...
>>> Option       Type   Value   Description
>>> ------------------------------------------------------
>>> -[no]h       bool   no      Print help info and quit
>>> -nice        int    0       Set the nicelevel
>>> -deffnm      string test    Set the default filename for all file options
>>> -[no]xvgr    bool   yes     Add specific codes (legends etc.) in the
>>> output
>>>                            xvg files for the xmgrace program
>>> -[no]pd      bool   no      Use particle decompostion
>>> -dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
>>> -npme        int    128     Number of separate nodes to be used for
>>> PME, -1
>>>                            is guess
>>> -ddorder     enum   interleave  DD node order: interleave, pp_pme or
>>> cartesian
>>> -[no]ddcheck bool   yes     Check for all bonded interactions with DD
>>> -rdd         real   0       The maximum distance for bonded
>>> interactions with
>>>                            DD (nm), 0 is determine from initial
>>> coordinates
>>> -rcon        real   0       Maximum distance for P-LINCS (nm), 0 is
>>> estimate
>>> -dlb         enum   auto    Dynamic load balancing (with DD): auto, no
>>> or yes
>>> -dds         real   0.8     Minimum allowed dlb scaling of the DD cell
>>> size
>>> -[no]sum     bool   no      Sum the energies at every step
>>> -[no]v       bool   no      Be loud and noisy
>>> -[no]compact bool   yes     Write a compact log file
>>> -[no]seppot  bool   no      Write separate V and dVdl terms for each
>>>                            interaction type and node to the log file(s)
>>> -pforce      real   -1      Print all forces larger than this (kJ/mol nm)
>>> -[no]reprod  bool   no      Try to avoid optimizations that affect binary
>>>                            reproducibility
>>> -cpt         real   15      Checkpoint interval (minutes)
>>> -[no]append  bool   no      Append to previous output files when
>>> continuing
>>>                            from checkpoint
>>> -[no]addpart bool   yes     Add the simulation part number to all output
>>>                            files when continuing from checkpoint
>>> -maxh        real   -1      Terminate after 0.99 times this time (hours)
>>> -multi       int    0       Do multiple simulations in parallel
>>> -replex      int    0       Attempt replica exchange every # steps
>>> -reseed      int    -1      Seed for replica exchange, -1 is generate
>>> a seed
>>> -[no]glas    bool   no      Do glass simulation with special long range
>>>                            corrections
>>> -[no]ionize  bool   no      Do a simulation including the effect of an
>>> X-Ray
>>>                            bombardment on your system
>>> ...
>>>
>>> ########
>
> ... and this does not demonstrate -nosum. Either you've mismatched, or
> the command line has lost the -nosum, or there's a bug.

The sections are not mismatched, but thanks for looking in such
detail. Perhaps I misunderstand the following -[no]sum line, which I
have pulled out of the output above:

  -[no]sum     bool   no      Sum the energies at every step

> The fact that the number for "Comm. energies" decreases suggests you
> have done it correctly, though. Perhaps the contents of this variable
> are being incorrectly propagated through the code.

This is what I think is most likely the case. I'll take a look, but my  
gromacs coding endeavours have not been highly successful in the past.
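
As a first step I will probably just grep the source for the help text
of the option and then follow whatever boolean it sets (only a guess at
where to start looking, not a diagnosis):

  cd gromacs-4.0.4     # wherever the source tree is unpacked
  grep -rn "Sum the energies at every step" src/
  # ...then trace the variable that -[no]sum sets through src/kernel and src/mdlib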

Thanks again,
Chris.

>
> Mark
>
>>> And the message at the end of the .log file is:
>>> ...
>>>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>>
>>> av. #atoms communicated per step for force:  2 x 3376415.3
>>> av. #atoms communicated per step for LINCS:  2 x 192096.6
>>>
>>> Average load imbalance: 11.7 %
>>> Part of the total run time spent waiting due to load imbalance: 7.9 %
>>> Steps where the load balancing was limited by -rdd, -rcon and/or -dds:
>>> X 0 % Y 0 % Z 0 %
>>> Average PME mesh/force load: 0.620
>>> Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %
>>>
>>> NOTE: 7.9 % performance was lost due to load imbalance
>>>      in the domain decomposition.
>>>
>>> NOTE: 10.0 % performance was lost because the PME nodes
>>>      had less work to do than the PP nodes.
>>>      You might want to decrease the number of PME nodes
>>>      or decrease the cut-off and the grid spacing.
>>>
>>>
>>>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>>
>>> Computing:         Nodes     Number     G-Cycles    Seconds     %
>>> -----------------------------------------------------------------------
>>> Domain decomp.       256         51      337.551      131.2     0.7
>>> Send X to PME        256        501       59.454       23.1     0.1
>>> Comm. coord.         256        501      289.936      112.7     0.6
>>> Neighbor search      256         51     1250.088      485.9     2.8
>>> Force                256        501    16105.584     6259.9    35.4
>>> Wait + Comm. F       256        501     2441.390      948.9     5.4
>>> PME mesh             128        501     5552.336     2158.1    12.2
>>> Wait + Comm. X/F     128        501     9586.486     3726.1    21.1
>>> Wait + Recv. PME F   256        501      459.752      178.7     1.0
>>> Write traj.          256          2      223.993       87.1     0.5
>>> Update               256        501      777.618      302.2     1.7
>>> Constraints          256       1002     1223.093      475.4     2.7
>>> Comm. energies       256         51     7011.309     2725.1    15.4
>>> Rest                 256                 127.710       49.6     0.3
>>> -----------------------------------------------------------------------
>>> Total                384               45446.299    17664.0   100.0
>>> -----------------------------------------------------------------------
>>>
>>> NOTE: 15 % of the run time was spent communicating energies,
>>>      you might want to use the -nosum option of mdrun
>>>
>>>
>>>        Parallel run - timing based on wallclock.
>>>
>>>               NODE (s)   Real (s)      (%)
>>>       Time:     46.000     46.000    100.0
>>>               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
>>> Performance:  13778.036    728.080      1.882     12.752
>>>
>>> ########
>>>
>>> Thanks,
>>> Chris