[gmx-users] System volume "jumps" on exact continuations

Elizabeth Ploetz ploetz at ksu.edu
Thu Jun 1 23:15:34 CEST 2017


On Thu, Jun 1, 2017 at 9:39 PM, Elizabeth Ploetz <ploetz at ksu.edu> wrote:

>>> However, if most runs are group scheme, a quick check could show whether
>>> jumps are present in runs that i) do PP-PME tuning, or ii) if the logs got
>>> truncated during continuation, at least whether they use separate PME ranks
>>> (because otherwise CPU-only runs don't tune).
>>
>> i) If grepping "timed" from the LOG file does not give any output, does
>> that mean there was no PP-PME tuning? (Sorry for the stupid question. I'm
>> not sure which piece of information from the LOG file is going to answer
>> whether or not there was PP-PME tuning.)

> Do you run with -append? If so, the log file too gets truncated, but I do
> not recall exactly where and whether the PP-PME balancing messages are
> removed or not, but it's not hard to try -- just run with separate PME and
> too few of them (e.g. 1 out of 12) and that will trigger load balancing.

I run with -noappend, so I think the LOG files are intact. I get load balancing messages.

> On second thought, instead of testing with Verlet, you might want to just
> do the above and try to directly observe the anomalies after the balancer.

>> If so, perhaps there is a correlation between having PP-PME tuning and
>> having a jump. Please see this link:
>> <http://i1243.photobucket.com/albums/gg545/ploetz/volumeJumps_zps8hmlghtn.png>.
>> *If* the volume for
>> 40-60ns of row 3 is the correct system volume, then all the data in this
>> figure is consistent with there being a jump when there is PP-PME tuning.
>> (Please note that while the data at 1 bar looks okay in this case, and
>> elevated pressures do not, this is not always true. We get jumps at 1 bar
>> as well sometimes.)
>> ii) These are all CPU-only runs. The simulations always use separate PME
>> ranks.
>> Please let me know if any particular data from the LOG file would be
>> helpful.
>>

> It would be easier if you provided logs that we can look through.

Please see three log files here: https://drive.google.com/open?id=0BznaVquT5XVyVkNjMHh2eXJ5amc . These correspond to the third row (6 kbar) data I just linked to in my previous post (directly above, the volumeJumps.png image). The 40-60ns log doesn't have the "PP-PME Load Balancing" section, but the other two do.
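
A minimal sketch of the check described above: scan each log for the "PP-PME Load Balancing" heading and report which runs have it. The helper names are hypothetical. The comparison ignores whitespace and case as a precaution, on the assumption that the heading may be printed with spaced-out letters; the exact wording in a given GROMACS version is not confirmed here.

```python
from pathlib import Path

# Heading text as quoted in this message; the exact wording in a given
# log file is an assumption and may differ between GROMACS versions.
MARKER = "PP-PME Load Balancing"


def _squash(text: str) -> str:
    """Lower-case the text and drop all whitespace."""
    return "".join(text.split()).lower()


def has_tuning_section(log_text: str, marker: str = MARKER) -> bool:
    """Return True if any line contains the marker, ignoring spacing and case."""
    needle = _squash(marker)
    return any(needle in _squash(line) for line in log_text.splitlines())


def scan_logs(paths):
    """Map each log file name to whether it contains the marker."""
    return {
        Path(p).name: has_tuning_section(Path(p).read_text(errors="replace"))
        for p in paths
    }
```

For example, scan_logs(sorted(Path(".").glob("*.log"))) would return a dictionary of file name to True/False for the logs in the current directory (the glob pattern is only an illustration).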

>>>
>>> If I understood correctly, it's only group scheme runs where this has been
>>> observed, so it could be some newer feature/change that interacts badly
>>> with the group scheme.
>>
>> You are correct, so far we have not seen any jumps with Verlet.
>>
>>> BTW, do you have any data with 4.5?
>>>
>> I have a few old simulations with version 4.5.3 (none with 4.5, sorry).
>> They were all run with inexact continuations (i.e., I did not provide
>> checkpoint files when running multiple short runs to create one long
>> simulation) or single trajectories that I had killed at various points and
>> then continued using checkpoint files and -append. I don't have a huge data
>> set with 4.5.3, but none of them exhibited jumps!
>>
>>> I'd suggest that (especially if investigation of the current data does not
>>> reveal the reasons) you pick a setup where you seemed to get the anomaly and
>>> run with the same settings using the Verlet scheme, as lots of short runs
>>> with restarts in a loop.
>>>
>> Thanks, we are doing this test.

