Skip to content

stats: Resolve loss of importMode for gauges that are initialized first, then merged.#7255

Merged
mattklein123 merged 7 commits intoenvoyproxy:masterfrom
jmarantz:multi-scope-gauge-lose-import-mode
Jun 14, 2019
Merged

stats: Resolve loss of importMode for gauges that are initialized first, then merged.#7255
mattklein123 merged 7 commits intoenvoyproxy:masterfrom
jmarantz:multi-scope-gauge-lose-import-mode

Conversation

@jmarantz
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz commented Jun 13, 2019

Description: Fixes a problem where a gauge that was first created by the child process would then not import correctly from the parent, adding unit-test and system-test for this. The code looked like it handled this, but the problem was that we were doing a scope lookup for the old stat and that was failing, but the stat was in the alloc table and so should have been re-used.

The cleanest way I could think of to fix this was to change the flag-bits so that all zeros mean Uninitialized, which is how HeapStatData::alloc allocates new stats. Then, when constructing the Gauge object in a scope, we can simply respect those bits if they are set.
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Fixes: #7227

jmarantz added 2 commits June 12, 2019 19:58
…ating the same stat in two different scopes.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, agreed this is much simpler to understand. Thanks so much for the quick turnaround and the integration test also.

I was thinking about it more and I think it's not that hard to fix the race condition we were talking about. if you look at the code here:

auto& gauge_ref = temp_scope_->gaugeFromStatName(stat_name, import_mode);

I think all you need to do is check to make sure the gauge is still uninitialized and not no import before continuing with the add/sub logic. This would fix the race in which the stat gets created after you do the find check. I think this will be a little hard to test, and it's probably not worth doing in this change, but if you agree can you add a TODO and potentially file an issue for us to follow up?

/wait

…ortModeAfterMerge

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz marked this pull request as ready for review June 13, 2019 13:29
@jmarantz jmarantz changed the title WiP stats: Resolve loss of importMode for gauges that are initialized first, then merged. stats: Resolve loss of importMode for gauges that are initialized first, then merged. Jun 13, 2019
@jmarantz
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)
🔨 rebuilding ci/circleci: ipv6_tests (failed build)

🐱

Caused by: a #7255 (comment) was created by @jmarantz.

see: more, trace.

@mattklein123
Copy link
Copy Markdown
Member

@jmarantz did you see my comment about the race above? It would be good to add a TODO/comment about that. I think my proposed fix is still not correct since it could still race after the check, but maybe just put a comment in and we can think about it after this goes in?

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor Author

per slack conversation I think there isn't a race there and I added commentary in stat_merger.cc to describe the various combinations..

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is awesome. A few small comments.

/wait

// thinks the gauge is Accumulate do we consider cases 2-4.
// 2. Child thinks gauge is Accumulate : data is combined in
// gauge_ref.add() below.
// 3. Child thinks gauge is NeverImport: we skip this loop entry via
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify that this case only happens if a gauge has been changed to NeverImport by the child? Is that right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, added comment.

} else {
gauge_ref.sub(old_parent_value - new_parent_value);
}
// Note that new_parent_value may be less than old_parent_value, in which
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have sufficient tests for this wrapping?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so; one is:

TEST_F(StatMergerTest, multipleImportsWithAccumulationLogic) {
  {
    Protobuf::Map<std::string, uint64_t> gauges;
    gauges["whywassixafraidofseven"] = 100;
    stat_merger_.mergeStats(empty_counter_deltas_, gauges);
    // Initial combined values: 678+100 and 1+2.
    EXPECT_EQ(778, whywassixafraidofseven_.value());
  }
  {
    Protobuf::Map<std::string, uint64_t> gauges;
    // The parent's gauge drops by 1, and its counter increases by 1.
    gauges["whywassixafraidofseven"] = 99;
    stat_merger_.mergeStats(empty_counter_deltas_, gauges);
    EXPECT_EQ(777, whywassixafraidofseven_.value());

dropping whywassixafraidofseven from 100 to 99 triggers that behavior.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
mattklein123 previously approved these changes Jun 13, 2019
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks.

…t gauges when they are re-accessed.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…led.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks.

@mattklein123 mattklein123 merged commit ea63c3a into envoyproxy:master Jun 14, 2019
@jmarantz jmarantz deleted the multi-scope-gauge-lose-import-mode branch June 14, 2019 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with no import stats during hot restart merge

2 participants