stats: Resolve loss of importMode for gauges that are initialized first, then merged.#7255
Conversation
…ating the same stat in two different scopes. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, agreed this is much simpler to understand. Thanks so much for the quick turnaround and the integration test also.
I was thinking about it more and I think it's not that hard to fix the race condition we were talking about. if you look at the code here:
envoy/source/common/stats/stat_merger.cc
Line 41 in 0243ded
I think all you need to do is check to make sure the gauge is still uninitialized and not no import before continuing with the add/sub logic. This would fix the race in which the stat gets created after you do the find check. I think this will be a little hard to test, and it's probably not worth doing in this change, but if you agree can you add a TODO and potentially file an issue for us to follow up?
/wait
…ortModeAfterMerge Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
/retest |
|
🔨 rebuilding |
|
@jmarantz did you see my comment about the race above? It would be good to add a TODO/comment about that. I think my proposed fix is still not correct since it could still race after the check, but maybe just put a comment in and we can think about it after this goes in? |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
per slack conversation I think there isn't a race there and I added commentary in stat_merger.cc to describe the various combinations.. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this is awesome. A few small comments.
/wait
| // thinks the gauge is Accumulate do we consider cases 2-4. | ||
| // 2. Child thinks gauge is Accumulate : data is combined in | ||
| // gauge_ref.add() below. | ||
| // 3. Child thinks gauge is NeverImport: we skip this loop entry via |
There was a problem hiding this comment.
Can you clarify that this case only happens if a gauge has been changed to NeverImport by the child? Is that right?
There was a problem hiding this comment.
yes, added comment.
| } else { | ||
| gauge_ref.sub(old_parent_value - new_parent_value); | ||
| } | ||
| // Note that new_parent_value may be less than old_parent_value, in which |
There was a problem hiding this comment.
Do we have sufficient tests for this wrapping?
There was a problem hiding this comment.
I think so; one is:
TEST_F(StatMergerTest, multipleImportsWithAccumulationLogic) {
{
Protobuf::Map<std::string, uint64_t> gauges;
gauges["whywassixafraidofseven"] = 100;
stat_merger_.mergeStats(empty_counter_deltas_, gauges);
// Initial combined values: 678+100 and 1+2.
EXPECT_EQ(778, whywassixafraidofseven_.value());
}
{
Protobuf::Map<std::string, uint64_t> gauges;
// The parent's gauge drops by 1, and its counter increases by 1.
gauges["whywassixafraidofseven"] = 99;
stat_merger_.mergeStats(empty_counter_deltas_, gauges);
EXPECT_EQ(777, whywassixafraidofseven_.value());
dropping whywassixafraidofseven from 100 to 99 triggers that behavior.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…t gauges when they are re-accessed. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…led. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Description: Fixes a problem where a gauge that was first created by the child process would then not import correctly from the parent, adding unit-test and system-test for this. The code looked like it handled this, but the problem was that we were doing a scope lookup for the old stat and that was failing, but the stat was in the alloc table and so should have been re-used.
The cleanest way I could think of to fix this was to change the flag-bits so that all zeros mean Uninitialized, which is how HeapStatData::alloc allocates new stats. Then, when constructing the Gauge object in a scope, we can simply respect those bits if they are set.
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Fixes: #7227