common: introduce new "safe" init manager#6296
common: introduce new "safe" init manager#6296htuch merged 9 commits intoenvoyproxy:masterfrom mergeconflict:safe_init
Conversation
Signed-off-by: Dan Rosen <mergeconflict@google.com>
|
/review @htuch @mattklein123 @lizan |
|
@mergeconflict my initial reaction is this is a lot of code churn to fix a relatively small issue. Before we start the review process, can you describe why it would not be better to just fix the other code like we already discussed? (Cancel callbacks when init manager is destroyed.) Thank you. |
Yeah, I understand, it does feel like a lot. The issue I found with the explicit cancelation approach is that we just end up with a use-after-free in the opposite direction, where the init manager attempts to cancel a target that had been destroyed before it. This was demonstrated in one or two unit and integration tests IIRC. I still think the lightest-touch fix is the one with a |
|
@mergeconflict if you discussed with @htuch and you both think the churn is worth it, I'm happy to review, I just wanted to make sure we were not doing change for the sake of change. |
|
@mattklein123 yep, I will review tomorrow. I realize this is a lot of churn, but there have been things that have been bugging me around
|
|
@htuch @mergeconflict sg will review also. |
htuch
left a comment
There was a problem hiding this comment.
I'm think this is a really nice design, some low level nits and probably needs some non-Googler to take a pass (@lizan or @mattklein123).
/wait
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
|
@lizan @mattklein123 by the way, I guess I should say, I think the best place to start reviewing this might be Also, let me know if it'd be helpful to chat on Slack or Zoom. And thanks again for your time reviewing and helping me with this! |
mattklein123
left a comment
There was a problem hiding this comment.
Flushing out a bunch of comments. Thanks this is very nice and well done. I think we should move forward with this, though I do want to make one general comment. That comment is that although this design will certainly fix the issue that you initially found, and is certainly much better from a use after free scenario, it may actually hide subtle initialization bugs that would have crashed previously but now might just not work correctly, and in some sense will be harder to debug than outright crashing. (I actually want to do a blog post on this exact topic: basically that sometimes increased memory safety can actually lead to different bugs that are harder to identify). Anyhow, that's really a non-actionable comment, but I thought I would throw it out there in case you can think of additional ways to try to prevent us getting into a situation in which "we don't crash but things juts don't work correctly." That's probably as many ASSERTS as possible and not being wishy-washy about how people should use the interface (I had one comment on that).
Thank you!
/wait
| /** | ||
| * Signal to the init manager that this target has finished initializing. This should ideally | ||
| * only be called once, after `initialize` was called. Calling it before initialization begins | ||
| * or after it has already been called before will have no effect. |
There was a problem hiding this comment.
If callers aren't supposed to call this way, can we at least ASSERT that is the case?
There was a problem hiding this comment.
I should actually probably change the comment, since I think it'll be actually called multiple times on purpose in some places. For example, in https://github.com/envoyproxy/envoy/blob/master/source/common/router/rds_impl.cc#L136:
void RdsRouteConfigSubscription::runInitializeCallbackIfAny() {
if (initialize_callback_) {
initialize_callback_();
initialize_callback_ = nullptr;
}
}That's called from a whole bunch of places in the file. The same pattern occurs in other implementations of Init::Target, so I figured I'd encapsulate it here.
|
@mattklein123 yeah, @mergeconflict and I discussed the hiding vs. memory safety aspect a bit in the design phase of this. We reached a conclusion that there are definitely some legitimate scenarios where ownership needs to be yanked by both parties (e.g. multiple listeners warming on the same route), even though in some cases this would potentially hide problematic ownership or lifetime organization. OTOH, the status quo is that if you mess up lifetime management, you get subtle errors anyway, since a lot of the issues we discovered around |
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
|
I just realized I have a minor design issue in /wait |
Signed-off-by: Dan Rosen <mergeconflict@google.com>
|
@mattklein123 I think I've addressed everything so far. Have another look if you get a chance, and let me know if there's anybody else you'd recommend to have a look. I have a local branch queued up to actually change all the use sites (https://github.com/mergeconflict/envoy/tree/safe_init_conversion), which is mostly mechanical and boring changes. So whenever this lands, I can rebase that and put up a PR. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, at a high level LGTM. Very nice! I will defer to @htuch for the remainder of the review.
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
htuch
left a comment
There was a problem hiding this comment.
LGTM, but I'd like to discuss the mocking approach before merging.
test/mocks/safe_init/mocks.h
Outdated
| namespace SafeInit { | ||
|
|
||
| /** | ||
| * MockWatcher is a real WatcherImpl, subclassed to add a mock `ready` method that you can set |
There was a problem hiding this comment.
I'm not such a fan of this style TBH. Ideally mocks should be pure, so that when used in some unrelated code, which only needs to use part of the mocked class, e.g. just the add(), it only needs the single EXPECT_CALL and nothing else. This approach pushes a lot of policy into the mock. Sometimes we do this for convenience, when we don't want a zillion tests reinventing how to mock ClusterManager, for example, but in this case I think we could have a purer mock here and override it based on WatcherImpl in the specific tests.
There was a problem hiding this comment.
Updated: these really aren't mocks, they are real WatcherImpl and TargetImpl subclassed to make tests less cumbersome to write. Renamed.
Signed-off-by: Dan Rosen <mergeconflict@google.com>
htuch
left a comment
There was a problem hiding this comment.
Thanks, this is a great improvement and reduction in cognitive load around the init flow.
Description: Introduce a new "safe" init manager, to replace the existing one that's prone to use-after-free issues (see e.g. #6116). Users of the existing init manager will be upgraded one-by-one in subsequent PRs if this design is approved. See also previous false starts in PRs #6136 and #6245.
Risk Level: Low, no existing users of the existing init manager are changed in this PR.
Testing: New unit tests added.
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Dan Rosen mergeconflict@google.com