common: introduce new "safe" init manager by mergeconflict · Pull Request #6296 · envoyproxy/envoy

mergeconflict · 2019-03-15T01:55:58Z

Description: Introduce a new "safe" init manager, to replace the existing one that's prone to use-after-free issues (see e.g. #6116). Users of the existing init manager will be upgraded one-by-one in subsequent PRs if this design is approved. See also previous false starts in PRs #6136 and #6245.
Risk Level: Low, no existing users of the existing init manager are changed in this PR.
Testing: New unit tests added.
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Dan Rosen <mergeconflict@google.com>

mergeconflict · 2019-03-15T02:00:11Z

/review @htuch @mattklein123 @lizan

mattklein123 · 2019-03-15T03:05:51Z

@mergeconflict my initial reaction is this is a lot of code churn to fix a relatively small issue. Before we start the review process, can you describe why it would not be better to just fix the other code like we already discussed? (Cancel callbacks when init manager is destroyed.) Thank you.

mergeconflict · 2019-03-15T03:29:01Z

> @mergeconflict my initial reaction is this is a lot of code churn to fix a relatively small issue. Before we start the review process, can you describe why it would not be better to just fix the other code like we already discussed? (Cancel callbacks when init manager is destroyed.) Thank you.

Yeah, I understand, it does feel like a lot.

The issue I found with the explicit cancelation approach is that we just end up with a use-after-free in the opposite direction, where the init manager attempts to cancel a target that had been destroyed before it. This was demonstrated in one or two unit and integration tests IIRC.

I still think the lightest-touch fix is the one with a shared_ptr<bool> indicating whether the InitManager has been destroyed:
https://github.com/mergeconflict/envoy/blob/9b27afc25b363aac63d76622c6cce49378dbcabd/source/server/init_manager_impl.cc
It doesn't solve all possible UAFs, so @htuch encouraged me to push forward with this more comprehensive approach, but it does have the virtue that no other code has to change. I'll let you guys discuss; I'm happy to go either way.
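The flag-based idea described above can be illustrated with a minimal sketch. Everything here (`FlaggedManager`, `makeDoneCallback`, the helper functions) is a hypothetical, simplified stand-in and not the code in the linked branch; it assumes single-threaded use and only shows how a shared `bool` lets a late callback detect that the manager is gone.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical, simplified manager; not Envoy's InitManagerImpl.
class FlaggedManager {
public:
  FlaggedManager() : alive_(std::make_shared<bool>(true)) {}
  ~FlaggedManager() { *alive_ = false; } // flip the flag before members go away

  // Hands out a "done" callback for a target to invoke later. The callback
  // keeps its own copy of the flag, so the flag outlives the manager.
  std::function<void()> makeDoneCallback() {
    ++pending_;
    std::shared_ptr<bool> alive = alive_;
    return [this, alive]() {
      if (!*alive) {
        return; // manager already destroyed: never dereference `this`
      }
      --pending_;
    };
  }

  int pending() const { return pending_; }

private:
  std::shared_ptr<bool> alive_;
  int pending_{0};
};

// Invokes a callback after its manager has been destroyed.
bool lateCallbackIsNoOp() {
  std::function<void()> done;
  {
    FlaggedManager manager;
    done = manager.makeDoneCallback();
  } // manager destroyed here
  done(); // safe only because of the flag check
  return true;
}

// Two targets registered, one finished.
int pendingAfterOneDone() {
  FlaggedManager manager;
  std::function<void()> first = manager.makeDoneCallback();
  std::function<void()> second = manager.makeDoneCallback();
  first();
  return manager.pending();
}
```

As the comment above says, this only guards one direction (target calling into a dead manager); it does nothing for a manager calling into a target that was destroyed first.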

mattklein123 · 2019-03-15T04:18:25Z

@mergeconflict if you discussed with @htuch and you both think the churn is worth it, I'm happy to review, I just wanted to make sure we were not doing change for the sake of change.

htuch · 2019-03-15T04:57:31Z

@mattklein123 yep, I will review tomorrow. I realize this is a lot of churn, but there have been things that have been bugging me around InitManager for ages and then more recently:

1. The naming of this class and its methods, as well as the continuation-passing style, makes it really hard to page back in how it works every 6 months or so when I return to this code.
2. We have had a number of memory safety and lifetime issues exposed by fuzzers and crash reports around the existing setup.
So, I'm excited that @mergeconflict has put together a structural approach to solving both these problems.

mattklein123 · 2019-03-15T20:15:21Z

@htuch @mergeconflict sg will review also.

htuch

I think this is a really nice design. Some low-level nits, and it probably needs a non-Googler to take a pass (@lizan or @mattklein123).

/wait

include/envoy/safe_init/manager.h

source/common/safe_init/manager_impl.cc

source/common/safe_init/manager_impl.h

include/envoy/safe_init/BUILD

mergeconflict · 2019-03-15T21:26:23Z

@lizan @mattklein123 by the way, I guess I should say, I think the best place to start reviewing this might be test/safe_init/manager_impl_test.cc. I think the UnavailableTarget, UnavailableManager and UnavailableWatcher test cases show what I'm fundamentally trying to accomplish.

Also, let me know if it'd be helpful to chat on Slack or Zoom. And thanks again for your time reviewing and helping me with this!

mattklein123

Flushing out a bunch of comments. Thanks, this is very nice and well done. I think we should move forward with this, though I do want to make one general comment: although this design will certainly fix the issue that you initially found, and is certainly much better from a use-after-free standpoint, it may actually hide subtle initialization bugs that would have crashed previously but now might just not work correctly, which in some sense will be harder to debug than an outright crash. (I actually want to do a blog post on this exact topic: sometimes increased memory safety can lead to different bugs that are harder to identify.) Anyhow, that's really a non-actionable comment, but I thought I would throw it out there in case you can think of additional ways to prevent us from getting into a situation in which "we don't crash but things just don't work correctly." That probably means as many ASSERTs as possible and not being wishy-washy about how people should use the interface (I had one comment on that).

Thank you!

/wait

source/common/safe_init/manager_impl.h

source/common/safe_init/manager_impl.cc

source/common/safe_init/target_impl.h

source/common/safe_init/watcher_impl.h

mattklein123 · 2019-03-17T18:16:56Z

source/common/safe_init/target_impl.h

+  /**
+   * Signal to the init manager that this target has finished initializing. This should ideally
+   * only be called once, after `initialize` was called. Calling it before initialization begins
+   * or after it has already been called before will have no effect.


If callers aren't supposed to call this way, can we at least ASSERT that is the case?

mergeconflict · Mar 18, 2019

I should probably change the comment, since I think it will actually be called multiple times on purpose in some places. For example, in https://github.com/envoyproxy/envoy/blob/master/source/common/router/rds_impl.cc#L136:

    void RdsRouteConfigSubscription::runInitializeCallbackIfAny() {
      if (initialize_callback_) {
        initialize_callback_();
        initialize_callback_ = nullptr;
      }
    }

That's called from a whole bunch of places in the file. The same pattern occurs in other implementations of Init::Target, so I figured I'd encapsulate it here.
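The run-once pattern quoted above can be encapsulated roughly as follows. This is a sketch under assumptions: `OneShotReady`, its methods, and `countNotifications` are hypothetical names for illustration, not the `TargetImpl` added in this PR. It shows `ready()` being a no-op both before `initialize` and after the first successful call, and it returns a `bool` so that a caller who expects exactly one notification could assert on it.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper; not Envoy's TargetImpl.
class OneShotReady {
public:
  // Called by the manager when initialization should begin.
  void initialize(std::function<void()> on_ready) { on_ready_ = std::move(on_ready); }

  // Notifies the manager at most once. Returns true if the notification was
  // delivered, false if the call was a no-op.
  bool ready() {
    if (!on_ready_) {
      return false; // not initialized yet, or already signaled
    }
    std::function<void()> callback = std::move(on_ready_);
    on_ready_ = nullptr; // make the "already signaled" state explicit
    callback();
    return true;
  }

private:
  std::function<void()> on_ready_;
};

// Exercises the three cases: too early, on time, and repeated.
int countNotifications() {
  int notified = 0;
  OneShotReady target;
  target.ready(); // before initialize: no-op
  target.initialize([&notified]() { ++notified; });
  target.ready(); // delivers the notification
  target.ready(); // repeated call: no-op
  return notified;
}
```

Clearing the stored callback before invoking it keeps the helper safe even if the callback itself ends up calling `ready()` again.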

source/common/safe_init/target_impl.h

source/common/safe_init/watcher_impl.h

htuch · 2019-03-18T03:20:26Z

@mattklein123 yeah, @mergeconflict and I discussed the hiding vs. memory safety aspect a bit in the design phase of this. We reached a conclusion that there are definitely some legitimate scenarios where ownership needs to be yanked by both parties (e.g. multiple listeners warming on the same route), even though in some cases this would potentially hide problematic ownership or lifetime organization.

OTOH, the status quo is that if you mess up lifetime management, you get subtle errors anyway, since a lot of the issues we discovered around InitManager only occurred as a result of flakes in TSAN or random production crash reports. So, ¯\_(ツ)_/¯
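For illustration, one common way to let either party be destroyed first is to hand out handles that hold only a weak reference. The sketch below assumes that approach; the thread itself does not spell out the mechanism, and `Watcher`, `Handle`, `createHandle`, and the helper functions are hypothetical names. Returning a `bool` from `ready()` gives callers something to assert on when a missing peer would indicate a bug rather than a legitimate race.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical watcher that owns its callback; handles only observe it.
class Watcher {
public:
  explicit Watcher(std::function<void()> on_ready)
      : fn_(std::make_shared<std::function<void()>>(std::move(on_ready))) {}

  // A handle never extends the watcher's lifetime.
  class Handle {
  public:
    explicit Handle(std::weak_ptr<std::function<void()>> fn) : fn_(std::move(fn)) {}

    // Returns true if the watcher was still alive and was notified.
    bool ready() const {
      if (std::shared_ptr<std::function<void()>> fn = fn_.lock()) {
        (*fn)();
        return true;
      }
      return false; // watcher already destroyed: safe no-op
    }

  private:
    std::weak_ptr<std::function<void()>> fn_;
  };

  Handle createHandle() const { return Handle(fn_); }

private:
  std::shared_ptr<std::function<void()>> fn_;
};

// The watcher outlives the handle's call.
bool notifyLiveWatcher() {
  int calls = 0;
  Watcher watcher([&calls]() { ++calls; });
  Watcher::Handle handle = watcher.createHandle();
  return handle.ready() && calls == 1;
}

// The watcher is destroyed before the handle is used.
bool notifyDestroyedWatcher() {
  std::unique_ptr<Watcher> watcher(new Watcher([]() {}));
  Watcher::Handle handle = watcher->createHandle();
  watcher.reset();
  return handle.ready();
}
```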

mergeconflict · 2019-03-18T19:28:24Z

I just realized I have a minor design issue in TargetImpl: it is currently designed to be used as a mix-in class, but it also inherits from Logger::Loggable. This creates a problem for any class that inherits from it and also inherits from Logger::Loggable. So I'll change TargetImpl to not be a mix-in, just a regular data member like WatcherImpl.
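A rough sketch of why the mix-in style can clash with a logging base class, and how a data member avoids it. The comment above does not say what the exact problem is; this assumes it is name ambiguity from inheriting two instantiations of a logging template. `Loggable<Id>`, `TargetHelper`, and `Subscription` are hypothetical stand-ins, not Envoy's `Logger::Loggable` or `TargetImpl`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a logging mix-in parameterized on a logger id.
template <int Id> struct Loggable {
  static int loggerId() { return Id; }
};

// Helper that logs under its own id.
class TargetHelper : public Loggable<1> {
public:
  explicit TargetHelper(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

private:
  std::string name_;
};

// Mix-in style (problematic): deriving from both TargetHelper and Loggable<2>
// makes an unqualified loggerId() ambiguous, because lookup finds it in two
// different base classes:
//
//   class Subscription : public TargetHelper, public Loggable<2> {
//     int id() const { return loggerId(); } // error: ambiguous
//   };

// Data-member style: the owning class keeps a single Loggable base.
class Subscription : public Loggable<2> {
public:
  Subscription() : init_target_("subscription") {}
  int id() const { return loggerId(); } // resolves to Loggable<2>
  const TargetHelper& initTarget() const { return init_target_; }

private:
  TargetHelper init_target_;
};
```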

/wait

mergeconflict · 2019-03-19T15:35:26Z

@mattklein123 I think I've addressed everything so far. Have another look if you get a chance, and let me know if there's anybody else you'd recommend to have a look.

I have a local branch queued up to actually change all the use sites (https://github.com/mergeconflict/envoy/tree/safe_init_conversion), which is mostly mechanical and boring changes. So whenever this lands, I can rebase that and put up a PR.

mattklein123

Thanks, at a high level LGTM. Very nice! I will defer to @htuch for the remainder of the review.

source/common/safe_init/target_impl.cc

htuch

LGTM, but I'd like to discuss the mocking approach before merging.

htuch · 2019-03-21T18:33:28Z

test/mocks/safe_init/mocks.h

+namespace SafeInit {
+
+/**
+ * MockWatcher is a real WatcherImpl, subclassed to add a mock `ready` method that you can set


I'm not such a fan of this style TBH. Ideally mocks should be pure, so that when used in some unrelated code, which only needs to use part of the mocked class, e.g. just the add(), it only needs the single EXPECT_CALL and nothing else. This approach pushes a lot of policy into the mock. Sometimes we do this for convenience, when we don't want a zillion tests reinventing how to mock ClusterManager, for example, but in this case I think we could have a purer mock here and override it based on WatcherImpl in the specific tests.

mergeconflict · Mar 22, 2019

Updated: these really aren't mocks; they are real WatcherImpl and TargetImpl classes, subclassed to make tests less cumbersome to write. Renamed.

htuch

Thanks, this is a great improvement and reduction in cognitive load around the init flow.

common: introduce new "safe" init manager

28ccab4

Signed-off-by: Dan Rosen <mergeconflict@google.com>

repokitteh-read-only bot requested review from htuch, lizan and mattklein123 March 15, 2019 02:00

htuch self-assigned this Mar 15, 2019

htuch suggested changes Mar 15, 2019

include/envoy/safe_init/manager.h (resolved)

source/common/safe_init/manager_impl.cc (resolved)

source/common/safe_init/manager_impl.cc (outdated, resolved)

source/common/safe_init/manager_impl.h (outdated, resolved)

repokitteh-read-only bot added the waiting label Mar 15, 2019

lizan reviewed Mar 15, 2019

include/envoy/safe_init/BUILD (resolved)

fix gmock compilation issue in ci

86cc5cf

Signed-off-by: Dan Rosen <mergeconflict@google.com>

repokitteh-read-only bot removed the waiting label Mar 15, 2019

no doc comments for overridden methods

c85df15

Signed-off-by: Dan Rosen <mergeconflict@google.com>

mattklein123 self-assigned this Mar 17, 2019

mattklein123 requested changes Mar 17, 2019

repokitteh-read-only bot added the waiting label Mar 17, 2019

Dan Rosen added 2 commits March 18, 2019 10:11

Merge branch 'master' into safe_init

d863ffa

Signed-off-by: Dan Rosen <mergeconflict@google.com>

address comments from matt & harvey

ec99a04

Signed-off-by: Dan Rosen <mergeconflict@google.com>

repokitteh-read-only bot removed the waiting label Mar 18, 2019

repokitteh-read-only bot added the waiting label Mar 18, 2019

change SafeInit::Target to be a data member rather than a mix-in

afef8b8

Signed-off-by: Dan Rosen <mergeconflict@google.com>

repokitteh-read-only bot removed the waiting label Mar 18, 2019

htuch added the waiting label Mar 19, 2019

htuch removed the waiting label Mar 19, 2019

mattklein123 previously approved these changes Mar 21, 2019

source/common/safe_init/target_impl.cc (outdated, resolved)

source/common/safe_init/target_impl.cc (outdated, resolved)

Dan Rosen added 2 commits March 21, 2019 13:09

Merge branch 'master' into safe_init

98e90c1

Signed-off-by: Dan Rosen <mergeconflict@google.com>

address matt's comments

162044e

Signed-off-by: Dan Rosen <mergeconflict@google.com>

mergeconflict dismissed mattklein123’s stale review via 162044e March 21, 2019 17:17

htuch suggested changes Mar 21, 2019

htuch added the waiting label Mar 21, 2019

renaming mock watcher and target

8da0732

Signed-off-by: Dan Rosen <mergeconflict@google.com>

repokitteh-read-only bot removed the waiting label Mar 22, 2019

htuch approved these changes Mar 22, 2019

htuch merged commit 1301f11 into envoyproxy:master Mar 22, 2019

mergeconflict deleted the safe_init branch March 27, 2019 14:02

4 participants