Skip to content

[health checking] prep for outlier detection ejecting via health watch#33340

Merged
markdroth merged 10 commits into
grpc:masterfrom
markdroth:outlier_detection_ejection_via_health_watch
Jun 26, 2023
Merged

[health checking] prep for outlier detection ejecting via health watch#33340
markdroth merged 10 commits into
grpc:masterfrom
markdroth:outlier_detection_ejection_via_health_watch

Conversation

@markdroth

@markdroth markdroth commented Jun 4, 2023

Copy link
Copy Markdown
Member

Currently, the outlier_detection policy reports ejection by intercepting a subchannel's raw connectivity state watch. In the dualstack backend design, we will instead want to report ejection via the health watch. This PR is a first step toward that goal.

Specific changes in this PR:

  • Add type() method to InternalSubchannelDataWatcherInterface, to make it possible for LB policies to intercept data watchers.
  • Use that mechanism in the outlier_detection policy to report ejection both via raw connectivity state watches and via health watches. The hack to prevent outlier_detection from working with pick_first from [outlier detection] hack to prevent OD from working with pick_first #33336 has been changed to affect only the raw connectivity watch, not the health watch.
  • Change health check client to fall back to reporting the raw connectivity state if client-side health checking is not enabled. This will allow pick_first to unconditionally start a health watch when it is running under a petiole policy, which will be a no-op if neither health checking nor outlier detection are configured.

Once we are done changing all of the petiole policies to delegate to pick_first, we will remove the code that allows outlier_detection to work via the raw connectivity state, so it will work only via the health watch.

@markdroth markdroth marked this pull request as ready for review June 26, 2023 17:22
@markdroth markdroth requested a review from eugeneo June 26, 2023 17:22

@eugeneo eugeneo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor questions/comments.

auto it =
health_checkers_.emplace(*health_check_service_name, nullptr).first;
auto& health_checker = it->second;
if (health_checker == nullptr) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this code. Isn't this value always nullptr? Shouldn't MakeOrphanable... from line 372 replace nullptr on line 369?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two possible outcomes from the emplace() call on line 369:

  1. The entry already existed in the map. In this case, no new entry is added (and the nullptr value passed to emplace() is not actually stored anywhere), and the iterator will point to the preexisting entry in the map.
  2. The entry did not already exist in the map. In this case, emplace() will add the entry, and the iterator will point to the newly added entry, whose value is nullptr.

If health_checker is null here, that means we're in case 2, and we need to construct the new object.

The reason we don't call MakeOrphanable() before calling emplace() is that we don't want to actually allocate the object if the entry already exists in the map, because in that case, the newly allocated object would just wind up getting immediately destroyed, so we'd do unnecessary work.

non_health_watchers_.erase(watcher);
} else {
auto it = health_checkers_.find(*health_check_service_name);
if (it == health_checkers_.end()) return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find these several lines confusing, with multiple conditions and temp variable.

Suggested change
if (it == health_checkers_.end()) return;
if (it != health_checkers_.end() && it->second->RemoveWatcherLocked(watcher)) {
health_checkers_.erase(it);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the temp variable actually makes the code more readable. Without it, someone reading this code needs to go look at the comment on the RemoveWatcherLocked() method to see that it returns true when this was the last watcher in the HealthChecker.

@markdroth markdroth merged commit 8393319 into grpc:master Jun 26, 2023
@markdroth markdroth deleted the outlier_detection_ejection_via_health_watch branch June 26, 2023 21:27
@copybara-service copybara-service Bot added the imported Specifies if the PR has been imported to the internal repository label Jun 26, 2023
markdroth added a commit that referenced this pull request Jun 28, 2023
…state (#33427)

More work on the dualstack backend design:
- Now that all petiole policies have been changed to delegate to
pick_first, outlier detection no longer needs to eject via the
subchannel's raw connectivity state; it can now eject only via the
health state. See #33340.
- This also removes the now-unnecessary hack to explicitly disable
outlier detection in pick_first. See #33336.
mario-vimal pushed a commit to mario-vimal/grpc that referenced this pull request Jul 13, 2023
grpc#33340)

Currently, the outlier_detection policy reports ejection by intercepting
a subchannel's raw connectivity state watch. In the dualstack backend
design, we will instead want to report ejection via the health watch.
This PR is a first step toward that goal.

Specific changes in this PR:
- Add `type()` method to `InternalSubchannelDataWatcherInterface`, to
make it possible for LB policies to intercept data watchers.
- Use that mechanism in the outlier_detection policy to report ejection
both via raw connectivity state watches and via health watches. The hack
to prevent outlier_detection from working with pick_first from grpc#33336
has been changed to affect only the raw connectivity watch, not the
health watch.
- Change health check client to fall back to reporting the raw
connectivity state if client-side health checking is not enabled. This
will allow pick_first to unconditionally start a health watch when it is
running under a petiole policy, which will be a no-op if neither health
checking nor outlier detection are configured.

Once we are done changing all of the petiole policies to delegate to
pick_first, we will remove the code that allows outlier_detection to
work via the raw connectivity state, so it will work only via the health
watch.
mario-vimal pushed a commit to mario-vimal/grpc that referenced this pull request Jul 13, 2023
…state (grpc#33427)

More work on the dualstack backend design:
- Now that all petiole policies have been changed to delegate to
pick_first, outlier detection no longer needs to eject via the
subchannel's raw connectivity state; it can now eject only via the
health state. See grpc#33340.
- This also removes the now-unnecessary hack to explicitly disable
outlier detection in pick_first. See grpc#33336.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bloat/low imported Specifies if the PR has been imported to the internal repository lang/core per-call-memory/neutral per-channel-memory/neutral release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants