Skip to content

[serve] Improve health check failure semantics#22297

Merged
edoakes merged 35 commits intoray-project:masterfrom
edoakes:serve-consecutive-health-check
Feb 14, 2022
Merged

[serve] Improve health check failure semantics#22297
edoakes merged 35 commits intoray-project:masterfrom
edoakes:serve-consecutive-health-check

Conversation

@edoakes
Copy link
Copy Markdown
Collaborator

@edoakes edoakes commented Feb 10, 2022

Why are these changes needed?

Follow-up from #22297

Enhances the health check failure detection logic.

  • A user-defined health check is required to fail 3 times consecutively before the replica is marked unhealthy.
  • If the actor fails, we short-circuit and mark the replica unhealthy immediately.
  • A replica becoming unhealthy causes the deployment status to be updated to UNHEALTHY.

TODO:

  • Integration tests for the health checking behaviors.
  • Unit tests for the deployment status updating logic.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes edoakes force-pushed the serve-consecutive-health-check branch from 938cd48 to 6949a45 Compare February 11, 2022 16:08
@edoakes edoakes force-pushed the serve-consecutive-health-check branch from 92ed219 to 15927ae Compare February 11, 2022 16:11
@edoakes edoakes changed the title [WIP] Require health check to fail 3 times before marking replica unhealthy [WIP] Improve health check failure semantics Feb 11, 2022
@edoakes edoakes changed the title [WIP] Improve health check failure semantics [serve] Improve health check failure semantics Feb 11, 2022
@edoakes
Copy link
Copy Markdown
Collaborator Author

edoakes commented Feb 11, 2022

@shrekris-anyscale @simon-mo ready for review

Copy link
Copy Markdown
Contributor

@shrekris-anyscale shrekris-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I added a couple notes about some of the comments. Once those are addressed, this should be good to go.

@edoakes
Copy link
Copy Markdown
Collaborator Author

edoakes commented Feb 11, 2022

@shrekris-anyscale I updated the docstrings and am using Enum to make the code easier to read, please take another quick look

@shrekris-anyscale
Copy link
Copy Markdown
Contributor

@edoakes I like the enum changes. I added a couple comments about them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants