Skip to content

HTTP2 Proactive GOAWAY on Drain - Preamble#17026

Merged
zuercher merged 5 commits intoenvoyproxy:mainfrom
murray-stripe:murray/http2-drain-proactive-goaway--plumbing
Jun 28, 2021
Merged

HTTP2 Proactive GOAWAY on Drain - Preamble#17026
zuercher merged 5 commits intoenvoyproxy:mainfrom
murray-stripe:murray/http2-drain-proactive-goaway--plumbing

Conversation

@murray-stripe
Copy link
Copy Markdown
Contributor

Commit Message:

The initial set of changes (broken up to reduce PR/change size) to
enable proactive GOAWAY signals on drain events. These changes
(currently a no-op) open the way for a push-based notification model to
the Http::ConnectionManagerImpl class (or any other interested party).
Allowing a move away from the pull-based model that currently exists,
which causes issues such as #14350.

The primary change here is to introduce the idea of a "drain tree" which
allows the drain-manager to be structured with parent and child
managers. Each manager may drain independent of it's parent and all
drain actions cascade downward through the tree. The parent <-> child
relationship is thread-safe.

In addition, drain events can now be registered via callbacks to be
invoked when draining starts. This is a non-thread-safe operation.

A new thread-safe callback manager utility has been written to support
the parent <-> child drain-manager relationship.

Additional test coverage has been added for the drain-manager and
thread-safe callback-manager, as well as updates to necessary mock
classes.

Because the callback registration impacts the DrainDecision interface,
necessary no-op implementations have been addedto various classes
implementing this interface. The final versions will be wired up with
correct behavior in subsequent PRs that take advantage of the new
functionality.

Additional Description: This PR was created at the advisement of @antoniovicente in relation to #16201 as an effort to reduce the amount of change being reviewed in a single PR. The content of this PR is meant to be a no-op with #16201 enacting the change (and will be updated once this is merged to reduce it's scope).

Risk Level: Low. Changes included should be no-op.
Testing: Additional test coverage included.
Docs Changes: None.
Release Notes: None.
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

The initial set of changes (broken up to reduce PR/change size) to
enable proactive GOAWAY signals on drain events. These changes
(currently a no-op) open the way for a push-based notification model to
the Http::ConnectionManagerImpl class (or any other interested party).
Aloowing a move _away_ from the pull-based model that currently exists,
which causes issues such as envoyproxy#14350.

The primary change here is to introduce the idea of a "drain tree" which
allows the drain-manager to be structured with parent and child
managers. Each manager may drain independent of it's parent and all
drain actions cascade downward through the tree. The parent <-> child
relationship is thread-safe.

In addition, drain events can now be registered via callbacks to be
invoked when draining starts. This is a non-thread-safe operation.

A new thread-safe callback manager utility has been written to support
the parent <-> child drain-manager relationship.

Additional test coverage has been added for the drain-manager and
thread-safe callback-manager, as well as updates to necessary mock
classes.

Because the callback registration impacts the DrainDecision interface,
necessary no-op implementations have been addedto various classes
implementing this interface. The final versions will be wired up with
correct behavior in subsequent PRs that take advantage of the new
functionality.

Signed-off-by: John Murray <murray@stripe.com>
Signed-off-by: John Murray <murray@stripe.com>
@murray-stripe
Copy link
Copy Markdown
Contributor Author

startDrainSequence can only be called once on each DrainManagerImpl as enforced by ASSERTs. It seems to me that PerFilterChainFactoryContextImpl and InstanceImpl::drainListeners() can both invoke startDrainSequence, which would result in startDrainSequence being called twice on the same object: once via child relations on the global drain impl owned by the server instance and again directly via the filter chain manager.

This was a super good callout. I had anticipated for double-calling in the order of 1. direct, 2. parent; but not the other way around. I have updated the code to allow for multiple invocations and simply handling multiple on-drain-complete callbacks. I've added additional test cases to validate this behavior.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

The MacOS tests keep being cancelled (I'm guessing it's timing out), but are otherwise passing up until that point. This is ready for review.

Signed-off-by: John Murray <murray@stripe.com>
@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

@antoniovicente I have incorporated your feedback and this is ready for review.

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.

/wait

Signed-off-by: John Murray <murray@stripe.com>
@murray-stripe
Copy link
Copy Markdown
Contributor Author

@antoniovicente I've addressed your feedback. Really liking how things have simplified. 😄

antoniovicente
antoniovicente previously approved these changes Jun 22, 2021
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks great. Thanks for this refactor in preparation for H2 drain propagation.

/assign-from @envoyproxy/senior-maintainers for a final approval.

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/senior-maintainers assignee is @zuercher
for assignee is @None
a assignee is @None
final assignee is @None
approval. assignee is @None

🐱

Caused by: a #17026 (review) was submitted by @antoniovicente.

see: more, trace.

@antoniovicente
Copy link
Copy Markdown
Contributor

retrying macos timeout

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @antoniovicente.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

macos still timing out (seems to be the case most of the time)

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

zuercher
zuercher previously approved these changes Jun 23, 2021
Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I found some small issues, in comments and had one question about a test. None of these are blocking.

Signed-off-by: John Murray <murray@stripe.com>
@murray-stripe murray-stripe dismissed stale reviews from zuercher and antoniovicente via 2de1367 June 23, 2021 22:09
@murray-stripe
Copy link
Copy Markdown
Contributor Author

@zuercher Thank you for the review! I have addressed your feedback.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

gah, keep getting some failures in the coverage build for //test/extensions/filters/http/kill_request:kill_request_filter_integration_test. Seems to be pretty flakey. will give one more re-test and then merge in main if one more doesn't work.

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17026 (comment) was created by @murray-stripe.

see: more, trace.

@murray-stripe
Copy link
Copy Markdown
Contributor Author

@zuercher This is ready for your review/merge again.

Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zuercher zuercher merged commit 96dd735 into envoyproxy:main Jun 28, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very cool stuff. A few post-submit drive by comments for your next PR. Thank you @murray-stripe!

@alyssawilk
Copy link
Copy Markdown
Contributor

Hey @murray-stripe could you take a look at
https://storage.googleapis.com/envoy-postsubmit/main/coverage/source/common/common/callback_impl.cc.gcov.html and do a follow-up with some extra unit testing? I think this PR has caused some flakes with falling under our coverage bar. I'm addressing with #17190 but it'd be great to add another test and bump the number back up.

Thanks!

chrisxrepo pushed a commit to chrisxrepo/envoy that referenced this pull request Jul 8, 2021
The initial set of changes (broken up to reduce PR/change size) to
enable proactive GOAWAY signals on drain events. These changes
(currently a no-op) open the way for a push-based notification model to
the Http::ConnectionManagerImpl class (or any other interested party).
Aloowing a move _away_ from the pull-based model that currently exists,
which causes issues such as envoyproxy#14350.

The primary change here is to introduce the idea of a "drain tree" which
allows the drain-manager to be structured with parent and child
managers. Each manager may drain independent of it's parent and all
drain actions cascade downward through the tree. The parent <-> child
relationship is thread-safe.

In addition, drain events can now be registered via callbacks to be
invoked when draining starts. This is a non-thread-safe operation.

A new thread-safe callback manager utility has been written to support
the parent <-> child drain-manager relationship.

Additional test coverage has been added for the drain-manager and
thread-safe callback-manager, as well as updates to necessary mock
classes.

Because the callback registration impacts the DrainDecision interface,
necessary no-op implementations have been addedto various classes
implementing this interface. The final versions will be wired up with
correct behavior in subsequent PRs that take advantage of the new
functionality.

Risk Level: Low. Changes included should be no-op.
Testing: Additional test coverage included.
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: John Murray <murray@stripe.com>
Signed-off-by: chris.xin <xinchuantao@qq.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
The initial set of changes (broken up to reduce PR/change size) to
enable proactive GOAWAY signals on drain events. These changes
(currently a no-op) open the way for a push-based notification model to
the Http::ConnectionManagerImpl class (or any other interested party).
Aloowing a move _away_ from the pull-based model that currently exists,
which causes issues such as envoyproxy#14350.

The primary change here is to introduce the idea of a "drain tree" which
allows the drain-manager to be structured with parent and child
managers. Each manager may drain independent of it's parent and all
drain actions cascade downward through the tree. The parent <-> child
relationship is thread-safe.

In addition, drain events can now be registered via callbacks to be
invoked when draining starts. This is a non-thread-safe operation.

A new thread-safe callback manager utility has been written to support
the parent <-> child drain-manager relationship.

Additional test coverage has been added for the drain-manager and
thread-safe callback-manager, as well as updates to necessary mock
classes.

Because the callback registration impacts the DrainDecision interface,
necessary no-op implementations have been addedto various classes
implementing this interface. The final versions will be wired up with
correct behavior in subsequent PRs that take advantage of the new
functionality.

Risk Level: Low. Changes included should be no-op.
Testing: Additional test coverage included.
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: John Murray <murray@stripe.com>
alyssawilk added a commit to alyssawilk/envoy that referenced this pull request Dec 2, 2024
…roxy#17026)"

This reverts commit 96dd735.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit to alyssawilk/envoy that referenced this pull request Dec 2, 2024
…roxy#17026)"

This reverts commit 96dd735.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit to alyssawilk/envoy that referenced this pull request Dec 2, 2024
…roxy#17026)"

This reverts commit 96dd735.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit to alyssawilk/envoy that referenced this pull request Dec 3, 2024
…roxy#17026)"

This reverts commit 96dd735.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
alyssawilk added a commit that referenced this pull request Dec 9, 2024
…reamble (#17026)" (#37465)

This reverts commit 96dd735.


96dd735
was a precursor to
#16201
which never landed.

Unwinding the change
envoyproxy/envoy-mobile#176

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
yanavlasov added a commit to yanavlasov/envoy that referenced this pull request Feb 6, 2025
…rain - Preamble (envoyproxy#17026)" (envoyproxy#37465)"

This reverts commit 072831e.

Signed-off-by: Yan Avlasov <yavlasov@google.com>
yanavlasov added a commit that referenced this pull request Feb 11, 2025
…rain - Preamble (#17026)" (#37465)" (#38333)

Commit Message:
This reverts commit 072831e.

Revert in PR-37465 created problems for boundary proxy for Google
partner cloud, as it depends on the reverted internal API
`addOnDrainCloseCb`. Boundary proxy is open source and Google cloud
partners will be unable to build OSS version, since it depends on OSS
Envoy.

Putting back the reverted code for now, as it is unclear what exactly
can be removed from Envoy at this point.

Risk Level: Low
Testing: Unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

---------

Signed-off-by: Yan Avlasov <yavlasov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants