Skip to content

http: send GOAWAY for H2 downstreams for envoy.overload_actions.disable_http_keepalive#12498

Merged
mattklein123 merged 12 commits intoenvoyproxy:masterfrom
akonradi:http2-no-keepalive
Aug 26, 2020
Merged

http: send GOAWAY for H2 downstreams for envoy.overload_actions.disable_http_keepalive#12498
mattklein123 merged 12 commits intoenvoyproxy:masterfrom
akonradi:http2-no-keepalive

Conversation

@akonradi
Copy link
Copy Markdown
Contributor

@akonradi akonradi commented Aug 5, 2020

Commit Message: Send H2 GOAWAY for disable keepalive overload action
Additional Description:
When the overload manager is configured with the "disable keepalive" action, if the client is speaking H2, send a GOAWAY frame when the trigger state is active. This can be disabled via runtime override.

Risk Level: low
Testing: added unit tests, ran integration tests
Docs Changes: updated overload action description
Release Notes: http: changed Envoy to send GOAWAY to HTTP2 downstreams when the disable_keepalive overload action is active. This behavior may be temporarily reverted by setting envoy.reloadable_features.connection_manager_drain_http2 to false.
Runtime guard: set envoy.reloadable_features.connection_manager_drain_http2 to false to disable the new behavior

When the overload manager is configured with the "disable keepalive"
action, if the client is speaking H2, send a GOAWAY frame when the
trigger state is active. This is guarded behind a runtime configuration
flag.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@ggreenway
Copy link
Copy Markdown
Member

I don't know the nuances of h2-goaway well enough to say if this is right or not. Reassigning to Matt.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. A few comments. Thank you!

/wait

* decompressor: headers-only requests were incorrectly not advertising accept-encoding when configured to do so. This is now fixed.
* http: added :ref:`headers_to_add <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.ResponseMapper.headers_to_add>` to :ref:`local reply mapper <config_http_conn_man_local_reply>` to allow its users to add/append/override response HTTP headers to local replies.
* http: added HCM level configuration of :ref:`error handling on invalid messaging <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` which substantially changes Envoy's behavior when encountering invalid HTTP/1.1 defaulting to closing the connection instead of allowing reuse. This can temporarily be reverted by setting `envoy.reloadable_features.hcm_stream_error_on_invalid_message` to false, or permanently reverted by setting the :ref:`HCM option <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` to true to restore prior HTTP/1.1 beavior and setting the *new* HTTP/2 configuration :ref:`override_stream_error_on_invalid_http_message <envoy_v3_api_field_config.core.v3.Http2ProtocolOptions.override_stream_error_on_invalid_http_message>` to false to retain prior HTTP/2 behavior.
* http: changed Envoy to send GOAWAY to HTTP2 downstreams when the disable_keepalive overload action is active. This behavior may be temporarily reverted by setting `envoy.reloadable_features.connection_manager_drain_http2` to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you ref link to the option?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a runtime flag. Is there a place to link it to?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant disable_keepalive overload action. Can we reflink that somewhere useful?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this feature flag something more specific to overload manager? We already do drain HTTP/2 connections.

"envoy.reloadable_features.connection_manager_drain_http2")) {
// For H2, send a GOAWAY frame to tell the client that no further requests will be accepted
// on this connection.
connection_manager_.codec_->goAway();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you probably want to call startDrainSequence() here? This will do the 2 stage goaway and eventually close the connection?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startDrainSequence() asserts that the drain state is NotDraining, but on line 1884, we check that the connection is already draining, so startDrainSequence()'s ASSERT will be tripped.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did the connection enter draining in the first place? We would have already started the H2 drain sequence. I'm actually not sure why this change is needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering based on the re-jiggered implementation. This change ensures that a GOAWAY frame is sent to H2 clients when the proxy wants to stop accepting connections from clients, as opposed to waiting longer and giving them time to start new requests.

Instead of sending send an ad-hoc GOAWAY, use the existing
startDrainSequence method to force a drain either immediately or on a
delay.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Verify that if the disable keepalive action is active, HTTP/2 and HTTP/3
connections receive GOAWAY frames since the server wants to reject new
streams.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Copy Markdown
Contributor Author

I've reworked this to use the exising startDrainSequence() method instead of sneakily sending a GOAWAY frame on the connection. I had to update some quic code since quiche handles GOAWAY differently for HTTP/2 and HTTP/3 and added tests for that too.

@yanavlasov yanavlasov self-assigned this Aug 14, 2020
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12498 (comment) was created by @akonradi.

see: more, trace.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the code more consistent. Flushing a few more comments.

/wait

* decompressor: headers-only requests were incorrectly not advertising accept-encoding when configured to do so. This is now fixed.
* http: added :ref:`headers_to_add <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.ResponseMapper.headers_to_add>` to :ref:`local reply mapper <config_http_conn_man_local_reply>` to allow its users to add/append/override response HTTP headers to local replies.
* http: added HCM level configuration of :ref:`error handling on invalid messaging <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` which substantially changes Envoy's behavior when encountering invalid HTTP/1.1 defaulting to closing the connection instead of allowing reuse. This can temporarily be reverted by setting `envoy.reloadable_features.hcm_stream_error_on_invalid_message` to false, or permanently reverted by setting the :ref:`HCM option <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` to true to restore prior HTTP/1.1 beavior and setting the *new* HTTP/2 configuration :ref:`override_stream_error_on_invalid_http_message <envoy_v3_api_field_config.core.v3.Http2ProtocolOptions.override_stream_error_on_invalid_http_message>` to false to retain prior HTTP/2 behavior.
* http: changed Envoy to send GOAWAY to HTTP2 downstreams when the disable_keepalive overload action is active. This behavior may be temporarily reverted by setting `envoy.reloadable_features.connection_manager_drain_http2` to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant disable_keepalive overload action. Can we reflink that somewhere useful?

drain_timer_ = read_callbacks_->connection().dispatcher().createTimer(
[this]() -> void { onDrainTimeout(); });
drain_timer_->enableTimer(config_.drainTimeout());
if (wait_for_timeout) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we not do the timer in all cases? This timer is designed to avoid races in which the client races with the goaway and then has the streams reset. Given that we will need to wait until all existing streams drain anyway I don't see any reason to not do the normal drain sequence here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to make H2 clients more responsive to overload conditions. We actually want to stop the clients from making any more requests, even if that means racing with incoming requests. If we wait for the timer, then there is a lag between when the proxy is overloaded and when it actually takes action.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default drain timer is I think 5s, and in general we try to make things graceful, similar to what happens with H1. I'm very skeptical that it's worth the extra code complexity and non-graceful behavior for this special case when the default is only around 5s?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I've switched this to drain in the normal way, sending the shutdown notice first and then the GOAWAY after the delay.

TestScopedRuntime runtime_;
};

TEST_P(DrainH2HttpConnectionManagerImplTest, DisableHttp2KeepAliveWhenOverloaded) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a '//' comment above the test with a summary of what it tests? Also, I think we should have a test that covers when there are multiple in flight streams. Even better if there were an integration test somehow.

Avoid races by sending the shutdown notice GOAWAY frame before the real
one.

Signed-off-by: Alex Konradi <akonradi@google.com>
The tests were expecting a GOAWAY frame, but Envoy's QUIC codec doesn't
send those on shutdown notification. Instead, wait for the connection to
be closed, and check for the presence of a GOAWAY frame after.

Signed-off-by: Alex Konradi <akonradi@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks great. One small comment and we can ship!

/wait

* decompressor: headers-only requests were incorrectly not advertising accept-encoding when configured to do so. This is now fixed.
* http: added :ref:`headers_to_add <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.ResponseMapper.headers_to_add>` to :ref:`local reply mapper <config_http_conn_man_local_reply>` to allow its users to add/append/override response HTTP headers to local replies.
* http: added HCM level configuration of :ref:`error handling on invalid messaging <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` which substantially changes Envoy's behavior when encountering invalid HTTP/1.1 defaulting to closing the connection instead of allowing reuse. This can temporarily be reverted by setting `envoy.reloadable_features.hcm_stream_error_on_invalid_message` to false, or permanently reverted by setting the :ref:`HCM option <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.stream_error_on_invalid_http_message>` to true to restore prior HTTP/1.1 beavior and setting the *new* HTTP/2 configuration :ref:`override_stream_error_on_invalid_http_message <envoy_v3_api_field_config.core.v3.Http2ProtocolOptions.override_stream_error_on_invalid_http_message>` to false to retain prior HTTP/2 behavior.
* http: changed Envoy to send GOAWAY to HTTP2 downstreams when the disable_keepalive overload action is active. This behavior may be temporarily reverted by setting `envoy.reloadable_features.connection_manager_drain_http2` to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this feature flag something more specific to overload manager? We already do drain HTTP/2 connections.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thansk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants