Skip to content

http: stream error on invalid messaging#11748

Merged
alyssawilk merged 29 commits intoenvoyproxy:masterfrom
alyssawilk:stream_error
Jul 16, 2020
Merged

http: stream error on invalid messaging#11748
alyssawilk merged 29 commits intoenvoyproxy:masterfrom
alyssawilk:stream_error

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

This unifies HTTP/1.1 and HTTP/2 stream error on invalid messaging. Previously HTTP/1.1 defaulted permissive and HTTP/2 defaulted to strict. This defaults both to strict, resetting connections on invalid requests. This will have a major latency impact if downstream is sending a mix of valid and invalid requests over HTTP/1.1

Additional Description:
This change is runtime guarded per default behavioral change rules. It can also be reverted by setting the default to permissive (for prior HTTP/1 behvior) then overriding HTTP/2 to struct (for prior HTTP/2 behavior).

This works in conjunction with #11714, as the HTTP connection manager enforces the strictness, so the responses need to be sent via the HTTP connection manager to have strictness applied correctly.

Risk Level: High (HCM changes)
Testing: new unit tests, updated integration tests
Docs Changes: n/a
Release Notes: inline
Runtime guard: envoy.reloadable_features.hcm_stream_error_on_invalid_message
Fixes #9846

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #11748 was opened by alyssawilk.

see: more, trace.

@alyssawilk
Copy link
Copy Markdown
Contributor Author

This fully cleaned up and tested, and marked as draft because I'm not sure when we're cutting the release. If we're cutting early next week I would prefer to land only the HTTP/2 field changes (to start the Envoy deprecation / warning clock) and will defer merging this until after the release. If we're cutting in 2 weeks I don't think it makes sense to defer.

Either way I'd like to discuss the API now, so I can land whatever subsection of it makes more sense.

cc @antoniovicente

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

coverage failure appears unrelated

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API LGTM, just some minor comments.

// <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.upgrade_configs>`
// iff present.
//
// This is deprecated in favor of stream_error_on_invalid_http_message.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you :ref: link to the field below? Given how similar they are name wise, it would be good to have zero ambiguity on which one is being referred to.

//
// See `RFC7540, sec. 8.1 <https://tools.ietf.org/html/rfc7540#section-8.1>`_ for details.
bool stream_error_on_invalid_http_messaging = 12;
google.protobuf.BoolValue stream_error_on_invalid_http_message = 14;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about stream_error_on_invalid_http_messaging_hcm_override? That's very very long, but I worry about the similarity in names between new and deprecated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree on name similarity but I'd mildly prefer a clean name in the long run. @mattklein123 WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meet half way on stream_error_on_invalid_http_messaging_override? :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, deal, modulo messaging -> message for consistency with the long term HCM value.

Alternately would you also be OK with override_stream_error_on_invalid_http_message? I think that reads a bit better but it's arguably easier to overlook the difference so I'm fine if you prefer stream_error_on_invalid_http_message_override

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, either is fine.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@mattklein123
Copy link
Copy Markdown
Member

At a high level this LGTM. What is your plan with this change? Do you want to land after the next release or before? Just wondering what will take it out of draft.

/wait-any

@alyssawilk
Copy link
Copy Markdown
Contributor Author

I basically didn't want it landing too close to #11714 or too close to cutting the release.

Given master is frozen and most folks are out Thursday and Friday, I assume we're not cutting the release until next week and I'm cool landing it this week. I'd say if it doesn't land by Wednesday I'd split out the HTTP/2 bits and land them before release, and land the HCM and behavioral changes after.

@alyssawilk alyssawilk marked this pull request as ready for review June 29, 2020 20:31
@alyssawilk alyssawilk requested a review from mattklein123 as a code owner June 29, 2020 20:31
@htuch
Copy link
Copy Markdown
Member

htuch commented Jun 29, 2020

/lgtm api

@mattklein123
Copy link
Copy Markdown
Member

Looks like legit CI issue.

/wait

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
// 2020/05/13 10531 44425 44600 Refactor resource manager
// 2020/05/20 11223 44491 44600 Add primary clusters tracking to cluster manager.
// 2020/06/10 11561 44491 44811 Make upstreams pluggable
// 2020/04/23 10661 44425 46000 per-listener connection limits
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest CI failure was this test again (second time this PR has failed CI only due to stats numbers)
I'm really not sure what to do about the per-listener limit, which had a size and date from the setec upstream PR which were both out of kilter with actual dates and sizes so I just removed it.

I still question the value of this test being worth the toil, especially if it's going to be confusing and inaccurate for things like backports :-/

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

This should be ready for another pass.

snowp
snowp previously requested changes Jul 15, 2020
ENVOY_STREAM_LOG(debug, "drain closing connection", *this);
}

// The BadRequest error code indicates there has been a messaging error.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand: does this code path not get hit if the upstream returns BadRequest?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. I can change it so only Envoy invalid messaging triggers closes if we prefer that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've definitely seen people use BadRequest to indicate a malformed message body (ie invalid JSON key or whatever) out in the wild, so it might make sense to only close the connection when we know that it actually was related to invalid HTTP messages?

I'm not sure what the right thing to do here tbh, not clear what the scope of "HTTP messaging" is

bool hcm_stream_error_set,
const Protobuf::BoolValue& hcm_stream_error) {
auto ret = initializeAndValidateOptions(options);
if (Runtime::runtimeFeatureEnabled(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense

MOCK_METHOD(Stats::Scope&, getRootScope, ());

testing::NiceMock<MockSnapshot> snapshot_;
SnapshotConstSharedPtr snapshot_ptr_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used? was this supposed to be used with the default ON_CALL in mocks.cc?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 15, 2020

/lgtm api

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add tests that cover the kill switch? (Unless I missed them.)

/wait


**In summary, if you run level two Envoy version 1.11.1 or greater which terminates
HTTP/2, we strongly advise you to change the HTTP/2 configuration of your level
HTTP/2, we strongly advise you to change the HCM configuration of your level
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I would spell out HCM since I'm not sure users will know what this means (same elsewhere if applicable)

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

Given the delta between this and the last commit is a comment-only change and macos build was green, I'd be inclined to merge over the macos flake especially as CI has been wedged 2 hours. WDYT?

@mattklein123
Copy link
Copy Markdown
Member

@alyssawilk what about a test for the kill switch? Did I miss that?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

alyssawilk commented Jul 16, 2020 via email

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alyssawilk alyssawilk merged commit 88dcb29 into envoyproxy:master Jul 16, 2020
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
This unifies HTTP/1.1 and HTTP/2 stream error on invalid messaging. Previously HTTP/1.1 defaulted permissive and HTTP/2 defaulted to strict. This defaults both to strict, resetting connections on invalid requests. This will have a major latency impact if downstream is sending a mix of valid and invalid requests over HTTP/1.1

Additional Description:
This change is runtime guarded per default behavioral change rules. It can also be reverted by setting the default to permissive (for prior HTTP/1 behvior) then overriding HTTP/2 to struct (for prior HTTP/2 behavior).

This works in conjunction with envoyproxy#11714, as the HTTP connection manager enforces the strictness, so the responses need to be sent via the HTTP connection manager to have strictness applied correctly.

Risk Level: High (HCM changes)
Testing: new unit tests, updated integration tests
Docs Changes: n/a
Release Notes: inline
Runtime guard: envoy.reloadable_features.hcm_stream_error_on_invalid_message
Fixes envoyproxy#9846

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
This unifies HTTP/1.1 and HTTP/2 stream error on invalid messaging. Previously HTTP/1.1 defaulted permissive and HTTP/2 defaulted to strict. This defaults both to strict, resetting connections on invalid requests. This will have a major latency impact if downstream is sending a mix of valid and invalid requests over HTTP/1.1

Additional Description:
This change is runtime guarded per default behavioral change rules. It can also be reverted by setting the default to permissive (for prior HTTP/1 behvior) then overriding HTTP/2 to struct (for prior HTTP/2 behavior).

This works in conjunction with envoyproxy#11714, as the HTTP connection manager enforces the strictness, so the responses need to be sent via the HTTP connection manager to have strictness applied correctly.

Risk Level: High (HCM changes)
Testing: new unit tests, updated integration tests
Docs Changes: n/a
Release Notes: inline
Runtime guard: envoy.reloadable_features.hcm_stream_error_on_invalid_message
Fixes envoyproxy#9846

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
@alyssawilk alyssawilk deleted the stream_error branch October 26, 2020 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

http: require stream_error_on_invalid_http_messaging to be set for every HCM

4 participants