Skip to content

http/2: better handling for messaging error resets#3140

Merged
mattklein123 merged 6 commits intomasterfrom
h2_stat
Apr 24, 2018
Merged

http/2: better handling for messaging error resets#3140
mattklein123 merged 6 commits intomasterfrom
h2_stat

Conversation

@mattklein123
Copy link
Copy Markdown
Member

This patch stems from an operational issue at Lyft that took a really long
time to debug. The issue was a client sending a content-length along with a
204 response which is against spec and nghttp2 does not like.

There are a few different changes here:

  1. Treat all messaging errors as stream level errors vs. connection level
    errors.
  2. Increment a specific stat so its easier to debug these cases.
  3. Charge resets that stem from these issues as local resets vs. remote
    resets which is what was happening previously.

Risk Level: Low/Medium (I think the risk is low, but changing this code is always
a bit scary).
Testing: Additional/fixed unit tests.
Docs Changes: New stat added.
Release Notes: N/A

This patch stems from an operational issue at Lyft that took a really long
time to debug. The issue was a client sending a content-length along with a
204 response which is against spec and nghttp2 does not like.

There are a few different changes here:
1) Treat all messaging errors as stream level errors vs. connection level
   errors.
2) Increment a specific stat so its easier to debug these cases.
3) Charge resets that stem from these issues as local resets vs. remote
   resets which is what was happening previously.

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Copy Markdown
Member Author

@alyssawilk PTAL when you have time. Not urgent.

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100 to not always killing the connection on bad individual requests. I'm a bit fuzzy on the details so mostly questions here.

if (error_code == NGHTTP2_ERR_HTTP_HEADER) {
// The stream is about to be closed due to an invalid header or messaging. Don't kill the
// entire connection if one stream has bad headers or messaging.
if (error_code == NGHTTP2_ERR_HTTP_HEADER || error_code == NGHTTP2_ERR_HTTP_MESSAGING) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does nghttp2 save us from incomplete hpack state? If the stream is closed based on an error in an initial headers frame but there's a contination, does it do the right thing or can we get wedged? Worth adding a test for multi-frame request headers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain it will do the right thing. Let me see about adding a test case for that. I'm not sure how hard it will be as nghttp2 generally hides continuation frames from us.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a really large header frames should work, worst case.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP semantics error occurs after successful HPACK decoding and its state is completely fine even with those errors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to hear, thank you for chiming in.

Curious, does that mean if the request headers are broken into multiple frames, we'll not send the stream cancellation until processing all of the headers, or we kill the stream off right away but continue to update HPACK state for internal consistency?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Unless we get unrecoverable error during HPACK decoding, we will feed incoming bits into HPACK even if its output is just thrown away.

? StreamResetReason::RemoteRefusedStreamReset
: StreamResetReason::RemoteReset);
StreamResetReason reason;
if (stream->reset_due_to_messaging_error_) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://nghttp2.org/documentation/types.html#c.nghttp2_on_stream_close_callback indicates that error_code is generally nghttp2_error_code here.
Are there cases where we'll get onInvalidFrame(NGHTTP2_ERR_HTTP_MESSAGING) and onStreamClose(SomeOtherError)? I'm not terribly familiar with the internals of NGHTTP2_ERR_HTTP_MESSAGING but I wouldn't have assumed all messaging errors triggered invalidFrame so I would assume setting reset_due_to_messaging_error_ where we reset the stream would more often be correct.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is the issue that I was commenting on re: gymnastics of doing this change. By the time we get to onStreamClose(), we only get one of these errors: https://nghttp2.org/documentation/enums.html#c.nghttp2_error_code which are RFC wire errors. So all lib errors (https://nghttp2.org/documentation/enums.html#c.nghttp2_error) have been collapsed into a generic error. It would be nice if nghttp2 also gave us the lib error that caused the wire error but I don't think there is any way to get that. What I did here is I think the "cleanest" way of moving from lib error to reset reason. AFAIK, all messaging errors will trigger invalid frame, because a messaging error came from some frame (headers, data, etc.) so that frame will be rejected before it is dispatched.

@tatsuhiro-t in case you have a few moments to review this change, is there any cleaner way of accomplishing what I'm trying to do?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I was conflating nghttp2_error and nghttp2_error_code

Totally PEBKAC but I'm going to claim it's an understandable mistake :-P

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is best thing I can think of if you want to treat HTTP messaging error differently.

alyssawilk
alyssawilk previously approved these changes Apr 23, 2018
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(though I still wouldn't mind the unit test being added just in case =P)

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123
Copy link
Copy Markdown
Member Author

@alyssawilk I modified one of the tests to force a continuation frame. PTAL.

Signed-off-by: Matt Klein <mklein@lyft.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for doing the extra work there :-)

@mattklein123 mattklein123 merged commit e293ffb into master Apr 24, 2018
@mattklein123 mattklein123 deleted the h2_stat branch April 24, 2018 15:54
mattklein123 added a commit that referenced this pull request Apr 25, 2018
This reverts commit e293ffb.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this pull request Apr 25, 2018
…3213)

This reverts commit e293ffb.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this pull request Apr 26, 2018
…3140)" (#3213)"

This reverts commit 3b4badd.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this pull request Apr 26, 2018
…3140)" (#3213)" (#3233)

This reverts commit 3b4badd.

Signed-off-by: Matt Klein <mklein@lyft.com>
ramaraochavali pushed a commit to ramaraochavali/envoy that referenced this pull request May 3, 2018
This patch stems from an operational issue at Lyft that took a really long
time to debug. The issue was a client sending a content-length along with a
204 response which is against spec and nghttp2 does not like.

There are a few different changes here:
1) Treat all messaging errors as stream level errors vs. connection level
   errors.
2) Increment a specific stat so its easier to debug these cases.
3) Charge resets that stem from these issues as local resets vs. remote
   resets which is what was happening previously.

Signed-off-by: Matt Klein <mklein@lyft.com>

Signed-off-by: Rama <rama.rao@salesforce.com>
ramaraochavali pushed a commit to ramaraochavali/envoy that referenced this pull request May 3, 2018
…y#3140)" (envoyproxy#3213)

This reverts commit e293ffb.

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
ramaraochavali pushed a commit to ramaraochavali/envoy that referenced this pull request May 3, 2018
…nvoyproxy#3140)" (envoyproxy#3213)" (envoyproxy#3233)

This reverts commit 3b4badd.

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants