http: fixing a bug in common continue #7708
alyssawilk wants to merge 1 commit into envoyproxy:master
Conversation
I don't think this is the cleanest fix, but I do think it's the safest fix.
alyssawilk force-pushed from a676360 to 52d99cd
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
OK, cleaned up a bit - there was an HCM test exposing an issue with buffered data, and now I have a confusing but hopefully sufficiently well explained workaround for that.
mattklein123 left a comment
Thanks for the quick fix and digging into this. I have one question as this is complicated stuff.
// if the fin went with headers but then data or metadata was added (so there is now buffered
// data) the fin should still be sent with the buffered data.
const bool end_stream_with_data =
    complete() && !trailers() && (!end_stream_with_headers_ || bufferedData());
Not as the filter chain currently operates - that's the bug in the current implementation.
If we have a header-only stream (end stream is true) and a filter returns stop iteration, then when we call commonContinue to resume processing we check whether there's more work to do for this filter. At that point, since end stream is true, we'd send an empty data frame with end stream, confusing things. There's no way to differentiate between "end stream with a fin already sent" and "end stream but we still need to send a fin" at this point (though we could theoretically eagerly create the buffer when decodeHeaders returns StopIteration, if you'd prefer).
We also can't check end stream at the beginning of commonContinue, because when we addDecodedData (or, recently, when we add metadata) we don't set end stream false. That's one of the high-risk refactors I'd be willing to tackle (carefully, probably behind a guard) as a follow-up.
I'm probably missing something but I think checking complete() is different from end_stream_? I'm just looking at the code changes above, and it seems like in both of those cases, checking end_stream_ here along with whether there is buffered data would cover those cases?
So if I swap end_stream_with_data to end_stream_ here, I get a segfault in the new regression test, which I believe reproduces the associated bug:
[ RUN ] IpVersions/Http2IntegrationTest.PauseAndResumeHeadersOnly/IPv4
[2019-07-24 20:42:06.590][400][critical][assert] [source/common/http/http2/codec_impl.cc:277] assert failure: !local_end_stream_.
As I understand it what's happening in this case is the StopIterationAndContinueFilter gets decodeHeaders(headers, end_stream = true)
sets end_stream_ = true
returns StopIteration
When the alarm fires it calls commonContinue and hits this code path with end_stream_ already true, so if we used end_stream_ here we'd end up sending onData with end stream after headers have already been sent with end stream, failing the assert.
We need to differentiate the case in PauseAndResumeHeadersOnly, where commonContinue is called after end_stream_ was set true in decodeHeaders, from the case in the PauseAndResume test, where end_stream was false in decodeHeaders and needs to be communicated via a 0-byte data frame.
When the alarm fires it calls commonContinue, hits this code path with end_stream_ already true, so if we used end_stream_ here we end up sending the onData with end stream after headers have been sent with end stream, and failing the assert.
But don't you check !end_stream_ so we shouldn't send any data frame?
Intuitively we ought to be able to use end stream to determine if we need to create the synthetic data frame. I don't think we can.
Looking at the state of things when we hit commonContinue, end_stream_ is true in both PauseAndResume and PauseAndResumeHeadersOnly at the point we call commonContinue.
For PauseAndResume, we set end stream when we process the data with the fin, then need to create the synthetic data frame so that doData kicks the pipeline (the original bug) and further filters process that end stream.
For PauseAndResumeHeadersOnly, we set end stream true when we process the headers, and need to not create a synthetic data frame (the new bug) to avoid a double fin.
Somehow we need to differentiate between the two cases. I'm cool reusing an existing variable, but I don't think we can use end_stream_ in this case, since we want a different behavioral outcome but it's the same value in both integration tests when we hit commonContinue. Unless you're suggesting somewhere else in the code base should be paying attention to end stream to avoid the double send?
OK after staring at this for a while I think I understand why we need the extra boolean, but I will admit that this is making my head explode, which gives me a strong indication that we need more comments and/or some way to make this code simpler in the future. Could you potentially add a comment that summarizes this discussion so that we don't lose the context of why we need a separate boolean? Bonus points for any TODO thoughts on how to make this code better. Perhaps @soya3129 has some thoughts also as she has thought about this a ton lately.
I think you already mentioned this, but yeah, I think it might make everything simpler to just add the synthetic empty data frame ahead of time vs. figure it out at continue time? Maybe that's the TODO?
mattklein123 left a comment
Thanks for taking the time to help me understand this. Difficult stuff. This generally LGTM but I have some more questions and comment requests.
/wait
const bool new_metadata_added = processNewlyAddedMetadata();

if ((*entry)->end_stream_ && !new_metadata_added) {
Do we need the additional !new_metadata_added condition here? In this case we will add a data frame and that would be covered by the buffered data check in continue, right?
Basically I'm trying to reconcile why the logic is different in the decode case vs. the encode case. Is it different because metadata is not yet implemented in the encode path? Do we need a TODO there or a comment? cc @soya3129
Sort of. It'll work without it, because the metadata path adds its own fake buffer, but it's more accurate to not claim the fin went with the data when we know it didn't.
I believe the reason we need it here and not below is that the decode path has addDecodedMetadata and the encode path doesn't have addEncodedMetadata yet.
Yeah, so the other route is #7720
It amounts to a rollback of the original PR plus early buffer creation. I'm less comfortable with it because I understand what's going on less well. That said, if you are more comfortable with it we can land the fix that way, and/or could runtime-guard (default true or false) the eager buffer creation in case it causes problems of its own.
On the metadata encoding path, I didn't create addEncodedMetadata() because we can call callbacks_->encodeMetadata() directly to insert new metadata, and the newly added metadata will be forwarded downstream without waiting. But I think the inconsistency can cause confusion. I will implement addEncodedMetadata().
Risk Level: High (change to HCM)
Testing: new integration test, with header
Docs Changes: n/a
Release Notes: n/a
Fixes #7595