
http: fixing a bug in common continue #7708

Closed

alyssawilk wants to merge 1 commit into envoyproxy:master from alyssawilk:continue

Conversation

@alyssawilk (Contributor):

Risk Level: High (change to HCM)
Testing: new integration test, with header
Docs Changes: n/a
Release Notes: n/a
Fixes #7595

@alyssawilk (Contributor, Author):

I don't think this is the cleanest fix, but I do think it's the safest one.
Happy to do a higher-risk (maybe runtime-guarded?) cleanup refactor as a follow-up if you have ideas.

@alyssawilk alyssawilk closed this Jul 24, 2019
@alyssawilk alyssawilk reopened this Jul 24, 2019
@alyssawilk alyssawilk force-pushed the continue branch 2 times, most recently from a676360 to 52d99cd on July 24, 2019 at 15:33
@alyssawilk alyssawilk closed this Jul 24, 2019
@alyssawilk alyssawilk reopened this Jul 24, 2019
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk (Contributor, Author):

OK, cleaned up a bit. There was an HCM test exposing an issue with buffered data, and now I have a confusing but hopefully sufficiently well-explained workaround for that.

@mattklein123 (Member) left a comment:

Thanks for the quick fix and digging into this. I have one question as this is complicated stuff.

```cpp
// If the fin went with headers but then data or metadata was added (so there is now buffered
// data), the fin should still be sent with the buffered data.
const bool end_stream_with_data =
    complete() && !trailers() && (!end_stream_with_headers_ || bufferedData());
```
(Member):

Probably stupid question: Instead of adding a new end_stream_with_headers_ variable can't we just check end_stream_? I think this would cover all the cases in which we either already set end_stream_ or we have buffered data to send? cc @soya3129 and @snowp for thoughts.

@alyssawilk (Contributor, Author):

Not as the filter chain currently operates; that's the bug in the current implementation.
If we have a header-only stream (end stream is true) and it calls stop iteration, then when we call commonContinue to resume processing, we check whether there's more work to do for this filter. Since end stream is true, we'd send an empty data frame with end stream, confusing things. There's no way at this point to differentiate between "end stream with a fin already sent" and "end stream but we still need to send a fin" (though we could theoretically eagerly create the buffer when decodeHeaders returns stop iteration, if you'd prefer that).

We also can't check end stream at the beginning of commonContinue, because when we addDecodedData (or, recently, when we add metadata) we don't set end stream back to false. That's one of the high-risk refactors I'd be willing to (carefully, probably with a guard) tackle as a follow-up.

(Member):

I'm probably missing something, but I think checking complete() is different from end_stream_? I'm just looking at the code changes above, and it seems like in both of those cases, checking end_stream_ here along with whether there is buffered data would cover them?

@alyssawilk (Contributor, Author):

So if I swap end_stream_with_data to end_stream_ here, I get a segfault in the new regression test, which I believe reproduces the associated bug:

[ RUN ] IpVersions/Http2IntegrationTest.PauseAndResumeHeadersOnly/IPv4
[2019-07-24 20:42:06.590][400][critical][assert] [source/common/http/http2/codec_impl.cc:277] assert failure: !local_end_stream_.

As I understand it, what's happening in this case is that the StopIterationAndContinueFilter gets decodeHeaders(headers, end_stream = true), sets end_stream_ = true, and returns StopIteration.
When the alarm fires it calls commonContinue and hits this code path with end_stream_ already true, so if we used end_stream_ here we would end up sending the onData with end stream after the headers had already been sent with end stream, failing the assert.

We need to differentiate the case in PauseAndResumeHeadersOnly, where commonContinue is called after end_stream_ was set true in decodeHeaders, from the case in the PauseAndResume test, where commonContinue is called after end_stream was false in decodeHeaders and the fin needs to be communicated via a 0-byte data frame.
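The assert can be modeled with a minimal sketch (hypothetical names, not Envoy code): an HTTP/2 stream may signal end-of-stream at most once, so a synthetic end-stream data frame sent after end-stream headers trips the `!local_end_stream_` invariant.

```cpp
// Toy model of the codec invariant behind the assert (a sketch, not Envoy
// code): a stream may signal end-of-stream at most once.
struct ToyStream {
  bool local_end_stream = false;

  // Returns false where the real codec would fail ASSERT(!local_end_stream_).
  bool sendFrame(bool end_stream) {
    if (local_end_stream) {
      return false;  // fin was already sent; a second frame is a bug
    }
    local_end_stream = end_stream;
    return true;
  }
};
```

In this toy model, the header-only pause-and-resume bug corresponds to sending two end-stream frames on the same stream.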

@mattklein123 (Member), Jul 24, 2019:

> When the alarm fires it calls commonContinue, hits this code path with end_stream_ already true, so if we used end_stream_ here we end up sending the onData with end stream after headers have been sent with end stream, and failing the assert.

But don't you check !end_stream_ so we shouldn't send any data frame?

@alyssawilk (Contributor, Author):

Intuitively, we ought to be able to use end stream to determine whether we need to create the synthetic data frame. I don't think we can.

Looking at the state of things when we hit commonContinue: end_stream_ is true in both PauseAndResume and PauseAndResumeHeadersOnly at the point we call commonContinue.

- For PauseAndResume, we set end stream when we process the data with the fin, then need to create the synthetic data frame so that doData kicks the pipeline (the original bug) and further filters process that end stream.
- For PauseAndResumeHeadersOnly, we set end stream true when we process the headers, and need to not create a synthetic data frame (the new bug) and double-fin.

Somehow we need to differentiate between the two cases. I'm fine reusing an existing variable, but I don't think we can use end_stream_ here, since we want different behavioral outcomes while it has the same value in both integration tests when we hit commonContinue. Unless you're suggesting somewhere else in the code base should be paying attention to end stream to avoid the double send?
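The two cases in this discussion can be sketched as a standalone simplification of the continue-time condition from the diff (names mirror the PR, but this is a hedged illustration, not the actual HCM code):

```cpp
// Standalone simplification of the continue-time decision: should the fin
// travel with a (possibly synthetic) data frame? Names mirror the diff;
// this is a sketch, not the actual ConnectionManagerImpl code.
struct StreamState {
  bool complete;                 // the full stream has been received
  bool has_trailers;             // trailers will carry the fin instead
  bool end_stream_with_headers;  // the fin already went out with headers
  bool buffered_data;            // data/metadata was added after the fin
};

inline bool endStreamWithData(const StreamState& s) {
  return s.complete && !s.has_trailers &&
         (!s.end_stream_with_headers || s.buffered_data);
}
```

The extra boolean is what lets the two tests diverge even though both see end stream true at continue time.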

(Member):

OK after staring at this for a while I think I understand why we need the extra boolean, but I will admit that this is making my head explode, which gives me a strong indication that we need more comments and/or some way to make this code simpler in the future. Could you potentially add a comment that summarizes this discussion so that we don't lose the context of why we need a separate boolean? Bonus points for any TODO thoughts on how to make this code better. Perhaps @soya3129 has some thoughts also as she has thought about this a ton lately.

I think you already mentioned this, but yeah, I think it might make everything simpler to just add the synthetic empty data frame ahead of time vs. figure it out at continue time? Maybe that's the TODO?

@mattklein123 (Member) left a comment:

Thanks for taking the time to help me understand this. Difficult stuff. This generally LGTM but I have some more questions and comment requests.

/wait


```cpp
const bool new_metadata_added = processNewlyAddedMetadata();

if ((*entry)->end_stream_ && !new_metadata_added) {
```
(Member):

Do we need the additional !new_metadata_added condition here? In this case we will add a data frame and that would be covered by the buffered data check in continue, right?

Basically I'm trying to reconcile why the logic is different in the decode case vs. the encode case. Is it different because metadata is not yet implemented in the encode path? Do we need a TODO there or a comment? cc @soya3129

@alyssawilk (Contributor, Author):

Sort of. It'll work without it, because the metadata path adds its own fake buffer, but it's more accurate to say we didn't send the fin with data when we know we didn't.
I believe the reason we need it here and not below is that the decode path has addDecodedMetadata and the encode path doesn't have addEncodedMetadata yet.

@alyssawilk (Contributor, Author):

Yeah, so the other route is #7720.

It amounts to a rollback of the original PR plus early buffer creation. I'm less comfortable with it because I understand less well what's going on. That said, if you're more comfortable with it we can land the fix that way, and/or we could runtime-guard (default true or false) the eager buffer creation in case it causes problems of its own.
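Roughly, the eager-buffer alternative might be sketched like this (hypothetical names and structure; #7720 has the real change): materialize the buffer as soon as a filter stops iteration on an end-stream frame, so the continue-time logic only has to ask one question.

```cpp
#include <memory>
#include <string>

// Rough sketch of the eager-buffer alternative (hypothetical names; see
// #7720 for the real change): create the buffer up front when iteration
// stops on an end-stream frame, so at continue time "is there buffered
// data?" is the only check and no extra boolean is needed.
struct EagerEntry {
  bool end_stream = false;
  std::unique_ptr<std::string> buffer;  // stand-in for Buffer::Instance

  void onDecodeHeaders(bool end_stream_arg, bool stopped_iteration) {
    end_stream = end_stream_arg;
    if (stopped_iteration && end_stream && !buffer) {
      // Eagerly create an (empty) buffer; the fin will travel with it.
      buffer = std::make_unique<std::string>();
    }
  }

  // The continue-time check collapses to a single condition.
  bool sendFinWithBuffer() const { return end_stream && buffer != nullptr; }
};
```

The trade-off, as noted in the thread, is that the fin then always travels with a data frame rather than with the headers, which is a behavior change worth guarding.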

(Contributor):

On the metadata encoding path, I didn't create addEncodedMetadata() because we can call callbacks_->encodeMetadata() directly (example) to insert new metadata, and the newly added metadata will be forwarded downstream without waiting. But I think the inconsistency can cause confusion, so I will implement addEncodedMetadata().



Successfully merging this pull request may close: segfault in ActiveStreamFilterBase::commonContinue()

3 participants