http: fixing a bug in common continue #7708
alyssawilk wants to merge 1 commit into envoyproxy:master
Conversation
I don't think this is the cleanest fix, but I do think it's the safest fix.
alyssawilk force-pushed from a676360 to 52d99cd
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
OK, cleaned up a bit - there was an HCM test exposing an issue with buffered data, and now I have a confusing but hopefully sufficiently well explained workaround for that.
mattklein123 left a comment
Thanks for the quick fix and digging into this. I have one question as this is complicated stuff.
// if the fin went with headers but then data or metadata was added (so there is now buffered
// data) the fin should still be sent with the buffered data.
const bool end_stream_with_data =
    complete() && !trailers() && (!end_stream_with_headers_ || bufferedData());
Not as the filter chain currently operates - that's the bug in the current implementation.
If we have a header-only stream (end stream is true) and a filter returns stop iteration, then when we call commonContinue to resume processing we check whether there's more work to do for this filter. At that point, since end stream is true, we'd send an empty data frame with end stream, confusing things. There's no way to differentiate between "end stream with a fin already sent" and "end stream but we still need to send a fin" at this point (though we could theoretically eagerly create the buffer when decodeHeaders returns StopIteration, if you'd prefer).
We also can't check end stream at the beginning of commonContinue, because when we addDecodedData (or, recently, when we add metadata) we don't set end stream false. That's one of the high-risk refactors I'd be willing to tackle (carefully, probably behind a guard) as a follow-up.
I'm probably missing something but I think checking complete() is different from end_stream_? I'm just looking at the code changes above, and it seems like in both of those cases, checking end_stream_ here along with whether there is buffered data would cover those cases?
So if I swap end_stream_with_data to end_stream_ here, I get a segfault in the new regression test, which I believe reproduces the associated bug:
[ RUN ] IpVersions/Http2IntegrationTest.PauseAndResumeHeadersOnly/IPv4
[2019-07-24 20:42:06.590][400][critical][assert] [source/common/http/http2/codec_impl.cc:277] assert failure: !local_end_stream_.
As I understand it what's happening in this case is the StopIterationAndContinueFilter gets decodeHeaders(headers, end_stream = true)
sets end_stream_ = true
returns StopIteration
When the alarm fires it calls commonContinue and hits this code path with end_stream_ already true, so if we used end_stream_ here we'd end up sending onData with end stream after headers have already been sent with end stream, failing the assert.
We need to differentiate the case in PauseAndResumeHeadersOnly, where commonContinue is called after end_stream_ was set true in decodeHeaders, from the case in the PauseAndResume test, where end_stream was false in decodeHeaders and needs to be communicated via a 0-byte data frame.
When the alarm fires it calls commonContinue, hits this code path with end_stream_ already true, so if we used end_stream_ here we end up sending the onData with end stream after headers have been sent with end stream, and failing the assert.
But don't you check !end_stream_ so we shouldn't send any data frame?
Intuitively we ought to be able to use end stream to determine if we need to create the synthetic data frame. I don't think we can.
Looking at the state of things when we hit commonContinue, end_stream_ is true in both PauseAndResume and PauseAndResumeHeadersOnly at the point we call commonContinue.
For PauseAndResume, we set end stream when we process the data with the fin, then need to create the synthetic data frame so that doData kicks the pipeline (the original bug) and further filters process that end stream.
For PauseAndResumeHeadersOnly, we set end stream true when we process the headers, and need to not create a synthetic data frame (the new bug) to avoid a double fin.
Somehow we need to differentiate between the two cases. I'm cool reusing an existing variable, but I don't think we can use end_stream_ in this case, since we want a different behavioral outcome but it's the same value in both integration tests when we hit commonContinue. Unless you're suggesting somewhere else in the code base should be paying attention to end stream to avoid the double send?
OK after staring at this for a while I think I understand why we need the extra boolean, but I will admit that this is making my head explode, which gives me a strong indication that we need more comments and/or some way to make this code simpler in the future. Could you potentially add a comment that summarizes this discussion so that we don't lose the context of why we need a separate boolean? Bonus points for any TODO thoughts on how to make this code better. Perhaps @soya3129 has some thoughts also as she has thought about this a ton lately.
I think you already mentioned this, but yeah, I think it might make everything simpler to just add the synthetic empty data frame ahead of time vs. figure it out at continue time? Maybe that's the TODO?
mattklein123 left a comment
Thanks for taking the time to help me understand this. Difficult stuff. This generally LGTM but I have some more questions and comment requests.
/wait
const bool new_metadata_added = processNewlyAddedMetadata();

if ((*entry)->end_stream_ && !new_metadata_added) {
Do we need the additional !new_metadata_added condition here? In this case we will add a data frame and that would be covered by the buffered data check in continue, right?
Basically I'm trying to reconcile why the logic is different in the decode case vs. the encode case. Is it different because metadata is not yet implemented in the encode path? Do we need a TODO there or a comment? cc @soya3129
Sort of. It'll work without it, because the metadata path adds its own fake buffer, but it's more accurate to not claim the fin went with the data when we know it didn't.
I believe the reason we need it here and not below is that the decode path has addDecodedMetadata and the encode path doesn't have addEncodedMetadata yet.
Yeah, so the other route is #7720
It amounts to a rollback of the original PR plus early buffer creation. I'm less comfortable with it because I understand what's going on less well. That said, if you are more comfortable with it we can land the fix that way, and/or could runtime-guard (default true or false) the eager buffer creation in case it causes problems of its own.
On the metadata encoding path, I didn't create addEncodedMetadata() because we can call callbacks_->encodeMetadata() directly to insert new metadata, and the newly added metadata will be forwarded downstream without waiting. But I think the inconsistency can cause confusion. I will implement addEncodedMetadata().
Risk Level: High (change to HCM)
Testing: new integration test, with header
Docs Changes: n/a
Release Notes: n/a
Fixes #7595