http: fixing a resumption bug in pipelining#8352
http: fixing a resumption bug in pipelining#8352mattklein123 merged 9 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
babad3e to
a16cdee
Compare
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
Yikes. @alyssawilk do we know when this regressed? I'm positive this worked at some point. |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
I don't think this bug likely causes problems in the wild because almost every client that pipelines requests This totally works for Envoy, because in that case we've enabled reads and we actually get a real kernel read rather than a synthetic "kick off an event" read. Looking at the connection class, I'm not sure when it worked. When we did the refactor for half-close two years ago the behavior of "do not call onRead if there are no bytes read" already existed. We've got codec level tests "verifying" that the readEnable works, but because they used a mocked out connection class they didn't catch the bug. I picked this up turning up more internal integration tests which (unlike most browsers) ship both requests together for just this case. Integration tests FTW! :-) |
|
Interesting, great find. I swear this used to work as I remember writing the code to handle this case, but I'm probably not remembering correctly! |
Sorry could you expand on this? I'm having a hard time seeing the doubling. Wouldn't the first time generally drain? Or the point is that we didn't drain everything, and then on close we re-dispatch because there is remaining data?
This sounds like the right idea, agreed the other stuff is scary. I will look at this in more detail in the morning when I am fresher to see if I have any better ideas. |
Yeah, the second is exactly it. EmptyReadOnCloseTest verifies if we get a read event which is "close the connection" we don't do a spurious onData. I had to change that test to drain the buffer because with this current patch if we get the FIN we don't disambiguate between it being a real or fake kick, and so send the buffered data up.
SGTM, I'll plan on tackling that after you take a look with the extra context. It felt like a bit of a hacky one off and I was hoping for a more elegant solution but I'm happy to have the extra state to avoid the "ondata on close" weirdness. |
|
In looking at it more this is the code I was thinking of: envoy/source/common/http/conn_manager_impl.cc Lines 322 to 325 in e082812 I don't recall the history now, but yeah clearly that won't work in the case in which we finish the stream not during dispatch. Whatever we end up doing here though, I feel like we should remove ^ because it would now be redundant assuming readDisable() correctly redispatches, right? So I guess then my next question would be: can we just have readDisable() inline dispatch data instead of scheduling a fake event at all? I'm guessing the answer is no because we might get into a situation in which we recursively call onData() (the reason the code above was there in the first place), but I will throw that out for consideration, as I think it would simplify things. Assuming we can't do direct dispatch, I think your idea to keep track of the fake event and handle it differently is the right one. I don't think it would be too hard to implement. As an aside, if we add another bool state field, I might consider making all of those bools into a bit field at this point. There would probably be 64+ bytes of savings per-connection which is probably worthwhile. WDYT? |
|
Yeah, I think we should be able to clean up the redispatch logic if this works cleanly (though I think I'd prefer to TODO it and land one dangerous PR at a time!) I agree with avoiding a codec dispatch from onMessageComplete. It feels like asking for an opportunity to blow stack if a local reply can happen inline, and while if unwinding from one request is safe unwinding from multiple is probably safe it'd still make me twitchy. I'll boolean and bitfield when I get a chance, unless someone has any better suggestions. |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
| // gets processed regardless. | ||
| // gets processed regardless and ensure that we push it up via onRead. | ||
| if (read_buffer_.length() > 0) { | ||
| force_on_read_ = true; |
There was a problem hiding this comment.
So we call this on the connection in two places, here and setReadBufferReady()
Looking at call sites of setReadBufferReady it looks like we're doing that where we actually want to resume reading from the socket (and don't want to cause a spurious onRead if there's no further data). I think I'm inclined to just leave this as-is, but if we think future users of the network connection might not fully read data and use setReadBufferReady to trigger the dispatch (rather than readEnable/Disable as we do here), we may want to add a boolean to force on read there as well.
| } | ||
| } | ||
|
|
||
| // TODO(alyssawilk) clean this up after #8352 is well vetted. |
There was a problem hiding this comment.
Also added this but now I'm wondering if we can land the connection fix or we have to clean this up inline or something weird will happen? I'll take a look tomorrow.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
|
Ok, I think separate clean up is fine now
|
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this looks great! Small test question/comment.
/wait-any
|
|
||
| // The HTTP/1 codec handles pipelined connections by relying on readDisable(false) resulting in the | ||
| // subsequent request being dispatched. Regression test this behavior. | ||
| TEST_P(ConnectionImplTest, ReadEnableDispatches) { |
There was a problem hiding this comment.
Is it possible to have an explicit test for the disconnect case to make sure we don't dispatch on close?
There was a problem hiding this comment.
We have one!
If you check out a16cdee
I had to change TEST_P(ConnectionImplTest, EmptyReadOnCloseTest)
to drain the buffer (or it got an extra dispatch) and added
TEST_P(ConnectionImplTest, ReadEventIfBufferedDataOnClose) {
to show with that draft (which I didn't like) how we got the event when we had buffered data.
Now that we have a fix which tests for explicit kick, ConnectionImplTest.EmptyReadOnCloseTest shows that when we close when there's buffered data, we don't push it up.
The one exception is if we explicitly ask for a kick, and then do a read and get the FIN, we will push the data before we push the close. I think that's the correct behavior, as it means the behavior when we have buffered data and ask for dispatch, matches when there's kernel buffered data, which is to say we always pass up the data before closing.
There was a problem hiding this comment.
OK, awesome, thanks for clarifying.
|
/azp run envoy-macos |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
The fundamental problem is the way we handle pipelining nowadays is by doing a readDisable(true) when processing a given request and readDisable(false) when we are done. If we've already read the next request, we won't ever resume. Kicking off readDisable(false) creates a fake event, onReadReady does a read, but then if no further data is read that doesn't result in a call to onRead() so the buffered data is not passed up the stack to the codec.
Now tracking that "kick" and making sure we call onRead() when force-kicked even if no new data is ready from the socket.
Intentionally leaving setReadBufferReady as-is as it's supposed to kick off a socket read, and does not currently need to interact with buffered data.
Risk Level: High (changes to network::connection)
Testing: new unit tests, integration test
Docs Changes: n/a
Release Notes: not added