transport: make the client send a RST_STREAM when it receives an END_STREAM from the server#8832
Conversation
…ilers if client is not half-closed
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #8832 +/- ##
==========================================
- Coverage 83.34% 82.97% -0.37%
==========================================
Files 410 411 +1
Lines 32598 32704 +106
==========================================
- Hits 27168 27137 -31
- Misses 4050 4175 +125
- Partials 1380 1392 +12
🚀 New features to boost your workflow:
|
arjan-bal
left a comment
There was a problem hiding this comment.
IIUC, the situation mentioned in #835 is not covered. Specifically, the server sends trailers with the EOS flag but does not send an RST_STREAM frame. The client is expected to send an RST_STREAM to inform proxies about the stream closure.
easwars
left a comment
There was a problem hiding this comment.
IIUC, the situation mentioned in #835 is not covered. Specifically, the server sends trailers with the EOS flag but does not send an RST_STREAM frame. The client is expected to send an RST_STREAM to inform proxies about the stream closure.
Added a test for it, and combined all three scenarios for the client into a single table driven test.
dfawley
left a comment
There was a problem hiding this comment.
It would probably be a good idea to write a test that ensures unread messages can still be read when this happens. I.e.:
- Start stream; client sits idle for now
- Server sends a few messages and a status
- After the server has done this and we know the client has finished calling
closeStream(somehow...maybe the server closes the connection (gracefully?) and we check the connectivity state?), client then reads the messages out of the stream.
Done. I added a test to to verify that the client can still read unread messages after sending a RST_STREAM upon receiving an EOS from the server. I also refactored the existing table-driven test into individual, clearer test functions using a shared helper as the test logic was getting more and more complicated. |
|
Oh, sorry, it seems I misinterpreted the github commit ordering. I'll look at the proper commit now! |
| return | ||
| } | ||
| serverFrames(t, framer, streamID) | ||
| <-seenResetFrame |
There was a problem hiding this comment.
What's the significance in making this goroutine block until the RST_STREAM happens? It doesn't do anything after so it seems like it could just be deleted (and all of seenResetFrame)
There was a problem hiding this comment.
The serverDone channel is closed when this goroutine exits. And the serverDone channel being closed is used as a signal in main the test goroutine to indicate that the server actually saw a RST_FRAME from the client.
There was a problem hiding this comment.
the serverDone channel being closed is used as a signal in main the test goroutine to indicate that the server actually saw a RST_FRAME from the client
This goroutine can exit for a number of reasons, so it seems <-serverDone doesn't actually verify anything in and of itself (aside from "the server goroutine has ended"). I guess those other paths happen to all t.Error? But <-serverDone doesn't directly prove <-seenResetFrame. Oh, and it does look like a listener error doesn't include a t.Error.
It would be safer, I suppose, if you did some kind of defer func() { if err != nil { t.Error(err.Error()) } } too?
There was a problem hiding this comment.
I changed this function to return two channels instead:
- to signal the receipt of the RST_FRAME (to verify test expectations)
- to signal the server being done (to ensure test cleanup)
I ended up adding a t.Errorf to the only place that was missing it instead of deferring a call to it, since I would have to change so many of the existing code paths in that case.
There was a problem hiding this comment.
So now what's the significance of serverDone besides (indirectly) telling you that the RST_STREAM frame was seen? Can it just be removed or is it still needed for correctness? This blocking here in general seems like it's only important to avoid closing the connection too soon? Could we fix that by closing the connection upon receipt of the RST_STREAM frame (or an error in that other goroutine), and allow this goroutine to exit as soon as serverFrames returns?
| }); err != nil { | ||
| t.Errorf("Server failed to write headers: %v", err) | ||
| } | ||
| if err := framer.WriteData(streamID, false, expectedResponse); err != nil { |
There was a problem hiding this comment.
I was expecting this would want to be true to exercise the newly added code path, right? (and then delete the trailer writing below) It's in handleData that we added something that would call closeStream.
There was a problem hiding this comment.
That's right. Changed it to send a EOS with the data, and no trailers.
Also, had to change the check for the status returned from the stream to Internal because that is what we are returning from handleData when we see EOS without trailers.
| }); err != nil { | ||
| t.Errorf("Server failed to write headers: %v", err) | ||
| } | ||
| if err := framer.WriteData(streamID, false, expectedResponse); err != nil { |
There was a problem hiding this comment.
This "expectedResponse" thing bugs me for some reason. It seems like if something were to overwrite it with zeros in any test case then this test case would be compromised if reading zeros out of the stream is the default behavior.
There was a problem hiding this comment.
I agree. I didn't change it because it was an existing pattern and it is used by other existing tests. I could fix it in a follow-up.
|
Looks like there is a build failure now |
I had to merge master into this branch to get rid of the build failure, because it wasn't failing for me locally. |
|
FYI there is an open question from me above: #8832 (comment) |
Addressed it now. Thanks. |
| @@ -3537,16 +3540,6 @@ func setupRSTStreamOnEOSTest(t *testing.T, serverFrames func(*testing.T, *http2. | |||
| }() | |||
|
|
|||
| serverFrames(t, framer, streamID) | |||
There was a problem hiding this comment.
I just realized this is potentially risky. You're using the framer to read in the above goroutine and you're sending the framer to some other place in the code to use, but only for writing. Just make sure it's very clear this can only be used for writing frames!
There was a problem hiding this comment.
Clarified this by changing the name of the parameter and mentioning it in the docstring of this function.
| t.Errorf("RST stream received with streamID: %d and code: %v, want streamID: %d and code: %v", frame.Header().StreamID, http2.ErrCode(frame.ErrCode), streamID, wantErrCode) | ||
| } | ||
| close(seenResetFrame) | ||
| conn.Close() |
There was a problem hiding this comment.
How do we know the server is done writing at this point? Closing the connection could cause errors for that other function, couldn't it? I'm curious why you changed the connection closing strategy from how it was before. I'd probably do exactly what you had before with one minor change to run serverFrames in a goroutine so that the timeout works for it, too. I.e. instead of using seenResetFrame, maybe do this?:
defer conn.Close() // as before
ctx := context.WithTimeout(ctx, defaultTestTimeout) // or make test body do it.
go func() {
readDone := make(chan struct{})
defer close(readDone)
// this reader goroutine
}()
go func() {
writeDone := make(chan struct{})
defer close(writeDone)
serverFrames(...)
}()
select {
case <-ctx.Done(): t.Errorf()
case <-readDone:
}
select {
case <-ctx.Done(): t.Errorf()
case <-writeDone:
}
// Now it's safe to exit and close the connection.There was a problem hiding this comment.
How do we know the server is done writing at this point? Closing the connection could cause errors for that other function, couldn't it?
In setupRSTStreamOnEOSTest, the sequence of events is:
- The server sends a frame with END_STREAM.
- The client receives and processes this frame, marking the stream as closed for reading (io.EOF is buffered) and then sends a RST_STREAM back to the server.
- The server's reader goroutine receives the RST_STREAM.
- Only after receiving the RST_STREAM does the server call conn.Close().
Because the client only sends the RST_STREAM after it has already processed the END_STREAM and the server does not write after the last frame where END_STREAM is set, we are safe.
But we could potentially add more scenarios in the future where this is not guaranteed. So, I ended up taking your suggestion and changed the code to run sendServerFrames in a goroutine.
I still needed some signaling between the top-level server goroutine started in setupRSTStreamOnEOSTest and the test goroutine because otherwise, the test goroutine can end earlier, and cancel the context and make this goroutine call t.Error. So, I ended up returning a channel from this function that is closed when the server is done-done.
There was a problem hiding this comment.
I also noticed that one of the existing tests was a subset of another, so ended up removing it.
There was a problem hiding this comment.
So, I ended up returning a channel from this function that is closed when the server is done-done.
Makes sense!
The only minor tweak I could suggest now would be, instead of returning the channel, either returning a shutdown function that is deferred that is
func {
select {
case <-done:
case <-ctx.Done():
t.Error()
}
}or using t.Cleanup with the same. That way you get proper timeout support in case there is a regression, without having to repeat that at every call site. The returned func would be better if we needed to block in the middle of the test case until this finishes, otherwise the t.Cleanup is even easier, but also magical so 🤷.
There was a problem hiding this comment.
I had initially not used a select when waiting for the server to be done (in each of the three tests) because I thought we were waiting for all both the reading and writing goroutines spawned by the test. But it's definitely better to wait on it with a timeout. The t.Cleanup was the first one that I tried, but it doesn't work because the context is canceled with a defer and there is no synchronization between the defers and the t.Cleanups. So, ended up returning a function that the callers would defer.
Fixes #835
This PR fixes the behavior of the client to send a RST_STREAM when it receives an END_STREAM from the server when the client-side of the stream is still open. It also adds tests for both the client and server side behaviors of sending RST_STREAM when they receive an END_STREAM from their peer.
RELEASE NOTES: