Original PR with repro: #8504
After upgrading go-grpc, we started see the following error:
stream terminated by RST_STREAM with error code: REFUSED_STREAM
Upon investigation, it appears that in some scenarios, we fail to properly delete streams from the activeStreams map, resulting in this error being wrongly returned.
I added a test case that seems to trigger the problem:
PS: To test the behavior i added a "ActiveStreamTracker" hook -> im not sure this is the best way but i wanted just to show the problem
end2end_test.go:4015: leak streams: 5
The issue seems to have been introduced in the following PR: #8071
Specifically, the check added here:
|
if oldState == streamDone { |
|
return |
|
} |
If this conditional is removed, all streams are correctly cleaned up from the activeStreams map, and the test passes without leaks.
Root cause
- The server stream exhausts the stream quota, and starts buffering data frames.
- The server attempts to finish the RPC gracefully by sending trailers by calling finishStream.
- The write of the header frame gets blocked due to the pending data frames that are awaiting flow control quota.
- Once the deadline expires, the server's deadline monitoring timer tries to call closeStream, which is a no-op since the stream state is already marked
streamDone by finishStream in point 1.
- The client sends and RST stream, the server calls closeStream which is again a no-op.
The simplest fix is to remove the guard block added in closeStream in #8071. We may end up pushing redundant cleanupStream events on to the controlbuf, but it doesn't cause correctness issues since cleanupStreamHandler returns early in such cases.
Original PR with repro: #8504
After upgrading
go-grpc, we started see the following error:Upon investigation, it appears that in some scenarios, we fail to properly delete streams from the
activeStreamsmap, resulting in this error being wrongly returned.I added a test case that seems to trigger the problem:
PS: To test the behavior i added a "ActiveStreamTracker" hook -> im not sure this is the best way but i wanted just to show the problem
The issue seems to have been introduced in the following PR: #8071
Specifically, the check added here:
grpc-go/internal/transport/http2_server.go
Lines 1357 to 1359 in 57b69b4
If this conditional is removed, all streams are correctly cleaned up from the
activeStreamsmap, and the test passes without leaks.Root cause
streamDonebyfinishStreamin point 1.The simplest fix is to remove the guard block added in
closeStreamin #8071. We may end up pushing redundantcleanupStreamevents on to the controlbuf, but it doesn't cause correctness issues since cleanupStreamHandler returns early in such cases.