Failing to delete the stream from the activeStreams map leading to REFUSED_STREAM errors.#8504
Failing to delete the stream from the activeStreams map leading to REFUSED_STREAM errors.#8504alanprot wants to merge 1 commit into
activeStreams map leading to REFUSED_STREAM errors.#8504Conversation
Signed-off-by: alanprot <alanprot@gmail.com>
ccc4735 to
d09c515
Compare
activeStreams map leading to "REFUSED_STREAM" errors.activeStreams map leading to REFUSED_STREAM errors.
|
Hi @alanprot, thanks for sharing the test to repro. I haven't debugged the test, but wanted to ask if you know why the check added causes stream deletion to be skipped? From a look quick look at the places grpc-go/internal/transport/http2_server.go Lines 1332 to 1343 in 57b69b4 grpc-go/internal/transport/http2_server.go Lines 1350 to 1360 in 57b69b4 In both places, Edit: It seems like the |
|
Hi @arjan-bal As far i could see the problem is we add set the clean and add the hdr on the controlbuff at grpc-go/internal/transport/http2_server.go Line 1338 in 19c720f And we get to: grpc-go/internal/transport/controlbuf.go Lines 692 to 696 in 19c720f And at this point, the stream is not on the grpc-go/internal/transport/controlbuf.go Lines 947 to 950 in 19c720f And i think this stream never become added back on the So it seems that it only happens when the ServerSide is sending tons of data and waiting for quota? |
|
Thank you @alanprot, I think I understand the sequence of events leading to this.
The simplest fix is to remove the guard block added in @dfawley what do you think? |
That makes sense to me. I can't think of any other way to do this. Does the transport properly handle the rest of this situation, especially: dropping the queued data frames when the RST_STREAM is sent (or shortly thereafter)? |
|
Also maybe we wanna consider decrementing this metric only if the stream was present on the activeStreams map? grpc-go/internal/transport/http2_server.go Lines 1307 to 1323 in 18ee309 |
All the pending Data and Header frames are dropped in cleanupStreamHandler.
The check for max concurrent streams depends on the size of the activeStreams map. Without timeout: With timeout: |
|
@alanprot are you interested in raising a PR with the fix and a unit test for catching regressions? |
|
@arjan-bal I can definitely do that, but I’m out until Tuesday, so I’d only be able to pick it up then. :) |
Description
Hi,
After upgrading
go-grpc, we started see the following error:Upon investigation, it appears that in some scenarios, we fail to properly delete streams from the
activeStreamsmap, resulting in this error being wrongly returned.I added a test case that seems to trigger the problem:
PS: To test the behavior i added a "ActiveStreamTracker" hook -> im not sure this is the best way but i wanted just to show the problem
The issue seems to have been introduced in the following PR: #8071
Specifically, the check added here:
grpc-go/internal/transport/http2_server.go
Lines 1357 to 1359 in 57b69b4
If this conditional is removed, all streams are correctly cleaned up from the
activeStreamsmap, and the test passes without leaks.