transport/server: fix race that could cause a stray header to be sent (#5513)
dfawley merged 2 commits into grpc:master
Conversation
// TODO(zhaoq): Now it indicates the end of entire stream. Revisit if early
// OK is adopted.
func (t *http2Server) WriteStatus(s *Stream, st *status.Status) error {
	s.hdrMu.Lock()
So what you wrote in the issue ("It should be fine to hold the stream's header mutex during these entire functions; they are very infrequently called") is the reason it's OK for this unlock to come after things like the stats handlers, which would block any operations waiting on the mutex?
We could unlock before calling the stats handler (we technically just need to call updateHeaderSent and finishStream to prevent any races), but this should be okay, considering that a race between multiple calls to WriteStatus and/or WriteHeader should be extremely rare.
SG. Seems very minimal, but was just wondering about your thought process.
}

	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
This is orthogonal to the bug fix, right? I get that this fixes a correctness issue with s.updateHeaderSent, but could you give me a concrete example of what this fixes?
We want to hold the lock while calling s.updateHeaderSent and checking/setting s.getState at least. Otherwise it races with WriteStatus doing similar operations. So while this is not exactly the original bug, it's another related race that is possible if trying to send headers while returning from the method handler.
Yeah, makes sense to me. Although you need to hold the mutex for writeHeaderLocked() too, right? (That's what you have currently... but your comment only mentions updateHeaderSent and getState.)
	t.Fatalf("Error starting endpoint server: %v", err)
}
defer ss.Stop()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
Spaces?
We have a pretty healthy existing mix of spaces/no spaces, and gofmt is OK with either.
$ grep -r '\* time.Second' . | wc -l
235
$ grep -r '\*time.Second' . | wc -l
134
I'm inclined to just leave this.
Yeah, talked about offline; this was a thought carried over from two of my internship's linters. Didn't realize we have both for some reason. This is fine.
}

// TestRecvWhileReturningStatus performs a Recv in a service handler while the
// handler returns its status. A race condition could result in the server
"A race condition could result in the server sending headers twice (once as a result of the status and once as a result of the failed Recv call)." This doesn't seem like the race to me? Isn't the race the bit where instead of writing the header with result from status first it writes the header from failed Recv call without status first due to s.updateHeaderSent is true). Regardless of ordering, both headers get sent out anyway? Hence your comment in issue "E.g. to ignore the extra headers on a stream after END_STREAM was already sent".
> Isn't the race the bit where instead of writing the header with result from status first it writes the header from failed Recv call without status first due to s.updateHeaderSent is true
I'm not sure of the order -- it probably doesn't matter. The failed Recv call will also write a status, though, here:
Lines 1586 to 1588 in b695a7f
Is there something you think needs to change here?
Yeah, I saw that line when triaging. All I was getting at with this comment was that I think the explanation of the race in the comment is wrong. The race is the fact that the second WriteStatus call reads updateHeaderSent as true, doesn't attach :status, and then sends that frame on the wire BEFORE the first WriteStatus call's frame, without the status, which is what broke howardjohn. Saying the race is sending headers twice is accurate, but it's not the explanation of why it broke the client in this scenario. (And yes, it also breaks HTTP/2, because it sends another frame after END_STREAM.)
Aha, I hadn't completely linked the reason for the failure on the client side, actually, but that makes more sense than my original thought that two :status headers were written and the client saw the second one and errored. So my comment about "ignore the extra headers on a stream after END_STREAM was already sent" is probably bogus. Updated this comment.
	onWrite: t.setResetPingStrikes,
}
s.hdrMu.Unlock()

Nit: do we want this blank line? (It used to have no blanks.)
I think so. Helps to visually separate building the frame from writing it.
Fixes #5511
RELEASE NOTES: