Skip to content

transport: Invoke net.Conn.SetWriteDeadline in http2_client.Close#8534

Merged
dfawley merged 7 commits into
grpc:masterfrom
jgold2-stripe:keepalive
Sep 29, 2025
Merged

transport: Invoke net.Conn.SetWriteDeadline in http2_client.Close#8534
dfawley merged 7 commits into
grpc:masterfrom
jgold2-stripe:keepalive

Conversation

@jgold2-stripe

@jgold2-stripe jgold2-stripe commented Aug 21, 2025

Copy link
Copy Markdown
Contributor

Fixes: #8425

This PR adds a call to net.Conn.SetWriteDeadline, as discussed in #8425 (comment). Additionally, it updates the previous call to SetReadDeadline to log any non-nil error value (this doesn't affect behavior but proved helpful in some earlier debugging).

RELEASE NOTES:

  • client: Set a read deadline when closing a transport to prevent it from blocking indefinitely on a broken connection.

@jgold2-stripe jgold2-stripe changed the title Keepalive transport/http2_client.go: Invoke net.Conn.SetWriteDeadline in Close Aug 22, 2025
@codecov

codecov Bot commented Aug 22, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.24%. Comparing base (9186ebd) to head (6b3d325).
⚠️ Report is 61 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8534      +/-   ##
==========================================
- Coverage   82.47%   81.24%   -1.24%     
==========================================
  Files         413      415       +2     
  Lines       40513    40823     +310     
==========================================
- Hits        33415    33166     -249     
- Misses       5743     6101     +358     
- Partials     1355     1556     +201     
Files with missing lines Coverage Δ
internal/transport/http2_client.go 71.87% <100.00%> (-20.26%) ⬇️

... and 258 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test to verify the fix and catch regressions? You can refer to the test for the write deadline:

// hangingConn is a net.Conn wrapper for testing, simulating hanging connections
// after a GOAWAY frame is sent, of which Write operations pause until explicitly
// signaled or a timeout occurs.
type hangingConn struct {
net.Conn
hangConn chan struct{}
startHanging *atomic.Bool
}
func (hc *hangingConn) Write(b []byte) (n int, err error) {
n, err = hc.Conn.Write(b)
if hc.startHanging.Load() {
<-hc.hangConn
}
return n, err
}
// Tests the scenario where a client transport is closed and writing of the
// GOAWAY frame as part of the close does not complete because of a network
// hang. The test verifies that the client transport is closed without waiting
// for too long.
func (s) TestClientCloseReturnsEarlyWhenGoAwayWriteHangs(t *testing.T) {
// Override timer for writing GOAWAY to 0 so that the connection write
// always times out. It is equivalent of real network hang when conn
// write for goaway doesn't finish in specified deadline
origGoAwayLoopyTimeout := goAwayLoopyWriterTimeout
goAwayLoopyWriterTimeout = time.Millisecond
defer func() {
goAwayLoopyWriterTimeout = origGoAwayLoopyTimeout
}()
// Create the server set up.
connectCtx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
server := setUpServerOnly(t, 0, &ServerConfig{}, normal)
defer server.stop()
addr := resolver.Address{Addr: "localhost:" + server.port}
isGreetingDone := &atomic.Bool{}
hangConn := make(chan struct{})
defer close(hangConn)
dialer := func(_ context.Context, addr string) (net.Conn, error) {
conn, err := net.Dial("tcp", addr)
if err != nil {
return nil, err
}
return &hangingConn{Conn: conn, hangConn: hangConn, startHanging: isGreetingDone}, nil
}
copts := ConnectOptions{Dialer: dialer}
copts.ChannelzParent = channelzSubChannel(t)
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
// Create client transport with custom dialer
ct, connErr := NewHTTP2Client(connectCtx, ctx, addr, copts, func(GoAwayReason) {})
if connErr != nil {
t.Fatalf("failed to create transport: %v", connErr)
}
if _, err := ct.NewStream(ctx, &CallHdr{}); err != nil {
t.Fatalf("Failed to open stream: %v", err)
}
isGreetingDone.Store(true)
ct.Close(errors.New("manually closed by client"))
}

Comment thread internal/transport/http2_client.go Outdated
Comment thread internal/transport/http2_client.go Outdated
@arjan-bal arjan-bal changed the title transport/http2_client.go: Invoke net.Conn.SetWriteDeadline in Close transport: Invoke net.Conn.SetWriteDeadline in http2_client.Close Aug 22, 2025
@arjan-bal arjan-bal added this to the 1.76 Release milestone Aug 22, 2025
@jgold2-stripe

Copy link
Copy Markdown
Contributor Author

PTAL @arjan-bal

@jgold2-stripe jgold2-stripe removed their assignment Aug 22, 2025
@arjan-bal arjan-bal self-assigned this Aug 25, 2025
Comment thread internal/transport/transport_test.go
Comment thread internal/transport/http2_client.go Outdated
func (t *http2Client) Close(err error) {
t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10))
if err := t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10)); err != nil {
logger.Warningf("Failed to set write deadline when closing connection: %v", err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs are too chatty. They're appearing almost every time a connection is closed, e.g:

tlogger.go:126: WARNING http2_client.go:993 [transport] Failed to set write deadline when closing connection: set tcp 127.0.0.1:59286: use of closed network connection  (t=+590.186033ms)

I don't think we should log here because we are calling conn.Close in another goroutine and expecting these calls to fail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the logging.

Comment thread internal/transport/transport_test.go

@arjan-bal arjan-bal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Adding a second reviewer.

@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Aug 28, 2025
@arjan-bal arjan-bal requested a review from dfawley August 28, 2025 05:46

@dfawley dfawley left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change. A couple small things to potentially change.

Comment on lines 992 to +995
t.conn.SetWriteDeadline(time.Now().Add(time.Second * 10))
// For background on the deadline value chosen here, see
// https://github.com/grpc/grpc-go/issues/8425#issuecomment-3057938248 .
t.conn.SetReadDeadline(time.Now().Add(time.Second))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better/simpler to just use SetDeadline with the same 1 second deadline, if possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to do that, but am a little concerned about throwing that in here with the other changes, only because it seems like a possible load-bearing change for which I can't really vouch myself. I took a look back at #7371, where the write deadline was added, and couldn't seem to find any direct discussion about why the value of 10 was chosen, or whether a value of 1 might be problematic.

I don't mind just flipping this to 1, but also wonder if you prefer that we not do that in this PR, and then change it later? That way if everything else is good but the change from 10 to 1 introduces some separate problem, it's a simple revert of just that one-liner PR?

Not sure how your team likes to balance risk:throughput.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfawley -- I didn't realize that my comment above may not have made it to you unless I explicitly request review, so I just did that.

I can make the change to replace calls to SetReadDeadline and SetWriteDeadline with a single call to SetDeadline, but want to check in with you about the concerns ^^^ re combining both changes in the same PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK either way, here. It's theoretically less risky to combine them because of the behavior change, but I'm not that concerned since the original number was picked pretty arbitrarily in the first place, and neither are expected to hit unless there is a problem with the connection.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should say "more risky", sorry. Let's just go with what's in this PR, then. Thank you!

co := ConnectOptions{
Dialer: dialer,
}
server, client, cancel := setUpWithOptions(t, 0, &ServerConfig{}, normal, co)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to set read/write deadlines during handshaking (we currently only block on the context here, but our server does set deadlines here), then this test would be invalid. I think we should either confirm that the values are still false before calling Close, or clear them unconditionally then.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding that. I clear them now unconditionally.

@github-actions

Copy link
Copy Markdown

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions Bot added the stale label Sep 15, 2025
@Pranjali-2501 Pranjali-2501 modified the milestones: 1.76 Release, 1.77 Sep 17, 2025
@github-actions github-actions Bot removed the stale label Sep 17, 2025
@github-actions

Copy link
Copy Markdown

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@dfawley dfawley merged commit 2780563 into grpc:master Sep 29, 2025
17 checks passed
Pranjali-2501 pushed a commit to Pranjali-2501/grpc-go that referenced this pull request Oct 6, 2025
…grpc#8534)

Fixes: grpc#8425

This PR adds a call to `net.Conn.SetWriteDeadline`, as discussed in
grpc#8425 (comment).
Additionally, it updates the previous call to `SetReadDeadline` to log
any non-nil error value (this doesn't affect behavior but proved helpful
in some earlier debugging).

RELEASE NOTES:
* client: Set a read deadline when closing a transport to prevent it
from blocking indefinitely on a broken connection.
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Mar 29, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Transport close after keepalive failure blocks on unresponsive reader

5 participants