fix: treat EOF as retriable error on write stream #917
Merged
mattisonchao merged 2 commits into main on Feb 26, 2026
When a bidirectional write stream is closed by the server before delivering the gRPC status (e.g. the server returns an error from WriteStream before reading any messages), the client's stream.Recv() can return io.EOF instead of the proper gRPC status error. This is a transient condition that should be retried rather than treated as a permanent failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was asserting that io.EOF propagates to callers, but EOF is now correctly retried. Replace it with codes.Internal, which is a proper non-retriable gRPC error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- When a bidirectional write stream is closed by the server before the gRPC status is delivered (e.g. the `WriteStream` handler returns `CodeNodeIsNotLeader` before reading any messages), the client's `stream.Recv()` can return `io.EOF` instead of the proper gRPC status error.
- `io.EOF` was not classified as retriable in `isRetriable()`, so the batch retry logic treated it as a permanent failure.
- Fixes `TestLeaderHintWithClient` failures where the `DizzyShardManager` forces the client to connect to a non-leader node: the server closes the stream immediately, the client gets EOF, and it gives up instead of retrying with the leader hint.

Root cause
In the `WriteStream` server handler (`public_rpc_server.go:184-186`), when the node is not the leader, `getLeader()` returns an error and the handler returns immediately, before ever calling `stream.Recv()`. This causes gRPC to close the server-side stream, and the client's `Recv()` may then observe either the proper status error or `io.EOF`, depending on timing.

The `isRetriable()` function checks `status.Code(err)`, but `io.EOF` is not a gRPC status error, so `status.Code(io.EOF)` returns `codes.Unknown`, which falls through to the non-retriable default case.

Fix
Add an explicit `errors.Is(err, io.EOF)` check before the gRPC status code switch, treating EOF as a retriable transient condition.

Test plan
- `go vet ./oxia/internal/batch/...` passes

🤖 Generated with Claude Code