fix: support retry for RangeScan operations by mattisonchao · Pull Request #938 · oxia-db/oxia

mattisonchao · 2026-03-08T16:42:50Z

Summary

Add retry with exponential backoff and leader hint extraction to rangeScanFromShard(), following the same pattern used by read and write batches
To avoid duplicate records, retries are only attempted before any data has been sent to the channel
Export IsRetriable from the batch package so it can be reused by RangeScan retry logic

Test plan

Added TestLeaderHintRangeScanWithoutClient — verifies leader hint is returned for RangeScan RPC on non-leader node
Added TestLeaderHintRangeScanWithClient — verifies end-to-end RangeScan with DizzyShardManager (forces client to hit wrong node first)
All existing tests pass

mattisonchao

Self-review: PR #938 (RangeScan retry)

Code correctness

Retry logic — Same backoff.RetryNotify + FindLeaderHint pattern as List PR and read/write batches. retryCtx (with requestTimeout) is correctly used for both backoff timing and RPC calls.

Duplicate prevention — Same dataSent flag approach as List PR. Retries only before data flows to the channel.

Behavior change: `close(ch)` is now always called

The original rangeScanFromShard had a subtle issue:

client, err := c.executor.ExecuteRangeScan(ctx, request, nil)
if err != nil {
    ch <- GetResult{Err: err}
    return  // ← channel NOT closed
}
defer close(ch)  // ← only set up on success

If ExecuteRangeScan failed (e.g. connection refused), the channel was never closed. In the single-shard path, the consumer would receive the error but never see channel close — any for range ch would block forever after the error.

The new code always calls close(ch) at the end of rangeScanFromShard:

if err != nil {
    ch <- GetResult{Err: err}
}
close(ch)

This is a bug fix — the consumer now always sees a proper channel close. In the multi-shard path, the per-shard channels are also always closed, which lets aggregateAndSortRangeScanAcrossShards know the shard is done.

Observations

requestTimeout may be tight for large RangeScan operations — same observation as List PR. Consistent with other retry paths.
Test pre-allocates results slice (make([]oxia.GetResult, 0, 2)) to satisfy prealloc lint.
Both PRs export IsRetriable independently — merge-order safe.

Verdict

LGTM. The close(ch) change is an improvement over the original behavior.

RangeScan operations previously had no retry logic and passed nil as the leader hint. When the client connected to a non-leader node, the request would fail immediately instead of retrying with the correct leader. This adds retry with exponential backoff and leader hint extraction to rangeScanFromShard, following the same pattern used by read and write batches. To avoid duplicate records, retries are only attempted before any data has been sent to the channel. Also exports IsRetriable from the batch package so it can be reused by the RangeScan retry logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: mattisonchao <mattisonchao@gmail.com>

The assignment dispatcher on CI may take longer to propagate shard assignments. Increase the Eventually timeout to avoid flaky failures. Signed-off-by: Mattison <mattison@streamnative.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: mattisonchao <mattisonchao@gmail.com>

Signed-off-by: Mattison <mattison@streamnative.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: mattisonchao <mattisonchao@gmail.com>

mattisonchao requested review from RobertIndie, coderzc and merlimat as code owners March 8, 2026 16:42

mattisonchao changed the title ~~fix: support leader hint retry for RangeScan operations~~ fix: support retry for RangeScan operations Mar 8, 2026

mattisonchao self-assigned this Mar 8, 2026

mattisonchao force-pushed the feat/range-scan-leader-hint-retry branch 2 times, most recently from 11eb255 to dd52df0 Compare March 8, 2026 18:09

mattisonchao commented Mar 8, 2026

View reviewed changes

mattisonchao force-pushed the feat/range-scan-leader-hint-retry branch 2 times, most recently from 7c207dd to 7ea4160 Compare March 8, 2026 18:34

mattisonchao changed the base branch from main to feat/list-leader-hint-retry March 8, 2026 18:34

mattisonchao changed the base branch from feat/list-leader-hint-retry to main March 8, 2026 18:47

mattisonchao force-pushed the feat/range-scan-leader-hint-retry branch from 5f00681 to e4092ae Compare March 8, 2026 18:50

mattisonchao and others added 3 commits March 9, 2026 08:43

ci: trigger CI after base branch change

3d1f732

Signed-off-by: Mattison <mattison@streamnative.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: mattisonchao <mattisonchao@gmail.com>

mattisonchao force-pushed the feat/range-scan-leader-hint-retry branch from e4092ae to 3d1f732 Compare March 9, 2026 00:43

mattisonchao merged commit 2d97a69 into main Mar 9, 2026
9 checks passed

mattisonchao deleted the feat/range-scan-leader-hint-retry branch March 9, 2026 00:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: support retry for RangeScan operations#938

fix: support retry for RangeScan operations#938
mattisonchao merged 3 commits intomainfrom
feat/range-scan-leader-hint-retry

mattisonchao commented Mar 8, 2026 •

edited

Loading

Uh oh!

mattisonchao left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mattisonchao commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

mattisonchao left a comment

Choose a reason for hiding this comment

Self-review: PR #938 (RangeScan retry)

Code correctness

Behavior change: close(ch) is now always called

Observations

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mattisonchao commented Mar 8, 2026 •

edited

Loading

Behavior change: `close(ch)` is now always called