Skip to content

Logging fix or improvement around new shard ID generation#3192

Merged
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:shard_id_log
Feb 24, 2026
Merged

Logging fix or improvement around new shard ID generation#3192
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:shard_id_log

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see #2283.

However, the log added in #2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:

serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:

xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4

But the node shard id is actually:

xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:

xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66

This is part of #2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see valkey-io#2283.

However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node shard id is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
@codecov

codecov Bot commented Feb 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.92%. Comparing base (c2aca72) to head (d356adc).
⚠️ Report is 28 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3192      +/-   ##
============================================
+ Coverage     74.87%   74.92%   +0.05%     
============================================
  Files           129      129              
  Lines         71328    71332       +4     
============================================
+ Hits          53407    53449      +42     
+ Misses        17921    17883      -38     
Files with missing lines Coverage Δ
src/cluster_legacy.c 87.99% <100.00%> (-0.02%) ⬇️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@enjoy-binbin enjoy-binbin merged commit 537cb07 into valkey-io:unstable Feb 24, 2026
25 checks passed
@enjoy-binbin enjoy-binbin deleted the shard_id_log branch February 24, 2026 03:13
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…3192)

When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see valkey-io#2283.

However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node shard id is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants