fix(#4274): getReplicaAddresses excludes localPeer instead of leader#4280
Conversation
On follower nodes, getReplicaAddresses() was filtering out localPeerId instead of the Raft leader. This caused the leader to appear inside replicaAddresses and the serving follower to be missing from the list, giving every node a different view of the cluster topology. Fix: exclude the leader ID when known, fall back to localPeerId during elections (preserves prior behavior when no leader is elected yet).
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
🟢 Coverage 0.00% diff coverage · -7.39% coverage variation
Metric Results Coverage variation ✅ -7.39% coverage variation Diff coverage ✅ 0.00% diff coverage Coverage variation details
Coverable lines Covered lines Coverage Common ancestor commit (6e4de2c) 127476 93362 73.24% Head commit (86fb190) 159158 (+31682) 104810 (+11448) 65.85% (-7.39%) Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch:
<coverage of head commit> - <coverage of common ancestor commit>Diff coverage details
Coverable lines Covered lines Diff coverage Pull request (#4280) 6 0 0.00% Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified:
<covered lines added or modified>/<coverable lines added or modified> * 100%
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request addresses a cluster topology inconsistency where follower nodes incorrectly included the leader in their replica list while excluding themselves. The fix updates getReplicaAddresses() in RaftHAServer to exclude the leader ID (falling back to the local peer ID if no leader is elected) instead of always excluding the local peer. A new regression test, clusterTopologyIsConsistentAcrossNodes, was added to verify that all nodes report a consistent view of the cluster. Feedback suggests that the getStats() method likely contains the same logic error and should be updated to maintain consistency across monitoring APIs.
| @@ -681,9 +681,11 @@ public Map<String, Object> getStats() { | |||
| } | |||
There was a problem hiding this comment.
The getStats() method (lines 655-681) appears to have the same logic issue as getReplicaAddresses(). It filters the replicas list using localPeerId instead of the leader ID. This means that on a follower node, the leader will be included in the replicas list, while the follower itself will be missing. To ensure consistency across all monitoring and status APIs, consider applying a similar fix to getStats() so that it also excludes the leader (or falls back to localPeerId if no leader is known).
There was a problem hiding this comment.
Fixed. Applied the same leaderId != null ? leaderId : localPeerId exclusion pattern to getStats() in the follow-up commit.
Code Review - fix(#4274): getReplicaAddresses excludes localPeer instead of leaderOverviewThe PR correctly identifies and fixes a real bug: Fix (
|
| Area | Status |
|---|---|
| Root cause analysis | Correct |
| Fix logic | Correct |
| Fallback during elections | Acceptable (comment recommended) |
| Regression test coverage | Good, with two small gaps |
| Docs file in repo | Should be removed |
The core change is correct and the test is a good addition. Addressing the replicaAddresses string comparison fragility and removing the docs file are the two items worth fixing before merge.
Code ReviewThe fix is correct. The root cause is clearly identified:
|
| Area | Status |
|---|---|
| Root cause | Correctly identified |
| Fix logic | Correct |
getStats() also fixed |
Good catch (same pattern) |
| Regression test - coverage | Good |
Regression test - doesNotContain fragility |
Fix needed |
| Regression test - ordering sensitivity | Fix needed |
| Regression test - non-empty guard | Add recommended |
| Docs file in repo | Remove |
The core change is solid. The three test robustness items and the docs file are the things worth addressing before merge.
|
Code Review for PR 4280. Overview: This PR fixes a real bug in RaftHAServer - both getReplicaAddresses() and getStats() were excluding localPeerId from the replica list, when they should exclude the leader. On follower nodes this produced an inverted view. The fix is correct. Issue 1 - docs/4274-cluster-topology-consistency.md should not be committed, it is an AI process artifact; remove before merging. Issue 2 - replicaAddresses string comparison is order-dependent, fine for 2 nodes but a set-based comparison is more robust for 3+. Issue 3 - election-period topology inconsistency is worth a comment. Positives: fix is minimal and consistent across both methods, stepDown() at line 784 correctly keeps using localPeerId, regression test has proper try/finally and AssertJ assertions. Summary: correct and ready to merge after removing the docs artifact file. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4280 +/- ##
============================================
- Coverage 64.16% 64.15% -0.02%
+ Complexity 445 444 -1
============================================
Files 1645 1645
Lines 127476 127480 +4
Branches 27324 27326 +2
============================================
- Hits 81796 81781 -15
- Misses 34145 34170 +25
+ Partials 11535 11529 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Closes #4274
Summary
RaftHAServer.getReplicaAddresses()was filtering outlocalPeerId(the node that happens to serve the HTTP request) instead of the Raft leader. On a follower node this caused two anomalies:replicaAddresses(it was not the local peer, so it was included).replicaAddresses(it was the local peer, so it was excluded).The leader node produced the correct view only by coincidence, because
localPeerId == leaderIdthere.Fix: exclude the leader ID when known; fall back to
localPeerIdduring elections (no leader elected yet) to preserve prior behavior.Test plan
clusterTopologyIsConsistentAcrossNodesinRaftHTTP2ServersIT:?mode=clusterendpointleaderAddressis identical across all nodesreplicaAddressesdoes not contain theleaderAddressreplicaAddressesis identical across all nodesRaftHTTP2ServersITpass locally