fix(#4273): implement STICKY strategy to pin HTTP transactions to a concrete cluster member#4277
Conversation
…te cluster member CONNECTION_STRATEGY.STICKY was functionally identical to ROUND_ROBIN: no server pinning occurred during a transaction, causing "Remote transaction not found or expired" errors when a load-balancer distributed the begin/command/commit requests across different HA nodes. Fix: RemoteDatabase.begin() now pre-selects the leader (or currentServer as fallback) as the sticky target before the HTTP call. RemoteHttpComponent.getUrl() and httpCommand() consult stickyTransactionServer when STICKY is active, routing all three transaction phases to the same physical node. setSessionId(null) releases the pin when the transaction ends. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request addresses issue #4273 by implementing proper server pinning for the STICKY connection strategy. It introduces a stickyTransactionServer field to ensure that all phases of a remote transaction—begin, command, and commit/rollback—are routed to the same physical node, preventing 'transaction not found' errors in load-balanced environments. The changes include logic to pin the server during begin(), respect the pin in httpCommand(), and clear it upon transaction completion or failure. Feedback suggests refactoring the getUrl method in RemoteHttpComponent to eliminate duplicate URL construction logic and improve maintainability.
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Code Review - fix(#4273): STICKY strategy HTTP transaction pinningOverviewThis PR correctly identifies and fixes a real bug: Issues1. Documentation file should not be committed
2. Asymmetric exception handling in
// More consistent with commit/rollback:
} catch (final Exception e) {
throw new TransactionException(...);
} finally {
if (getSessionId() == null)
setStickyTransactionServer(null);
}3. In final boolean stickyPinned = connectionStrategy == CONNECTION_STRATEGY.STICKY && stickyTransactionServer != null;
// ...
else if (stickyPinned)
connectToServer = stickyTransactionServer; // field re-read - could be null if cleared concurrentlyIf final Pair<String, Integer> stickyPin = connectionStrategy == CONNECTION_STRATEGY.STICKY ? stickyTransactionServer : null;
final boolean stickyPinned = stickyPin != null;
// ...
else if (stickyPinned)
connectToServer = stickyPin; // safe - snapshot captured above4. The new Test CoverageThe 6 new unit tests cover URL routing correctly. What is not directly tested:
Positive Notes
SummaryThe bug fix is correct and the implementation is solid. Three actionable items before merging:
Item 2 on |
…in cleanup - RemoteHttpComponent.getUrl: extract host/port into local vars (gemini) - RemoteHttpComponent.httpCommand: snapshot stickyTransactionServer once to prevent NPE if cleared between check and read (claude) - RemoteDatabase.begin: switch from catch-based cleanup to finally-based cleanup so the pin is released whenever the session was not set, consistent with commit/rollback (claude) - RemoteDatabase: extract resolveStickyTargetServer helper so the leader-vs-fallback selection can be unit tested (claude) - RemoteDatabaseTest: 2 new tests covering the helper (fallback to currentServer when no leader; leader preferred when resolved) - Drop the issue-specific tracking doc from the PR (claude) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Review feedback addressed (commit
|
Code ReviewOverviewThis PR correctly fixes a real bug: CorrectnessTwo independent routing paths are both covered:
Pin lifecycle is sound:
Retry behaviour with a pinned server: IssuesMinor: Missing test for the failure path of There is no test verifying that Minor: Double-clear when If the server returns 204 but omits the Pre-existing: The public method Design NotesSplit routing paths (acceptable trade-off): STICKY without an active transaction falls back to current server: Test CoverageThe 6 new tests cover the core invariants well:
What is not covered (acceptable for unit-test scope):
SummaryThe fix is correct and well-scoped. The code is clean, follows existing patterns, and the tests validate the core invariants. Two items worth addressing before merge:
Everything else is minor polish.
|
|
/gemini review |
…ng comments - RemoteDatabaseTest.stickyStrategyClearsPinWhenBeginFails: points at a closed port so the begin HTTP call fails fast; verifies the sticky pin is released by the finally block when no session is established (claude #1) - RemoteHttpComponent.getUrl + httpCommand: brief leading comments pointing out that STICKY-related routing changes must update both paths (claude #2) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request implements session-pinning for the STICKY connection strategy, ensuring that all operations within a transaction are routed to the same cluster member. The changes include logic to resolve the leader or current server as a sticky target, manage the pin's lifecycle based on the session state, and update HTTP routing to respect the pinned server. Review feedback suggests refactoring the duplicated logic for retrieving the sticky pin into a shared helper method to improve code maintainability.
| Exception lastException = null; | ||
|
|
There was a problem hiding this comment.
The logic to determine the active sticky pin is duplicated here and in the getUrl method. Consider extracting this into a private helper method like getStickyPin() to improve maintainability and ensure consistency across the class.
| Exception lastException = null; | |
| final Pair<String, Integer> stickyPin = getStickyPin(); | |
| final boolean stickyPinned = stickyPin != null; |
| // Regular query/command traffic flows through httpCommand() - any STICKY routing | ||
| // change must update both paths. | ||
| protected String getUrl(final String command) { | ||
| return protocol + "://" + currentServer + ":" + currentPort + "/api/v" + apiVersion + "/" + command; | ||
| final String host; | ||
| final int port; | ||
| final Pair<String, Integer> pin = connectionStrategy == CONNECTION_STRATEGY.STICKY ? stickyTransactionServer : null; | ||
| if (pin != null) { | ||
| host = pin.getFirst(); |
There was a problem hiding this comment.
This logic for resolving the host and port based on the sticky strategy is slightly verbose and duplicates the strategy check. Using a helper method to retrieve the pin would simplify this block.
final Pair<String, Integer> pin = getStickyPin();
final String host = pin != null ? pin.getFirst() : currentServer;
final int port = pin != null ? pin.getSecond() : currentPort;
Code ReviewGood fix for a real HA bug. The root cause analysis is correct and the lifecycle management (pin on begin, clear on session-end or failure) is sound. A few concerns worth addressing before merge. Dual routing paths - maintenance riskThe most fragile part of this change is that STICKY routing logic must now be kept in sync across two independent code paths:
The comments call this out, but comments are not enforcement. If someone adds a new operation that routes through Thread safety of
|
| Issue | Severity |
|---|---|
| Dual routing paths require synchronized changes | Medium |
Thread safety of stickyTransactionServer not documented |
Medium |
| No test for commit/rollback pin lifecycle | Minor |
| Port 65530 can be flaky in CI | Minor |
| Multi-line Javadoc blocks (CLAUDE.md style) | Minor |
The core logic is correct. Addressing the dual-path routing concern (or at least extracting resolveTargetAddress()) would significantly reduce future maintenance risk.
Review feedback on cycle 3 HEAD
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4277 +/- ##
============================================
- Coverage 64.69% 64.46% -0.24%
- Complexity 0 423 +423
============================================
Files 1645 1645
Lines 127456 127474 +18
Branches 27316 27324 +8
============================================
- Hits 82458 82173 -285
- Misses 33448 33686 +238
- Partials 11550 11615 +65 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
RemoteStickyConnectionStrategyIT exercises a 3-node Raft cluster end-to-end with RemoteDatabase pointed at a follower's HTTP port: - STICKY allows a write transaction (begin/command/commit) to succeed when the client is connected to a follower - all three phases pin to the leader - ROUND_ROBIN smoke check (logs the racy 'transaction not found' outcome) - Pin is released after commit so a second STICKY transaction repins fresh - Pin is released after rollback so a subsequent STICKY transaction succeeds Class is tagged @tag("slow"); runs in ~53s. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Code Review - Final CycleThis is the third automated review pass. The first two cycles identified and the author addressed: the docs file removal, the Remaining open items (unchanged from previous cycles)1. Duplicate sticky-pin retrieval logic (medium) Both 2. Port 65530 in Port 65530 is in the Linux ephemeral range (49152-65535) and can be allocated to another process during CI, making the test non-deterministic. A reserved port (e.g. 1) or binding-then-closing a 3. Multi-line Javadoc on Per CLAUDE.md: one short line max. The method name is self-describing; the load-balancer rationale belongs in the PR description. 4. Thread-safety contract not documented (minor)
SummaryThe bug fix is correct, the lifecycle state machine (pin on begin, clear on failure or session-end) is sound, and the test coverage for the pinning invariants is good. Items 1 and 2 are worth addressing before merge; items 3 and 4 can be polish in a follow-on commit.
|
…en comment - RemoteHttpComponent: extract private getStickyPin() and use it from both getUrl() and httpCommand() (gemini #1 / claude #1 - duplicate logic) - RemoteDatabaseTest.stickyStrategyClearsPinWhenBeginFails: replace hardcoded port 65530 (in IANA ephemeral range) with ServerSocket(0)+close so the port is OS-assigned and deterministically closed (claude #2) - RemoteDatabase.resolveStickyTargetServer: collapse 3-line Javadoc into a single-line // comment per CLAUDE.md (claude #3) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Review response (commit
|
Code Review - fix(#4273): STICKY strategy HTTP transaction pinning (final pass)Three previous review cycles have already been completed. The author addressed all raised items in commit `ee62d145e`. This pass reviews the current diff as-is. What the PR doesBefore this fix, `CONNECTION_STRATEGY.STICKY` was a dead branch: `httpCommand()` only branched on `FIXED`, leaving STICKY and ROUND_ROBIN identical. In an HA deployment behind a load-balancer, `begin` could land on pod-A while `command`/`commit` hit pod-B, producing "Remote transaction not found or expired" errors. The fix adds a `stickyTransactionServer` pin that is set in `begin()`, honoured by both `getUrl()` and `httpCommand()`, and cleared on session end or begin failure. Verification of previously raised items
Remaining observations1. Taking 2. The test catches and logs any exception from the ROUND_ROBIN transaction without asserting anything. This means a connection-refused or NPE would also pass silently. Given the documented intent (ROUND_ROBIN failures are racily acceptable), this is acceptable as a negative-control check, but worth noting it will not catch regressions unrelated to routing. 3. Codecov reports 62.5% patch coverage for The uncovered lines are in the sticky-pin failover retry path (when the pinned server is unreachable). This path is hard to exercise without a real cluster and is excluded from the unit-test scope. The IT test `RemoteStickyConnectionStrategyIT` covers the live-cluster path. Acceptable. Overall assessmentThe bug fix is correct. The lifecycle state machine (pin on begin, clear on failure or session-end) is sound. The `getStickyPin()` helper eliminates the previously flagged duplication. The `ServerSocket(0)` approach in the failure test is portable and deterministic. The IT test covers all four documented scenarios against a real 3-node Raft cluster. No blocking issues remain. Ready to merge pending maintainer sign-off.
|
| needs: build-and-package | ||
| steps: | ||
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
|
|
||
| - name: Set up JDK 21 | ||
| uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0 | ||
| with: | ||
| distribution: "temurin" | ||
| java-version: 21 | ||
| cache: "maven" | ||
|
|
||
| - name: Restore Maven artifacts | ||
| uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5 | ||
| with: | ||
| path: ~/.m2/repository | ||
| key: maven-repo-${{ github.run_id }}-${{ github.run_attempt }} | ||
|
|
||
| - name: Run HA Integration Tests with Coverage | ||
| run: ./mvnw verify -DskipTests -Pintegration -Pcoverage --batch-mode --errors --fail-never --show-version -pl ha-raft | ||
| env: | ||
| GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
|
|
||
| - name: HA IT Tests Reporter | ||
| uses: dorny/test-reporter@a43b3a5f7366b97d083190328d2c652e1a8b6aa2 # v3.0.0 | ||
| if: success() || failure() | ||
| with: | ||
| name: HA IT Tests Report | ||
| path: "**/failsafe-reports/TEST*.xml" |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
🟢 Coverage 90.91% diff coverage · -7.62% coverage variation
Metric Results Coverage variation ✅ -7.62% coverage variation Diff coverage ✅ 90.91% diff coverage Coverage variation details
Coverable lines Covered lines Coverage Common ancestor commit (6918763) 127456 94037 73.78% Head commit (33f6e71) 159152 (+31696) 105288 (+11251) 66.16% (-7.62%) Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch:
<coverage of head commit> - <coverage of common ancestor commit>Diff coverage details
Coverable lines Covered lines Diff coverage Pull request (#4277) 22 20 90.91% Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified:
<covered lines added or modified>/<coverable lines added or modified> * 100%
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Code Review - STICKY Strategy Fix (#4273)OverviewThis PR correctly identifies and fixes a long-standing no-op: all three CorrectnessPin lifecycle - one subtle edge case in The } finally {
if (getSessionId() == null)
setStickyTransactionServer(null);
}This works correctly when the HTTP call fails outright. However, if the server returns HTTP 204 but omits the Double-clear is harmless but worth acknowledging: When a transaction ends normally, Thread Safety
private Pair<String, Integer> stickyTransactionServer;The existing private volatile Pair<String, Integer> stickyTransactionServer;Minor Code Quality PointsTwo routing paths must stay in sync (documented, but fragile): The comments on
This is reasonable. The Comments on Test CoverageCoverage is thorough:
One suggestion: the ROUND_ROBIN smoke test swallows any exception: } catch (final Exception e) {
LogManager.instance().log(this, Level.INFO, "TEST: ROUND_ROBIN smoke run raised %s", e.getMessage());
}This means the test always passes regardless of outcome. A comment explaining this is intentional (verifying the path does not throw unexpectedly other than the known transaction-not-found error) would help future readers. Alternatively, asserting that any thrown exception is an instance of CI ChangesSplitting SummaryThe fix is correct and the approach is well-designed. Two items worth a follow-up before merge:
Everything else is solid. Good regression test coverage and a clean CI split for HA tests. Review generated with Claude Code |
Closes #4273
Summary
CONNECTION_STRATEGY.STICKYwas a no-op: all threeif-branches inRemoteHttpComponent.httpCommand()that distinguished strategies only branched onFIXED, leavingSTICKYandROUND_ROBINbehaviorally identical. In an HA deployment behind a load-balancer, thebeginrequest could land on pod-A while subsequentcommand/commitrequests were routed to pod-B (or to the concreteleaderServeraddress), causing "Remote transaction not found or expired" failures.Changes:
RemoteHttpComponent: addedstickyTransactionServerfield +setStickyTransactionServer()setter.getUrl()returns the pinned address when strategy isSTICKYand the field is set.httpCommand()usesstickyTransactionServeras the initial target and treats it likeFIXEDfor retry-count and failover (no round-robin rotation during an active pinned transaction).RemoteDatabase.begin(): before issuing the HTTP call, resolves a concrete cluster-member address (leader preferred,currentServer/currentPortas fallback) and pins it asstickyTransactionServer. ThegetUrl()override then routes the begin request - and all subsequentcommand/commit/rollbackcalls - to that same node. Clears the pin on failure.RemoteDatabase.setSessionId(null): clearsstickyTransactionServerwhen the session ends (commit, rollback, or exception) so the next transaction starts fresh.Test plan
RemoteHttpComponentTest- 4 new unit tests:stickyStrategyUrlUsesPinnedServer- URL uses pinned address when STICKY + pin setstickyStrategyUrlUsesCurrentServerWhenNotPinned- STICKY without pin falls back to LB hoststickyStrategyUrlRevertsAfterClearingPin- clearing the pin restores the LB host URLroundRobinStrategyIgnoresStickyServer- ROUND_ROBIN never uses the sticky fieldRemoteDatabaseTest- 2 new unit tests:stickyStrategyUrlUsesPinnedServerDuringActiveSession- URL uses pinned address during transactionstickyStrategySessionClearClearsStickyServer-setSessionId(null)releases the pinRun with:
🤖 Generated with Claude Code