Skip to content

fix: /api/v1/server?mode=cluster returns no ha section after Raft migration (#4261)#4266

Merged
robfrank merged 5 commits into
mainfrom
fix/4261-cluster-api-not-working
May 19, 2026
Merged

fix: /api/v1/server?mode=cluster returns no ha section after Raft migration (#4261)#4266
robfrank merged 5 commits into
mainfrom
fix/4261-cluster-api-not-working

Conversation

@robfrank

Copy link
Copy Markdown
Collaborator

Summary

Fixes #4261. After the legacy HAServer was replaced by RaftHAPlugin (#3731), the /api/v1/server?mode=cluster endpoint stopped returning the ha section for HA clusters.

  • Root cause (missing ha section): Both PluginManager.isHAEnabled() and RaftHAPlugin.isRaftEnabled() only checked HA_ENABLED (default false). Users who configured HA_SERVER_LIST but didn't explicitly set HA_ENABLED=true got no HA plugin started, so server.getHA() returned null and the ha section was omitted. A non-empty HA_SERVER_LIST now implicitly enables HA in both places.

  • Follower forwarding removed: GetServerHandler.exportCluster() used to call back to the leader over HTTP to fetch the replica list. The old HAServer needed this because followers lacked the full peer list; with Raft, every node already has the complete peer list from the group config. The forwarding also crashed with MalformedURLException when getLeaderAddress() returned null during leader election.

  • Duplicate opencypher in languages: QueryEngineManager.getAvailableLanguages() iterated implementations.values().getLanguage(), but "cypher" and "opencypher" point at the same OpenCypherQueryEngineFactory instance, so getLanguage() returned "opencypher" twice. Returning implementations.keySet() preserves distinct aliases and removes the duplicate.

Test plan

  • RaftHTTP2ServersIT.serverInfo() extended with assertions on ha.clusterName, ha.leader, ha.network, and ha.network.replicas so the regression cannot return undetected
  • RaftHTTP2ServersIT (5/5) and RaftHAPluginTest (4/4) pass locally
  • HTTPDocumentIT#serverClusterInfo (single-server, no HA) still passes - ha correctly absent
  • All server module tests pass (81/81)

…ration (#4261)

After the legacy HAServer was replaced by RaftHAPlugin (#3731), the
?mode=cluster endpoint stopped returning the ha section for clusters
that worked under the old implementation.

Three issues fixed:

- PluginManager.isHAEnabled() and RaftHAPlugin.isRaftEnabled() now
  also return true when HA_SERVER_LIST is non-empty. Previously only
  HA_ENABLED was checked, so users who configured a server list but
  didn't know about the new opt-in flag got no HA plugin started.

- GetServerHandler.exportCluster() no longer forwards to the leader
  over HTTP to obtain replica info. The old HA needed it because
  followers lacked the full peer list; with Raft, every node has the
  complete peer list from the group config. The forwarding also
  crashed with MalformedURLException when getLeaderAddress() returned
  null during leader election.

- QueryEngineManager.getAvailableLanguages() returned "opencypher"
  twice because the "cypher" and "opencypher" keys point at the same
  factory instance. Returning implementations.keySet() preserves the
  distinct language aliases and eliminates the duplicate.

RaftHTTP2ServersIT.serverInfo() now asserts the ha section structure
to prevent the regression from going unnoticed again.
@codacy-production

codacy-production Bot commented May 19, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity

Metric Results
Complexity 0

View in Codacy

🟢 Coverage 100.00% diff coverage · -7.48% coverage variation

Metric Results
Coverage variation -7.48% coverage variation
Diff coverage 100.00% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (9f73fdf) 127293 93874 73.75%
Head commit (1ecc2f5) 158944 (+31651) 105325 (+11451) 66.27% (-7.48%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4266) 9 9 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review

Overall this is a clean, well-reasoned fix. The PR description clearly explains each change and the root causes are correctly identified. A few observations below.


QueryEngineManager - getAvailableLanguages() (engine)

Fix is correct and simpler. Returning new ArrayList<>(implementations.keySet()) is the right approach because the map keys represent what callers actually pass to getQueryEngine(language). The old iteration over values caused \"opencypher\" to appear twice because both \"cypher\" and \"opencypher\" keys point to the same OpenCypherQueryEngineFactory instance whose getLanguage() always returns \"opencypher\".

One minor note: the returned list order is now HashMap iteration order (non-deterministic). If callers expect a stable order (e.g. for display in Studio), consider new ArrayList<>(new TreeSet<>(implementations.keySet())) or using a LinkedHashMap for implementations. Not a blocker, but worth being aware of.

Test gap: There is no unit test for getAvailableLanguages() covering the no-duplicates guarantee. It would prevent this regression from returning silently in the future.


isRaftEnabled() / isHAEnabled() duplication (ha-raft / server)

The same logic is copy-pasted into two private methods in two different modules. If the detection logic ever changes again (e.g. a third trigger is added), both places need to stay in sync.

A natural home for this logic is a static helper on GlobalConfiguration (which lives in engine and is visible to both modules). Something like:

// in GlobalConfiguration
public boolean isHAImplicitlyEnabled() {
  final String serverList = getValueAsString(HA_SERVER_LIST);
  return !serverList.isBlank();
}

Then both callers reduce to:

return getValueAsBoolean(HA_ENABLED) || configuration.isHAImplicitlyEnabled();

Not a blocker for merging, but worth a follow-up to avoid future drift.


String check: isEmpty() vs isBlank()

Both isRaftEnabled() and isHAEnabled() use !serverList.isEmpty(). Since the project targets Java 21, !serverList.isBlank() would also guard against accidental whitespace-only values (e.g. HA_SERVER_LIST=" "). The validateConfiguration() method below uses isEmpty() for consistency - if you add isBlank() here, it may be worth updating validateConfiguration() too for coherence. Low severity, but worth noting.


Stale Javadoc comment in RaftHAPlugin

Line 37 of RaftHAPlugin.java still reads:

Discovered via Java ServiceLoader when HA_ENABLED=true.

After this PR, HA can also be auto-discovered via a non-empty HA_SERVER_LIST. The comment should be updated to reflect both triggers, otherwise it will mislead users who configure only HA_SERVER_LIST.


Removal of follower HTTP forwarding (GetServerHandler)

This removal is correct and the motivation is sound. Raft ensures every node keeps the full group membership, so the leader round-trip is unnecessary. The removed code also had a real crash vector: ha.getLeaderAddress() can return null during election, and new URL("http://null/...") throws MalformedURLException which was swallowed into a RuntimeException.

The cleanup of java.net.* import and the readResponse() method is tidy.

One verification point: ha.getStats() on a follower should now include the full replicas array without the leader call. The new test assertion on ha.network.replicas covers this, which is good.


Test coverage

The new assertions in RaftHTTP2ServersIT.serverInfo() are a solid regression guard. A couple of additions would make the safety net stronger:

  1. Assert on implicit enablement path explicitly - confirm the test configuration does not set HA_ENABLED=true so that it actually exercises the HA_SERVER_LIST-only path.
  2. Assert replicas is non-empty - assertThat(ha.getJSONObject("network").getJSONArray("replicas").length()).isGreaterThan(0) would catch a regression where the array exists but is empty.
  3. getAvailableLanguages() unit test - as noted above.

Summary

Area Severity Action
Stale Javadoc on RaftHAPlugin Minor Update comment
isEmpty() vs isBlank() Minor Consider isBlank()
Duplicated HA-enable logic Minor Follow-up refactor to GlobalConfiguration
getAvailableLanguages() return order Negligible Document or use TreeSet if stable order needed
Missing unit test for language dedup Minor Add test
Test: assert replicas non-empty Minor Strengthen existing assertion

The core fixes (implicit HA via HA_SERVER_LIST, removal of crash-prone follower forwarding, and language dedup) are all correct. Ready to merge after addressing the stale comment at minimum.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the high availability (HA) detection logic to consider the server list configuration and simplifies the retrieval of available query languages. It also removes redundant leader-polling logic in the server handler and enhances integration tests to verify HA status in the server response. I have no feedback to provide.

- Centralize the implicit HA-enabled check in ContextConfiguration.isHAImplicitlyEnabled()
  so PluginManager (server) and RaftHAPlugin (ha-raft) share one source of truth and
  cannot drift if the detection logic changes again.
- Use isBlank() instead of isEmpty() for HA_SERVER_LIST so accidental whitespace-only
  values do not silently enable HA. validateConfiguration() updated for coherence.
- Refresh RaftHAPlugin class Javadoc to mention both triggers (HA_ENABLED=true OR
  non-blank HA_SERVER_LIST).
- Switch QueryEngineManager.implementations to LinkedHashMap so
  getAvailableLanguages() returns aliases in insertion order rather than the
  non-deterministic HashMap iteration order callers had been getting.
- New QueryEngineManagerLanguagesTest pins three contracts: no duplicates,
  built-in engines always present, and both 'cypher' and 'opencypher' aliases
  surface when OpenCypher is on the classpath.
- Strengthen RaftHTTP2ServersIT.serverInfo() to assert the replicas array is
  non-empty (previously only existence was checked).
@robfrank

Copy link
Copy Markdown
Collaborator Author

Review items addressed in f28f735:

  • Stale Javadoc on RaftHAPlugin - updated to mention both triggers.
  • isEmpty() vs isBlank() - switched in both the new check and validateConfiguration() so whitespace-only HA_SERVER_LIST values don't silently enable HA.
  • Duplicated logic - extracted to ContextConfiguration.isHAImplicitlyEnabled(). PluginManager.isHAEnabled() and RaftHAPlugin.isRaftEnabled() now share one source of truth.
  • getAvailableLanguages() return order - changed implementations to LinkedHashMap, so callers now get insertion order (deterministic) instead of the previous HashMap ordering.
  • Missing unit test for language dedup - new QueryEngineManagerLanguagesTest covers (a) no duplicates, (b) built-in engines always present, (c) both cypher and opencypher aliases surface when OpenCypher is on the classpath.
  • Test: assert replicas non-empty - strengthened the existing assertion.

On the suggestion to assert the test exercises the implicit-enable path: BaseGraphServerTest.startServers explicitly sets HA_ENABLED=true for multi-server tests, so RaftHTTP2ServersIT.serverInfo() currently exercises the explicit path. Adding a dedicated test that runs RaftHAPlugin against a config with only HA_SERVER_LIST set is non-trivial (the plugin tries to start the Raft server, requires sockets) and a unit-level coverage for the helper itself is already in place via the boolean result of isHAImplicitlyEnabled() being directly testable. Happy to add a unit test on ContextConfiguration.isHAImplicitlyEnabled() as a follow-up if you want explicit coverage of that boolean.

Local results: RaftHTTP2ServersIT 5/5, RaftHAPluginTest 4/4, QueryEngineManagerLanguagesTest 3/3, QueryEngineManagerPoolTest 4/4, full server module 102/102.

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review - PR #4266

Good bug fix overall. The root cause analysis in the PR description is clear and the three separate fixes are well-scoped. Some observations below.


Correctness

isHAImplicitlyEnabled() as single source of truth - the placement in ContextConfiguration (engine module) is justified since both server and ha-raft modules already depend on it for configuration. Both PluginManager.isHAEnabled() and RaftHAPlugin.isRaftEnabled() now use it consistently, which is exactly right.

Follower forwarding removal - this is the right call. The old readResponse() used Scanner.next() (splits on whitespace, not newlines) and string-replaced \n with spaces, which could silently mangle JSON. More critically, the MalformedURLException crash when getLeaderAddress() returns null during election was a real reliability hazard. Removing the round-trip entirely and relying on the Raft group config replicated to every node is the correct Raft-native approach.

getAvailableLanguages() fix - returning implementations.keySet() instead of calling impl.getLanguage() on each value is clean and correct. The HashMap -> LinkedHashMap change makes the returned order deterministic, which is a nice bonus.


Potential Issues

isBlank() vs isEmpty() consistency - isHAImplicitlyEnabled() uses isBlank() (trims whitespace), and validateConfiguration() in RaftHAPlugin was updated to use isBlank() as well. Good. Just worth confirming there are no other callers of HA_SERVER_LIST that still use isEmpty() that could produce inconsistent behavior (e.g. a list of only spaces would pass isHAImplicitlyEnabled() but then fail validateConfiguration() with a RuntimeException - which is acceptable since validateConfiguration() guards actual startup).

ha.getStats() on followers - the old code forwarded to the leader specifically because followers lacked the full replica list. The PR relies on ha.getStats() returning complete peer information on every node. The integration test (serverInfo() in RaftHTTP2ServersIT) only calls the endpoint on servers[0]. If that server is always the leader in the test setup, follower behavior isn't being explicitly verified here. It might be worth adding an assertion that hits both servers (or at least logging which server is leader vs follower in the test).


Style (per CLAUDE.md)

Multi-line Javadoc on isHAImplicitlyEnabled() - CLAUDE.md says "Don't write multi-paragraph docstrings or multi-line comment blocks - one short line max." The 5-line block comment should be a single line or removed entirely (the method name is self-documenting):

// Non-blank HA_SERVER_LIST implies HA intent even without HA_ENABLED=true
public boolean isHAImplicitlyEnabled() {

Multi-line comments in QueryEngineManagerLanguagesTest - the class-level Javadoc and the inline comments inside availableLanguagesIncludeBuiltInEngines() and availableLanguagesIncludeBothCypherAliasesWhenOpenCypherIsAvailable() are multi-line blocks. Per project style, the explanatory prose belongs in the PR description rather than in code comments.

Inline comment in RaftHTTP2ServersIT - same issue: the // With Raft, every node knows... comment is a multi-line explanation that CLAUDE.md discourages.


Test Coverage

  • QueryEngineManagerLanguagesTest cleanly pins the no-duplicates contract. The conditional check (if (languages.contains("opencypher"))) is the right way to handle optional classpath engines.
  • RaftHTTP2ServersIT.serverInfo() additions are the critical regression guard for the main bug. The isPositive() check on replicas.length() is correct for a 2-server setup.
  • HTTPDocumentIT#serverClusterInfo continuing to pass (no ha section on single-server) confirms the negative case is intact.

The one gap noted above (follower vs leader in serverInfo()) aside, coverage is solid.


Minor

  • Removing the unused java.net.* import in GetServerHandler is a clean-up worth keeping.
  • validateConfiguration() upgrading from isEmpty() to isBlank() is strictly more correct and consistent with isHAImplicitlyEnabled().

Summary: The fix is correct and well-tested. The main actionable items are: (1) verify serverInfo() hits a follower node at least once in the integration test, and (2) collapse the multi-line comment blocks to single lines per project style. Neither is a blocker.

Three cases pinned: unset HA_SERVER_LIST → false, blank/whitespace-only → false,
configured value → true. This is the boolean that gates the implicit-enable
path; testing it directly removes the need to spin up a Raft server in tests
just to exercise the discovery logic.
@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes three distinct bugs introduced after the HAServer -> RaftHAPlugin migration:

  1. Missing ha section when HA was configured via HA_SERVER_LIST only (without HA_ENABLED=true)
  2. NPE/crash on followers when getLeaderAddress() returned null during leader election
  3. Duplicate opencypher entry in the available-languages list

The root causes are correctly identified and the fixes are targeted and clean.


Positive Aspects

  • Single source of truth: centralising the implicit-HA check in ContextConfiguration.isHAImplicitlyEnabled() avoids the two callers (PluginManager, RaftHAPlugin) duplicating the same logic independently.
  • Follower-forwarding removal: the deleted code had a real bug - Scanner.next() tokenizes by whitespace and tokens were concatenated with no delimiter, meaning the reconstructed JSON could be malformed. Removing ~30 lines of fragile HTTP round-trip code in favour of Raft's native peer awareness is a clear win.
  • isEmpty() -> isBlank() in validateConfiguration is a quiet correctness improvement.
  • getAvailableLanguages() fix is semantically correct: returning keySet() surfaces both cypher and opencypher as distinct aliases rather than getLanguage() returning opencypher twice.
  • Test coverage is good: unit tests for isHAImplicitlyEnabled(), a new test class for getAvailableLanguages(), and the regression assertion in RaftHTTP2ServersIT.

Concerns and Suggestions

1. Leaky abstraction in the engine module's Javadoc (minor)

ContextConfiguration lives in the engine module, but its new Javadoc says:

Combined with an explicit HA_ENABLED=true, this determines whether the Raft plugin is discovered and started.

The engine module should not know about Raft. The Javadoc describes a configuration convention; the detail about which plugin acts on it belongs in ha-raft. Suggested rewording:

/**
 * Returns {@code true} when HA is implicitly opted-in by a non-blank {@code HA_SERVER_LIST}.
 * Callers (PluginManager, HA plugins) combine this with an explicit {@code HA_ENABLED=true}
 * to decide whether to activate HA. Kept here as the single source of truth.
 */

2. getAvailableLanguages() - semantic shift worth noting (minor)

The old contract was "return each factory's canonical getLanguage() name". The new contract is "return every registered alias". Both cypher and opencypher now appear, which is correct, but callers that previously expected only canonical names now receive aliases too. If this method has downstream callers outside the tests (e.g., HTTP info endpoint, Studio), verify they handle extra entries gracefully.

3. Follower-awareness assumption in test comment

The added comment in RaftHTTP2ServersIT:

// With Raft, every node knows all peers from the group config, so the replicas array
// is populated even on followers without the old leader-forwarding round trip.

This is a strong architectural claim. A pointer to the class/method in the Raft implementation that guarantees this (where the peer list is populated on followers) would help future maintainers verify it has not regressed.

4. Null check in isHAImplicitlyEnabled() (very minor)

HA_SERVER_LIST defaults to "" in GlobalConfiguration, so getValueAsString(...) will never return null. The null-guard is harmless but implies a possibility that cannot occur with the current default. Could simplify to:

public boolean isHAImplicitlyEnabled() {
  return !getValueAsString(GlobalConfiguration.HA_SERVER_LIST).isBlank();
}

Keeping it is fine for defensive programming - just a cosmetic point.

5. Vacuous test guard (nit)

if (languages.contains("opencypher"))
    assertThat(languages).contains("cypher");

If OpenCypher is absent from the classpath, the test passes without asserting anything. Consider adding a short comment explaining this is intentional for engine-only test runs without the opencypher dependency on the classpath.


Summary

The fix is correct, well-scoped, and the regression tests are solid. None of the concerns above block merging. The most actionable improvement is the Javadoc wording in point 1 - keeping Raft-specific language out of the engine module's API surface.

@codecov

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 44.44444% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.67%. Comparing base (9f73fdf) to head (1ecc2f5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...java/com/arcadedb/server/ha/raft/RaftHAPlugin.java 0.00% 1 Missing and 2 partials ⚠️
...c/main/java/com/arcadedb/ContextConfiguration.java 50.00% 0 Missing and 1 partial ⚠️
...java/com/arcadedb/server/plugin/PluginManager.java 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4266      +/-   ##
==========================================
- Coverage   64.67%   64.67%   -0.01%     
==========================================
  Files        1644     1644              
  Lines      127293   127266      -27     
  Branches    27277    27277              
==========================================
- Hits        82328    82310      -18     
+ Misses      33447    33433      -14     
- Partials    11518    11523       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…r runs

Before this PR's implicit-enable change, BaseGraphServerTest set HA_SERVER_LIST
unconditionally and gated HA on HA_ENABLED=true. That was a no-op for
single-server tests because the discovery logic only looked at HA_ENABLED.

Once ContextConfiguration.isHAImplicitlyEnabled() started treating a non-blank
HA_SERVER_LIST as an HA opt-in (the user-facing fix in this PR), every
single-server test inheriting BaseGraphServerTest accidentally auto-started
RaftHAPlugin, with a one-peer server list. Writes against a one-node Raft
group time out / fail with "Group commit entry failed", surfacing as the
gRPC test failures CI flagged on ArcadeDbGrpcServiceExtendedTest,
GrpcServerIT, and Issue4214InsertStreamConflictIgnoreIT.

The fix is to set HA_SERVER_LIST and HA_ENABLED only when getServerCount() > 1
so single-server tests behave as they did before the implicit-enable change.
@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review - PR #4266

The PR correctly identifies and fixes three independent regressions introduced after the HAServer -> RaftHAPlugin migration. The root cause analysis in the description is accurate and the structural approach (centralising implicit HA detection in ContextConfiguration) is the right call.


Correctness

Follower replica list (GetServerHandler) - Verified. RaftHAServer.getStats() iterates raftGroup.getPeers() (line 668), which is the Raft group configuration maintained by every node regardless of leadership. Removing the follower-to-leader HTTP round-trip is safe and also eliminates the crash path where getLeaderAddress() returns null during leader election.

Implicit HA enablement - Placing the logic in ContextConfiguration.isHAImplicitlyEnabled() as a single source of truth is correct. ContextConfiguration holds the values; PluginManager and RaftHAPlugin both resolve through it with no logic duplication.

getAvailableLanguages() duplicate - Using implementations.keySet() is the right fix. The old iteration over values() called factory.getLanguage() and got "opencypher" twice (both the "cypher" and "opencypher" keys point at the same factory instance). The new approach returns the registered aliases directly, which is what callers actually pass to getQueryEngine(). The LinkedHashMap change preserves insertion order as a bonus.

isEmpty() -> isBlank() - Consistent with isHAImplicitlyEnabled() which also uses isBlank(). Good catch.

Security improvement - The removed follower-forwarding code was forwarding the client's raw Authorization header from the follower to the leader in a server-side HTTP call. Removing it eliminates both the crash risk and this credential-forwarding pattern.


Suggestions

1. Test only exercises one server endpoint for the HA assertion

RaftHTTP2ServersIT.serverInfo() connects to a single server (presumably server 0). After the fix, the interesting regression is that a follower now returns the replicas list without the leader-forwarding round trip. Adding an assertion against the follower's endpoint (server index 1) would pin that behaviour and be the strongest regression test for the removed code.

2. Conditional assertion in QueryEngineManagerLanguagesTest

if (languages.contains("opencypher"))
    assertThat(languages).contains("cypher");

The engine module always has OpenCypher on its classpath (it lives in engine/src/main/java/com/arcadedb/query/opencypher/), so the if guard is never false in this test context. The conditional form silently passes if the engine ever stopped registering opencypher, hiding a regression. An unconditional assertion is safer:

assertThat(languages).contains("opencypher", "cypher");

3. Minor: assertThat(ha.has(...)).isTrue() vs descriptive matchers (optional)

These are fine as-is per existing project style. One AssertJ alternative that gives a cleaner failure message:

assertThat(ha.keySet()).contains("clusterName", "leader", "network");

Not a blocker.


BaseGraphServerTest fix

The guard added around HA_SERVER_LIST / HA_ENABLED in startServers() is necessary and correctly reasoned. Without it, every single-server test would now trigger Raft startup via isHAImplicitlyEnabled(). The comment explains a non-obvious invariant introduced by this PR so it earns its place.


Summary

The core fixes are correct, safe, and well-tested. The main item worth adding before merge is a follower-side assertion in RaftHTTP2ServersIT to cover the behaviour that motivated removing the leader-forwarding code. The conditional Cypher alias test is a minor, low-cost improvement. Everything else looks good to merge.

The 'opencypher' factory lives in engine/src/main and is always on the
classpath of this test, so the 'if (languages.contains("opencypher"))'
guard could only ever hide a regression where OpenCypher stopped being
registered. Asserting on both aliases unconditionally turns that into
a hard failure.
@robfrank

Copy link
Copy Markdown
Collaborator Author

Items 2 and 3 addressed, item 1 is a misread of the test.

1. Follower-side assertion - already covered. RaftHTTP2ServersIT.serverInfo() wraps the body in testEachServer(serverIndex -> ...), which BaseGraphServerTest defines as for (int i = 0; i < getServerCount(); i++) callback.execute(i). With the default 2-server setup the test hits 127.0.0.1:2480 (leader) and 127.0.0.1:2481 (follower) and runs the same ha.network.replicas assertions against both. The removed leader-forwarding path is exercised against the follower on every run.

2. Conditional Cypher assertion - fixed in 1ecc2f5. The if (languages.contains("opencypher")) guard could only ever hide a regression since OpenCypher lives in engine/src/main, so the assertion is now unconditional: assertThat(languages).contains("opencypher", "cypher").

3. keySet().contains(...) style - leaving as assertThat(ha.has("x")).isTrue() to match the surrounding style in the test class.

Local: QueryEngineManagerLanguagesTest 3/3.

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Code Review - PR #4266

This is a clean, well-scoped fix for three distinct but related regressions. The PR description is thorough and the root-cause analysis is correct.


Overview

The PR fixes three distinct issues:

  1. /api/v1/server?mode=cluster returning no ha section for users who rely on implicit HA enablement via HA_SERVER_LIST
  2. A crash / wrong data on followers caused by forwarding to the leader over HTTP (which breaks when getLeaderAddress() returns null during leader election)
  3. Duplicate opencypher in the list of available query languages

Strengths

  • Correct root-cause analysis on all three issues and clean targeted fixes
  • |Good centralization** - isHAImplicitlyEnabled() lives in ContextConfiguration as a single source of truth; both PluginManager and RaftHAPlugin delegate to it rather than duplicating logic
  • Security improvement - removing follower-to-leader HTTP forwarding eliminates the exposure of client Authorization headers in internal server-to-server calls
  • Necessary guard in BaseGraphServerTest - without conditionally setting HA config only when getServerCount() > 1, the new implicit-enable logic would start Raft for every single-server test
  • HashMap to LinkedHashMap for implementations gives deterministic language ordering without extra cost
  • Test coverage is solid: unit tests for all three new behaviors plus an extended integration test

Issues

1. Multi-line comment blocks violate project style

CLAUDE.md says: "Never write multi-paragraph docstrings or multi-line comment blocks - one short line max."

The new Javadoc on isHAImplicitlyEnabled() is a 4-line block comment. Trim it to a single line, or remove it entirely since the method name is self-documenting. The class-level Javadoc on QueryEngineManagerLanguagesTest and the multi-line inline comment in BaseGraphServerTest have the same issue.

2. Inline comments inside test methods explain WHAT, not WHY

The comment in availableLanguagesIncludeBuiltInEngines describes the test setup rather than a non-obvious invariant. Removing it keeps the test readable without violating the style guide.

3. getAvailableLanguages() behavioral change for callers expecting canonical names

The old code returned impl.getLanguage() (canonical names from the factory); the new code returns implementations.keySet() (registration aliases). For the opencypher/cypher pair, the old broken behavior was returning "opencypher" twice; the new behavior returns both "opencypher" and "cypher" as distinct entries. Worth confirming no callers depend on canonical-name-only semantics before merging.

4. No integration test for the implicit-enable path itself

The regression being fixed (HA_SERVER_LIST set but HA_ENABLED not explicitly true) is not exercised by any of the updated integration tests - BaseGraphServerTest now explicitly sets HA_ENABLED=true when getServerCount() > 1. The unit test in ContextConfigurationTest covers isHAImplicitlyEnabled() in isolation, but there is no end-to-end test that boots a 2-server cluster relying solely on HA_SERVER_LIST without HA_ENABLED=true. Consider adding at least one such test, or noting that this path is covered by the manual test plan.

5. ContextConfiguration coupling (minor)

Placing HA-specific detection logic in ContextConfiguration (engine module) couples a low-level component to HA concepts. It is justified here as the single source of truth between server and ha-raft, and ContextConfiguration already owns HA_ENABLED/HA_SERVER_LIST values. Just worth keeping in mind if HA detection logic grows more complex over time.


Summary

The core logic is correct and the test coverage is good. The main actionable items are: trim the multi-line comment blocks to conform to project style, and consider whether the implicit-enable path needs an end-to-end integration test. Neither is a blocker.

@robfrank robfrank merged commit d4c3c5a into main May 19, 2026
25 of 27 checks passed
@robfrank robfrank deleted the fix/4261-cluster-api-not-working branch May 19, 2026 18:38
@hoogenbj

Copy link
Copy Markdown

Hi there
I checked out the latest code and made a snapshot build. This appears to still be broken:

{"user":"root","version":"26.6.1-SNAPSHOT (build 69187633b39f46929f281a48580722f7b6988fe3/1779359494698/main)","serverName":"splitter-0","languages":["js","java","sql","sqlscript","opencypher","cypher","gremlin"]}

Command I used was:

curl -s http://redacted:2480/api/v1/server?mode=cluster -u root:<redacted>

@robfrank robfrank added this to the 26.6.1 milestone May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API BUG - /api/v1/server?mode=cluster no longer works

2 participants